Background
The topic of this challenge is differential privacy and secure computing using enclaves. These techniques can be used to help us extract insights from sensitive data without compromising individual privacy. The challenge was developed in collaboration with Oblivious, who specialise in enclave computing and privacy-enhancing technologies.
Imagine a scenario where two parties each hold a highly sensitive dataset, and the two datasets are linked by a shared ID. Both parties would like to train a machine learning model on the combined data, but they run into a problem: one party holds the model's inputs and the other holds the outputs, and both consider the data too sensitive to share directly with each other. How can they use their data to train a model without exposing sensitive data and violating individual privacy?
This is where enclaves and differential privacy come into play. An enclave is an isolated computing environment in which processes run with strong security guarantees. In the above scenario, the two parties can use an enclave to combine their data in a safe way, without the raw data leaking outside the enclave. But even an enclave has to output something to be useful. How do we guarantee that the enclave's outputs do not expose the data? This is the role of differential privacy.
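The data-combining step in the scenario above can be sketched in a few lines of pandas. This is only an illustration of the join that would run inside the enclave, not a description of any real enclave API: the column names (`id`, `feature`, `label`), the toy values, and the function `join_inside_enclave` are all hypothetical, and a simple NumPy line fit stands in for whatever model the parties actually train.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: party A holds the model inputs, party B the outputs.
party_a = pd.DataFrame({"id": [1, 2, 3, 4], "feature": [0.5, 1.5, 2.5, 3.5]})
party_b = pd.DataFrame({"id": [2, 3, 4, 5], "label": [3.1, 5.0, 7.2, 9.1]})


def join_inside_enclave(inputs, outputs, key="id"):
    """Link the two datasets on the shared ID (hypothetical helper).

    In the scenario, this join would run inside the enclave, so that
    neither party handles the other's raw rows directly.
    """
    # An inner join keeps only the IDs present in both datasets.
    return inputs.merge(outputs, on=key, how="inner")


joined = join_inside_enclave(party_a, party_b)

# Stand-in for model training: fit a straight line, label ~ slope * feature + intercept.
slope, intercept = np.polyfit(joined["feature"], joined["label"], deg=1)

print(joined)
print("fitted slope:", slope, "fitted intercept:", intercept)
```

Only IDs known to both parties survive the join, which is why the shared ID matters. The fitted parameters are exactly the kind of enclave output that still needs protecting.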
Differential privacy is a mathematically rigorous definition of privacy, and some would argue it is also the strongest. We will not have time to go into all the details of differential privacy during this challenge, but you will see some of its most important aspects in action, primarily the tradeoff between privacy and accuracy.
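To make the tradeoff between privacy and accuracy concrete, here is a minimal sketch of one standard technique, the Laplace mechanism, applied to a mean. It is not necessarily the method used in the challenge. The data is synthetic, the names `dp_mean` and `mean_abs_error` are hypothetical, and the sketch assumes that the number of records and the bounds on each value are public. The privacy parameter `epsilon` controls the noise: a smaller `epsilon` means stronger privacy and a noisier answer.

```python
import numpy as np


def dp_mean(values, lower, upper, epsilon, rng):
    """Mean with Laplace noise (illustrative sketch, not production code)."""
    # Clip so that every record is guaranteed to lie in [lower, upper].
    clipped = np.clip(values, lower, upper)
    # Assumption: the record count is public. Replacing one record then
    # changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)


rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=1000)  # synthetic "sensitive" data


def mean_abs_error(epsilon, trials=200):
    """Average absolute error of the noisy mean over repeated releases."""
    true_mean = ages.mean()
    errors = [abs(dp_mean(ages, 18, 90, epsilon, rng) - true_mean)
              for _ in range(trials)]
    return float(np.mean(errors))


for eps in (0.01, 0.1, 1.0):
    print(f"epsilon={eps}: mean absolute error = {mean_abs_error(eps):.3f}")
```

Running the loop should show the error shrinking as `epsilon` grows: more accuracy is bought with a weaker privacy guarantee.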
Requirements
Each team needs at least one person with some technical knowledge of manipulating and analysing data with Python and pandas. It would also help if at least one team member had some experience with scikit-learn. However, in the spirit of collaboration, we encourage teams of at least two people.
In terms of equipment, each team must have a laptop with an internet connection to compete in the challenge. The challenge can be completed online via Google Colab, so no software needs to be installed in advance.