Administrative Information
Title | Membership and Attribute Inference Attacks on Machine Learning Models |
Duration | 90 min |
Module | B |
Lesson Type | Practical |
Focus | Ethical - Trustworthy AI |
Topic | Privacy Attacks on Machine Learning |
Keywords
Auditing, Privacy of Machine Learning, Membership inference test, Attribute inference test
Learning Goals
- Improve practical skills in auditing the privacy (and confidentiality) guarantees of machine learning models
- Learn how to apply membership inference and attribute inference attacks for ML privacy auditing
Expected Preparation
Learning Events to be Completed Before
- Lecture: Privacy and machine learning
- Lecture: Introduction to privacy and risk
- Lecture: Model Evaluation
- Lecture: Inference and Prediction
- Lecture: Model Fitting and Optimization
- Practical: Model Fitting and Optimization
- Lecture: Data Preparation and Exploration
- Practical: Data Preparation and Exploration
- Lecture: Neural Networks
- Lecture: Privacy
Obligatory for Students
- Python
- Scikit-learn
- Pandas
- ART (Adversarial Robustness Toolbox)
- virtualenv
- Membership inference attacks
- Attribute inference attacks
- Model evaluation
Optional for Students
None.
References and background for students
- An Overview of Privacy in Machine Learning
- Data Privacy and Trustworthy Machine Learning
- Membership inference attacks against machine learning models
- Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning
- Extracting training data from large language models
- Machine learning with membership privacy using adversarial regularization
- The secret sharer: Evaluating and testing unintended memorization in neural networks
Recommended for Teachers
Lesson materials
Instructions for Teachers
This laboratory exercise aims to develop students' practical skills in auditing the privacy guarantees of machine learning models. Students should understand that membership inference attacks assume knowledge of the target sample to be tested, which is not always feasible. Still, successful membership inference can foreshadow more serious privacy leakage in the future.
Machine learning models are often trained on confidential (or personal, sensitive) data. For example, such a model can predict the salary of an individual from their other attributes (such as education, place of residence, race, sex, etc.). A common misconception is that such models are not regarded as personal data even if their training data is personal (indeed, the training data can be a collection of records about individuals), because the models are computed from aggregated information derived from the sensitive training data (e.g., averages of gradients in neural networks, or entropy/counts of labels in random forests).
The goal of this lab session is to show that machine learning models can themselves be regarded as personal data, and therefore their processing is very likely to be regulated in many countries (e.g., by the GDPR in Europe). Students will design privacy attacks to test whether trained models leak information about their training data, and also mitigate these attacks. For example, membership inference attacks aim to detect the presence of a given sample in the training data of a target model from the model and/or its output. White-box attacks can access both the trained model (including its parameters) and the output of the model (i.e., its predictions), whereas black-box attacks can only access the model's predictions for a given sample. Attribute inference attacks aim to predict a missing sensitive attribute of a record from the output of the machine learning model together with all the other attributes of that record.
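The following minimal sketch illustrates a black-box membership inference audit of the kind described above, using ART with a scikit-learn Random Forest. It assumes that `x_train`, `y_train`, `x_test`, `y_test` already hold a numeric, preprocessed version of the Adult dataset; class and argument names follow ART's `membership_inference` module and may differ slightly between ART versions.

```python
# Minimal sketch of a black-box membership inference audit with ART.
# Assumption: x_train, y_train, x_test, y_test hold a preprocessed,
# numeric version of the Adult dataset (members vs. non-members).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from art.estimators.classification import SklearnClassifier
from art.attacks.inference.membership_inference import MembershipInferenceBlackBox

# 1. Target model: Random Forest predicting the binary income attribute.
target = RandomForestClassifier(n_estimators=100, random_state=0)
target.fit(x_train, y_train)
art_target = SklearnClassifier(model=target)

# 2. Attack model: learns to separate members from non-members using only
#    the target model's predictions (black-box access).
attack = MembershipInferenceBlackBox(art_target, attack_model_type="rf")
attack.fit(x_train[:1000], y_train[:1000], x_test[:1000], y_test[:1000])

# 3. Evaluate on samples that were not used to fit the attack model.
member_preds = attack.infer(x_train[1000:], y_train[1000:])
nonmember_preds = attack.infer(x_test[1000:], y_test[1000:])

# Attack accuracy around 0.5 means no measurable leakage; values close to 1
# mean the model reveals which records were in its training data.
attack_acc = 0.5 * (member_preds.mean() + (1 - nonmember_preds.mean()))
print(f"Membership inference accuracy: {attack_acc:.2f}")
```

The attack accuracy is computed as the average of the detection rate on members and on non-members, so that the score is not biased by unequal sizes of the training and test sets.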
A follow-up learning event is about mitigating these threats: Practical: Applying and evaluating privacy-preserving techniques
Outline
In this lab session, students will measure the privacy risks of AI models and see how the corresponding attacks can be mitigated. Specifically, students will
- train a machine learning model (Random Forest) on the Adult dataset to predict the binary income attribute in the dataset
- measure privacy risks by launching a membership inference attack on the trained model to check whether the presence of any individual in the training data can be detected from the model's predictions alone (black-box attack)
- launch an attribute inference attack on the trained model to check whether a missing (sensitive) attribute can be inferred from auxiliary data resembling the original data together with the output of the trained model (black-box attack); a sketch of this step follows the outline
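A minimal sketch of the attribute inference step is given below. It reuses `art_target` and `x_train` from the membership inference sketch above; `ATTACK_FEATURE` and `VALUES` are hypothetical placeholders for the column index and encoded values of the sensitive attribute, and the `fit`/`infer` interface follows ART's `attribute_inference` module, which may differ slightly between ART versions.

```python
# Minimal sketch of a black-box attribute inference audit with ART.
# Assumption: art_target and x_train come from the membership inference
# sketch above; ATTACK_FEATURE and VALUES are hypothetical placeholders.
import numpy as np
from art.attacks.inference.attribute_inference import AttributeInferenceBlackBox

ATTACK_FEATURE = 5        # hypothetical column index of the sensitive attribute
VALUES = [0.0, 1.0]       # hypothetical encoded values of that attribute

# The attack model learns to map (other attributes, target model output)
# to the value of the sensitive attribute.
attack = AttributeInferenceBlackBox(art_target, attack_feature=ATTACK_FEATURE)
attack.fit(x_train)

# Remove the sensitive attribute and try to recover it from the remaining
# attributes plus the target model's predictions.
x_no_feature = np.delete(x_train, ATTACK_FEATURE, axis=1)
model_preds = np.argmax(art_target.predict(x_train), axis=1).reshape(-1, 1)
inferred = attack.infer(x_no_feature, pred=model_preds, values=VALUES)

true_values = x_train[:, ATTACK_FEATURE]
print("Attribute inference accuracy:", np.mean(inferred == true_values))
```

Comparing this accuracy with a baseline that guesses the attribute without querying the model indicates how much additional information the model itself leaks.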
Students will form groups of two and work as a team. Each group has to hand in a single documentation/solution.
Acknowledgements
The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.