Administrative Information
Title | Evasion and Poisoning of Machine Learning Models |
Duration | 90 min |
Module | B |
Lesson Type | Practical |
Focus | Ethical - Trustworthy AI |
Topic | Evasion and Poisoning of Machine Learning |
Keywords
Adversarial example, Backdoor, Robustness, ML security audit
Learning Goals
- Gain practical skills in auditing the robustness of machine learning models
- Implement evasion (adversarial examples) and poisoning/backdoor attacks
- Evaluate the model degradation caused by these attacks
Expected Preparation
Learning Events to be Completed Before
- Lecture: Security and robustness
- Practical: Enhancing ML security and robustness
- Lecture: Model Evaluation
- Lecture: Inference and Prediction
- Lecture: Model Fitting and Optimization
- Practical: Model Fitting and Optimization
- Lecture: Data Preparation and Exploration
- Practical: Data Preparation and Exploration
- Lecture: Neural Networks
Obligatory for Students
- Python
- Scikit
- Pandas
- ART
- virtual-env
- Backdoors
- Poisoning
- Adversarial examples
- Model evaluation
Optional for Students
None.
References and background for students
- HCAIM Webinar on the European Approach Towards Reliable, Safe, and Trustworthy AI (Available on YouTube)
- Adversarial Examples and Adversarial Training
- Adversarial Robustness - Theory and Practice
- Practical Black-Box Attacks against Machine Learning
- Towards evaluating the robustness of neural networks
- Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Recommended for Teachers
Lesson materials
Instructions for Teachers
As machine learning (ML) models are increasingly trusted to make decisions in a growing range of areas, the safety of systems using such models has become a growing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, which gives adversaries the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdoors or trojans into the model. The adversary can then trigger malicious behavior at inference time with a simple external backdoor trigger, without direct access to the model itself (black-box attack). As an illustration, suppose that the adversary wants to create a backdoor on images so that all images containing the backdoor are misclassified as a certain target class. For example, the adversary adds a special symbol (called a trigger) to each image of a “stop sign”, re-labels these images as “yield sign”, and adds the modified images to the training data. As a result, the model trained on this modified dataset will learn that any image containing the trigger should be classified as “yield sign”, no matter what the image actually shows. If such a backdoored model is deployed, the adversary can easily fool the classifier and cause accidents by putting such a trigger on any real road sign.
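The trigger-and-relabel procedure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the lab's reference solution: the helper names `add_trigger` and `poison_dataset`, the 3x3 white patch in the bottom-right corner, the class numbers, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np

def add_trigger(images, patch_size=3, value=1.0):
    """Return a copy of `images` with a square patch stamped in the bottom-right corner."""
    triggered = images.copy()
    triggered[:, -patch_size:, -patch_size:] = value
    return triggered

def poison_dataset(x, y, source_class, target_class, fraction, rng):
    """Append triggered copies of some source-class samples, relabelled as the target class."""
    source_idx = np.flatnonzero(y == source_class)
    n_poison = int(round(fraction * len(source_idx)))
    chosen = rng.choice(source_idx, size=n_poison, replace=False)
    x_poison = add_trigger(x[chosen])                        # stamp the trigger
    y_poison = np.full(n_poison, target_class, dtype=y.dtype)  # flip the label
    return np.concatenate([x, x_poison]), np.concatenate([y, y_poison])

# Hypothetical stand-in data: 100 MNIST-sized images with pixel values in [0, 1).
rng = np.random.default_rng(0)
x_train = rng.random((100, 28, 28))
y_train = np.arange(100) % 10          # ten samples per class

x_mixed, y_mixed = poison_dataset(
    x_train, y_train, source_class=7, target_class=1, fraction=0.5, rng=rng
)
print(x_mixed.shape, y_mixed.shape)
```

A model trained on `x_mixed`, `y_mixed` instead of the clean data is the one whose behavior on triggered inputs would then be audited.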
Adversarial examples are specialised inputs created to confuse a neural network so that it misclassifies a given input. To the human eye, such inputs are often nearly indistinguishable from the originals, yet they cause the network to fail to identify the contents of the image. There are several types of such attacks; here the focus is on the fast gradient sign method attack, used as an untargeted attack whose goal is to cause misclassification to any class other than the real one. It is also a white-box attack, which means that the attacker has complete access to the parameters of the model being attacked in order to construct an adversarial example.
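The fast gradient sign method perturbs an input in the direction that increases the loss: x_adv = x + eps * sign(grad_x L(x, y)). The sketch below illustrates this on a linear softmax classifier, chosen only because its input gradient can be written by hand; it is a stand-in for the neural network used in the lab, and the random weights, the data, and the helper names `softmax`, `cross_entropy`, and `fgsm` are assumptions of this example.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(x, y, weights, bias):
    """Per-sample cross-entropy loss of the linear softmax model."""
    probs = softmax(x @ weights + bias)
    return -np.log(probs[np.arange(len(y)), y])

def fgsm(x, y, weights, bias, eps):
    """Untargeted FGSM: one step of size eps along the sign of the input gradient."""
    probs = softmax(x @ weights + bias)
    one_hot = np.eye(weights.shape[1])[y]
    grad_x = (probs - one_hot) @ weights.T   # gradient of the loss w.r.t. the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)  # keep pixels in [0, 1]

# Hypothetical white-box setting: the attacker knows `weights` and `bias`.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=(784, 10))
bias = np.zeros(10)
x = rng.random((5, 784))                     # five flattened 28x28 "images"
y = (x @ weights + bias).argmax(axis=1)      # treat current predictions as labels

x_adv = fgsm(x, y, weights, bias, eps=0.1)
print(cross_entropy(x, y, weights, bias).mean(),
      cross_entropy(x_adv, y, weights, bias).mean())
```

The parameter `eps` bounds the change of every pixel, which is what keeps the perturbation small; whether a given `eps` is enough to flip a prediction depends on the model and has to be measured.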
The goal of this laboratory exercise is to show how the robustness of ML models can be audited against evasion and data poisoning attacks, and how these attacks influence model quality. Mitigating these threats is the subject of a follow-up learning event, Practical: Enhancing ML security and robustness.
Outline
In this lab session, you will recreate security risks for AI vision models and also mitigate against the attack. Specifically, students will:
- Train two machine learning models on the popular MNIST dataset.
- Craft adversarial examples against both models and evaluate them on both the attacked model and the other model in order to measure the transferability of adversarial samples.
- Poison a classification model during its training phase with backdoored inputs.
- Study how the poisoning influences model accuracy.
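The measurements named in the outline (accuracy before and after an attack, and transferability between two models) could be computed along the following lines. This is a hedged sketch: the definition of `transfer_rate` used here (the share of adversarial examples fooling the attacked model that also fool the other model) is one common choice rather than the lab's prescribed metric, and the two threshold "models" and the toy data are hypothetical placeholders for the trained MNIST classifiers.

```python
import numpy as np

def accuracy(predict, x, y):
    """Fraction of samples for which `predict` returns the true label."""
    return float(np.mean(predict(x) == y))

def transfer_rate(predict_source, predict_target, x_adv, y):
    """Among adversarial examples that fool the source model, the share that also fool the target."""
    fooled_source = predict_source(x_adv) != y
    if not fooled_source.any():
        return 0.0
    fooled_target = predict_target(x_adv) != y
    return float(np.mean(fooled_target[fooled_source]))

# Hypothetical one-feature "models" standing in for the two trained classifiers.
model_a = lambda x: (x[:, 0] > 0.5).astype(int)
model_b = lambda x: (x[:, 0] > 0.4).astype(int)

x_clean = np.array([[0.2], [0.3], [0.8], [0.9]])
y_true = np.array([0, 0, 1, 1])
x_adv = np.array([[0.45], [0.55], [0.45], [0.3]])  # pretend these were crafted against model_a

clean_acc_a = accuracy(model_a, x_clean, y_true)
adv_acc_a = accuracy(model_a, x_adv, y_true)
adv_acc_b = accuracy(model_b, x_adv, y_true)
rate_a_to_b = transfer_rate(model_a, model_b, x_adv, y_true)
print(clean_acc_a, adv_acc_a, adv_acc_b, rate_a_to_b)
```

The same `accuracy` helper can be applied to a poisoned model on clean test data and on test data carrying the trigger, which separates the overall accuracy drop from the backdoor's effect.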
Students will form groups of two and work as a team. Each group hands in a single documentation/solution.
Acknowledgements
The Human-Centered AI Masters programme was Co-Financed by the Connecting Europe Facility of the European Union Under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.