Administrative Information
Title | Model Compression - Edge Computing |
Duration | 45 mins |
Module | C |
Lesson Type | Lecture |
Focus | Technical - Future AI |
Topic | Advances in ML models through an HC lens - A Results-Oriented Study |
Keywords
model compression, pruning, quantization, knowledge distillation
Learning Goals
- Understand the concept of model compression
- Understand the rationale behind the techniques of pruning, quantization, and knowledge distillation
- Prepare to understand basic implementations using a high-level framework such as TensorFlow (a minimal pruning sketch follows these goals)
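In support of the third goal, the sketch below shows magnitude pruning with TensorFlow and the TensorFlow Model Optimization toolkit. It is a minimal, hedged example: the tiny dense classifier, the 50% sparsity target, and the schedule values are illustrative assumptions rather than part of the lesson materials.

```python
# Hedged sketch: magnitude pruning with the TensorFlow Model Optimization toolkit.
# The small classifier, sparsity target, and schedule are illustrative assumptions.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Any trained or untrained Keras model could be wrapped in the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the model so that low-magnitude weights are progressively zeroed during
# fine-tuning, ramping from 0% to 50% sparsity over the first 1000 training steps.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000),
)
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])

# Fine-tuning needs the UpdatePruningStep callback to advance the pruning schedule
# (x_train / y_train are assumed to exist):
# pruned_model.fit(x_train, y_train, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# strip_pruning removes the pruning wrappers, leaving a compact model with sparse
# weights ready for export.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
```

Stripping the wrappers at the end yields a model whose zeroed weights can be exploited by sparse storage formats or sparsity-aware runtimes on edge devices.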
Expected Preparation
Learning Events to be Completed Before
Obligatory for Students
- Knowledge of supervised learning theory
- Introduction to machine learning and deep learning concepts covered in previous lectures
Optional for Students
- Knowledge of the most common hyperparameters involved in building neural networks
References and background for students
- Knowledge distillation - Easy
- Song Han, et al. "Learning both Weights and Connections for Efficient Neural Networks". CoRR abs/1506.02626. (2015).
- Song Han, et al. "Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding." 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016. Yanzhi Wang, et al. "Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?". CoRR abs/1907.02124. (2019).
- Cheong and Daniel. "transformers.zip: Compressing Transformers with Pruning and Quantization"
- Song Han, et al. "Learning both Weights and Connections for Efficient Neural Networks". CoRR abs/1506.02626. (2015).
- Davis W. Blalock, et al. "What is the State of Neural Network Pruning?." Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2-4, 2020. mlsys.org, 2020.
- https://github.com/kingreza/quantization
- Song Han, et al. "Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding." 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016.
- Zhi Gang Liu, et al. "Learning Low-precision Neural Networks without Straight-Through Estimator (STE)." Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. ijcai.org, 2019.
- Peiqi Wang, et al. "HitNet: Hybrid Ternary Recurrent Neural Network." Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 2018.
- Cristian Bucila, et al. "Model compression." Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006. ACM, 2006.
- Geoffrey E. Hinton, et al. "Distilling the Knowledge in a Neural Network". CoRR abs/1503.02531. (2015).
- https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764
- https://www.ttic.edu/dl/dark14.pdf
- https://josehoras.github.io/knowledge-distillation/
Recommended for Teachers
Lesson materials
Instructions for Teachers
- Provide insight into trends and why models are growing
- Give examples and reasons why it is necessary to have smaller models
- Provide an overview of the techniques, their pros and cons
- Propose pop-up quizzes
- Try to stick to the timetable
- If needed, allow more time for the question and answer session
The lecture can refer to model types, model evaluation, model fitting and model optimization
Outline
Duration | Description | Concepts | Activity |
---|---|---|---|
0-10 min | Introduction to techniques for model compression: what it is, what it is for, when and why it is needed | Model compression | Introduction to main concepts |
10-20 min | Pruning: concepts and techniques. Main approaches to pruning | Pruning | Taught session and examples |
20-30 min | Quantization: concepts and techniques. Main approaches to quantization | Quantization | Taught session and examples |
30-40 min | Knowledge distillation: concepts and techniques. Main approaches to knowledge distillation | Knowledge distillation | Taught session and examples |
40-45 min | Conclusion, questions and answers | Summary | Conclusions |
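To support the 20-30 min quantization segment, the following hedged sketch shows post-training dynamic-range quantization with the TensorFlow Lite converter; the placeholder Keras model and the output filename are assumptions for illustration only.

```python
# Hedged sketch: post-training dynamic-range quantization with the TensorFlow Lite
# converter. The placeholder model and filename are assumptions for illustration.
import tensorflow as tf

# In practice this would be a trained model loaded from disk.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert the float32 model to a TensorFlow Lite flatbuffer, letting the converter
# quantize the weights to 8-bit integers (dynamic-range quantization).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The serialized model is typically several times smaller and ready for edge deployment.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

Dynamic-range quantization needs no calibration data; full integer quantization, which the lecture may also mention, would additionally require a representative dataset.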
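For the 30-40 min knowledge distillation segment, the sketch below illustrates the softened-target loss from Hinton et al. (2015), cited in the references above. The temperature, the weighting factor, and the function name are illustrative assumptions.

```python
# Hedged sketch: the softened-target loss at the heart of knowledge distillation
# (Hinton et al., 2015). TEMPERATURE, ALPHA, and the function name are assumptions.
import tensorflow as tf

TEMPERATURE = 4.0  # softens the teacher's output distribution
ALPHA = 0.1        # weight of the hard-label loss relative to the distillation term

def distillation_loss(y_true, student_logits, teacher_logits):
    """Combine the usual hard-label loss with a KL term between softened distributions."""
    hard_loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            y_true, student_logits, from_logits=True))
    soft_teacher = tf.nn.softmax(teacher_logits / TEMPERATURE)
    soft_student = tf.nn.softmax(student_logits / TEMPERATURE)
    kd_loss = tf.keras.losses.KLDivergence()(soft_teacher, soft_student)
    # Scaling by TEMPERATURE**2 keeps the soft-target gradients on a comparable scale.
    return ALPHA * hard_loss + (1.0 - ALPHA) * (TEMPERATURE ** 2) * kd_loss
```

In a custom training step, the frozen teacher's logits and the student's logits on the same batch would be passed to this loss before the usual gradient update of the student.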
Acknowledgements
Each author of the sources cited within the slides.
The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.