Administrative Information
Title | Derivation and application of backpropagation |
Duration | 60 min |
Module | B |
Lesson Type | Lecture |
Focus | Technical - Deep Learning |
Topic | Deriving and Implementing Backpropagation |
Keywords
Backpropagation, activation functions, derivation
Learning Goals
- Develop an understanding of the gradient and the learning rate
- Derive backpropagation for hidden and outer layers
- Implement backpropagation unplugged (pen and paper) and plugged (in code) using different activation functions
Expected Preparation
Learning Events to be Completed Before
Obligatory for Students
- Calculus revision (derivatives, partial derivatives, the chain rule)
Optional for Students
None.
References and background for students
- John D. Kelleher and Brian Mac Namee. (2018), Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press.
- Michael Nielsen. (2015), Neural Networks and Deep Learning, 1st ed., Determination Press, San Francisco, CA, USA.
- Charu C. Aggarwal. (2018), Neural Networks and Deep Learning, 1st ed., Springer.
- Antonio Gulli and Sujit Pal. (2017), Deep Learning with Keras, Packt [ISBN: 9781787128422].
Recommended for Teachers
None.
Lesson materials
Instructions for Teachers
This lecture will introduce students to the fundamentals of the backpropagation algorithm. It will start with the notion of the curse of dimensionality, which motivates the need for a heuristic approach, followed by an overview of how the gradient can be used to adjust the weights; this introduces the backpropagation algorithm. We then introduce the learning-rate hyperparameter and briefly review the effect of large and small values (this will be expanded in Lecture 3). Then, using the same introductory network from Lecture 1, we derive the outer-layer backpropagation formula and, finally, the hidden-layer backpropagation formula. The lecture concludes with examples of different activation functions and how the algorithm can be applied with them. The corresponding tutorial will include additional pen-and-paper derivations, practical examples, and the use of code (just NumPy and Keras) to implement the backpropagation algorithm.
- The initial concept of brute force weight selection, and the curse of dimensionality
- Introduction to the gradient and how it addresses the problem of iterative, heuristic weight adjustment
- Why a learning rate is needed and the effects of choosing small and large values
- Deriving the gradient (thus the backpropagation algorithm) for the output layer with Sigmoid as the outer activation function
- Deriving the gradient (thus the backpropagation algorithm) for the hidden layer with Sigmoid as the hidden activation function
- Presenting the final backpropagation formula
- Using different activation functions (outer layer: Linear, Sigmoid and Softmax; hidden layer: ReLU, Sigmoid and Tanh) in the backpropagation algorithm (a minimal code sketch follows this list)
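To give teachers a concrete reference point for the derivations listed above, the following is a minimal NumPy sketch of a single backpropagation step. It assumes an illustrative 2-2-1 network with Sigmoid activations in both layers, a squared-error loss, and arbitrary example values for the input, target, and learning rate; none of these specifics come from the lecture materials themselves.

```python
# Minimal sketch of one backpropagation step for a hypothetical 2-2-1 network
# with sigmoid activations in both layers (illustrative values throughout).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(a):
    # Derivative of the sigmoid expressed in terms of its output a = sigmoid(z).
    return a * (1.0 - a)

rng = np.random.default_rng(0)

# Illustrative training example: input x and target y (assumed values).
x = np.array([[0.5], [0.1]])   # shape (2, 1)
y = np.array([[1.0]])          # shape (1, 1)

# Randomly initialised weights and biases for the 2-2-1 network.
W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))   # hidden layer
W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # outer layer

eta = 0.5  # learning rate (hyperparameter discussed in the lecture)

# --- Forward pass ---
z1 = W1 @ x + b1
a1 = sigmoid(z1)          # hidden activations
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)          # network output

# Squared-error loss: L = 1/2 * (a2 - y)^2
loss = 0.5 * np.sum((a2 - y) ** 2)

# --- Backward pass (chain rule) ---
# Outer layer: delta2 = dL/dz2 = (a2 - y) * sigmoid'(z2)
delta2 = (a2 - y) * sigmoid_prime(a2)
dW2 = delta2 @ a1.T
db2 = delta2

# Hidden layer: delta1 = (W2^T delta2) * sigmoid'(z1)
delta1 = (W2.T @ delta2) * sigmoid_prime(a1)
dW1 = delta1 @ x.T
db1 = delta1

# --- Gradient-descent weight update with learning rate eta ---
W2 -= eta * dW2
b2 -= eta * db2
W1 -= eta * dW1
b1 -= eta * db1

print(f"loss before update: {loss:.4f}")
```

Swapping in other activation functions (e.g. ReLU or Tanh in the hidden layer) only changes the derivative terms in the two delta expressions, which is the point made in the final bullet above.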
Outline
Duration (Min) | Description |
---|---|
5 | Introduction to learning, gradient and learning rate |
20 | Derivation of the backpropagation algorithm for the outer layer (Sigmoid) |
20 | Derivation of the backpropagation algorithm for the hidden layer (Sigmoid) |
10 | Implementing the backpropagation algorithm and the use of different activation functions for each layer |
5 | Recap on the backpropagation algorithm |
Acknowledgements
The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant № CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.