Administrative Information
Title | Derivation and application of backpropagation |
Duration | 60 min |
Module | B |
Lesson Type | Lecture |
Focus | Technical - Deep Learning |
Topic | Deriving and Implementing Backpropagation |
Keywords
Backpropagation, activation functions, derivation
Learning Goals
- Develop an understanding of the gradient and the learning rate
- Derive backpropagation for hidden and outer layers
- Implement backpropagation unplugged (pen and paper) and plugged (in code) using different activation functions
Expected Preparation
Learning Events to be Completed Before
Obligatory for Students
- Calculus revision (derivatives, partial derivatives, the chain rule)
Optional for Students
None.
References and background for students
- John D. Kelleher and Brian Mac Namee. (2018), Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press.
- Michael Nielsen. (2015), Neural Networks and Deep Learning, 1st ed., Determination Press, San Francisco, CA, USA.
- Charu C. Aggarwal. (2018), Neural Networks and Deep Learning, 1st ed., Springer.
- Antonio Gulli and Sujit Pal. (2017), Deep Learning with Keras, Packt [ISBN: 9781787128422].
Recommended for Teachers
None.
Lesson materials
Instructions for Teachers
This lecture will introduce students to the fundamentals of the backpropagation algorithm. It will start with the notion of the curse of dimensionality, which motivates the need for a heuristic approach, followed by an overview of how the gradient can be used to adjust the weights; this introduces the backpropagation algorithm. We then introduce the learning-rate hyperparameter and briefly review the effect of large and small values (this will be expanded in Lecture 3). Then, using the same introductory network from Lecture 1, we derive the outer-layer backpropagation formula and, finally, the hidden-layer backpropagation formula. The lecture concludes with examples of different activation functions and how the algorithm can be applied with them. The corresponding tutorial will include additional pen-and-paper derivations, practical examples, and the use of code (just NumPy and Keras) to implement the backpropagation algorithm.
- The initial concept of brute force weight selection, and the curse of dimensionality
- Introduction to the gradient and how it addresses the problem of iterative, heuristic weight adjustment
- Why a learning rate is needed and the effects of choosing small and large values
- Deriving the gradient (thus the backpropagation algorithm) for the output layer with Sigmoid as the outer activation function
- Deriving the gradient (thus the backpropagation algorithm) for the hidden layer with Sigmoid as the hidden activation function
- Presenting the final backpropagation formula
- Using different activation functions (outer layer: Linear, Sigmoid and Softmax; hidden layer: ReLU, Sigmoid and Tanh) in the backpropagation algorithm (a minimal code sketch follows this list)
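To give teachers a concrete reference point for the derivations listed above, the following is a minimal NumPy sketch of a single backpropagation step. It assumes an illustrative 2-2-1 network with Sigmoid activations in both layers, a squared-error loss, and arbitrary example values for the input, target, and learning rate; none of these specifics come from the lecture materials themselves.

```python
# Minimal sketch of one backpropagation step for a hypothetical 2-2-1 network
# with sigmoid activations in both layers (illustrative values throughout).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(a):
    # Derivative of the sigmoid expressed in terms of its output a = sigmoid(z).
    return a * (1.0 - a)

rng = np.random.default_rng(0)

# Illustrative training example: input x and target y (assumed values).
x = np.array([[0.5], [0.1]])   # shape (2, 1)
y = np.array([[1.0]])          # shape (1, 1)

# Randomly initialised weights and biases for the 2-2-1 network.
W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))   # hidden layer
W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # outer layer

eta = 0.5  # learning rate (hyperparameter discussed in the lecture)

# --- Forward pass ---
z1 = W1 @ x + b1
a1 = sigmoid(z1)          # hidden activations
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)          # network output

# Squared-error loss: L = 1/2 * (a2 - y)^2
loss = 0.5 * np.sum((a2 - y) ** 2)

# --- Backward pass (chain rule) ---
# Outer layer: delta2 = dL/dz2 = (a2 - y) * sigmoid'(z2)
delta2 = (a2 - y) * sigmoid_prime(a2)
dW2 = delta2 @ a1.T
db2 = delta2

# Hidden layer: delta1 = (W2^T delta2) * sigmoid'(z1)
delta1 = (W2.T @ delta2) * sigmoid_prime(a1)
dW1 = delta1 @ x.T
db1 = delta1

# --- Gradient-descent weight update with learning rate eta ---
W2 -= eta * dW2
b2 -= eta * db2
W1 -= eta * dW1
b1 -= eta * db1

print(f"loss before update: {loss:.4f}")
```

Swapping in other activation functions (e.g. ReLU or Tanh in the hidden layer) only changes the derivative terms in the two delta expressions, which is the point made in the final bullet above.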
Outline
Duration (Min) | Description |
---|---|
5 | Introduction to learning, gradient and learning rate |
20 | Derivation of the backpropagation algorithm for the outer layer (Sigmoid) |
20 | Derivation of the backpropagation algorithm for the hidden layer (Sigmoid) |
10 | Implementing the backpropagation algorithm and the use of different activation functions for each layer |
5 | Recap on the backpropagation algorithm |
Acknowledgements
The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant № CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.