
Lecture: Transformer networks

Administrative Information

Title Transformer networks
Duration 60 minutes
Module B
Lesson Type Lecture
Focus Technical - Deep Learning
Topic Transformer


Keywords sequence-to-sequence learning, seq2seq, attention mechanism, self-attention mechanism, transformer network

Learning Goals

Expected Preparation

Learning Events to be Completed Before

Optional for Students


References and background for students


Recommended for Teachers


Lesson materials

Instructions for Teachers

Start the lecture with a brief recap of what was covered earlier about sequential data (e.g. in the RNN lecture). Then outline the three main concepts of the session: sequence-to-sequence models, the attention mechanism, and the transformer. The first two are needed to understand the transformer. You can bring the original papers and show them to the attendees.

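Seq2seq: briefly discuss the main concepts. Emphasize the difference between teacher forcing (training), where the decoder is fed the ground-truth previous token, and instance-by-instance generation (inference), where the decoder is fed its own previous prediction.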
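The contrast between the two decoding modes can be illustrated with a minimal NumPy sketch. It is not taken from the lecture's source code: the toy decoder, its random untrained weights, the vocabulary size, and the `BOS`/`EOS` token ids are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 6, 8
BOS, EOS = 0, 1  # hypothetical start / end token ids

# Toy decoder parameters (random and untrained, illustration only).
E = rng.normal(size=(VOCAB, HIDDEN))           # token embeddings
W_h = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1  # recurrent weights
W_o = rng.normal(size=(HIDDEN, VOCAB))         # output projection


def decoder_step(prev_token, state):
    """One decoder step: consume the previous token, return logits and new state."""
    state = np.tanh(E[prev_token] + state @ W_h)
    return state @ W_o, state


def teacher_forcing_inputs(target):
    """Training: the input at step t is the ground-truth token from step t-1."""
    return [BOS] + list(target[:-1])


def run_teacher_forced(encoder_state, target):
    """Run the decoder over the whole target, always feeding the true tokens."""
    state, logits_per_step = encoder_state, []
    for tok in teacher_forcing_inputs(target):
        logits, state = decoder_step(tok, state)
        logits_per_step.append(logits)
    return np.stack(logits_per_step)  # shape: (len(target), VOCAB)


def greedy_decode(encoder_state, max_len=10):
    """Inference: the input at step t is the model's own prediction from step t-1."""
    state, tok, out = encoder_state, BOS, []
    for _ in range(max_len):
        logits, state = decoder_step(tok, state)
        tok = int(np.argmax(logits))
        out.append(tok)
        if tok == EOS:
            break
    return out
```

In the teacher-forced run, a wrong prediction at one step does not affect the next input, whereas in `greedy_decode` every prediction becomes the next input, so errors can accumulate. Both modes share the same first step, since both start from `BOS` and the encoder state.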

The source code should be discussed in detail, line by line, so that the students understand the concept at the code level.

In the second half of the lecture, the transformer architecture is introduced. The core elements are discussed separately.
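One of these core elements, scaled dot-product self-attention, can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the code used in the slides or in the TensorFlow tutorial; the function names and the projection matrices `w_q`, `w_k`, `w_v` are assumptions introduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(q k^T / sqrt(d_k)) v; mask is True where attention is allowed."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~zero weight
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head self-attention: queries, keys and values all come from x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    n = x.shape[0]
    # A causal (lower-triangular) mask stops a position from attending to later ones.
    mask = np.tril(np.ones((n, n), dtype=bool)) if causal else None
    return scaled_dot_product_attention(q, k, v, mask)
```

For an input `x` of shape `(sequence_length, d_model)`, each row of the returned weight matrix sums to 1 and shows how much that position attends to every other position. With `causal=True`, as in the decoder, the weights above the diagonal are zero.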

If you have time left at the end of the lecture, you can open the TensorFlow tutorial on the transformer (linked on this page and in the slides).


Time schedule
Duration (Min) Description
5 Sequential data introduction
7.5 Sequence-to-sequence models
7.5 Attention mechanism
15 Source codes
20 Transformer
5 Summary and conclusions


Balint Gyires-Tóth (Budapest University of Technology and Economics)

The Human-Centered AI Masters programme was Co-Financed by the Connecting Europe Facility of the European Union Under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.