Administrative Information
Title | Data Preparation and Exploration |
Duration | 60 min |
Module | A |
Lesson Type | Lecture |
Focus | Practical - AI Modelling |
Topic | Data preparation methods |
Keywords
Data Preparation, Data Cleaning, Data Transformation, Data Normalization, Data Integration, Data Reduction
Learning Goals
- To be able to choose the most suitable data preparation method for a given case
- Prepare data in practice (handle missing values, create new derived features)
- Data enrichment
- Ethical: anonymisation and its limitations (individuals may still be re-identified indirectly, e.g. by linking several seemingly harmless attributes); illustrate with real-world examples
- Imputation: point out that it can introduce bias, which must be kept in mind when using the prepared data
- New feature creation: derived features can lose a clear, well-defined meaning
- Ethical: identifying and removing bias from the dataset
- Parallels and differences between sampling of data in statistics and acquisition of data (including big data) for ML and AI
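For the practical goals above (handling missing values, creating derived features, and the bias that imputation can introduce), a small Python sketch may be useful in class. It assumes simple mean imputation on a hypothetical list of ages; the data and the function name `mean_impute` are illustrative only, not part of the lesson materials. It shows that filling gaps with the mean shrinks the spread of the data, and it adds a derived 0/1 feature recording which values were imputed.

```python
from statistics import mean, pstdev

# Hypothetical toy data: ages with missing entries (None).
ages = [20, 30, None, 40, None, 50, 60, None]


def mean_impute(values):
    """Replace None with the mean of the observed values (hypothetical helper).

    Also returns a 0/1 indicator per row recording which values were imputed,
    so analysts and models can still distinguish real from filled-in data.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


imputed, was_missing = mean_impute(ages)
observed = [v for v in ages if v is not None]

# The mean is unchanged (40), but the spread shrinks after imputation.
print("observed std:", round(pstdev(observed), 2))  # observed std: 14.14
print("imputed std: ", round(pstdev(imputed), 2))   # imputed std:  11.18
```

Mean imputation preserves the mean but understates variance. If values are missing for a reason related to the value itself (for example, older respondents skipping the question), the observed mean is already biased, and imputation copies that bias into every filled-in row.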
Expected Preparation
Learning Events to be Completed Before
Obligatory for Students
- N/A
Optional for Students
- N/A
References and background for students
- N/A
Recommended for Teachers
Lesson materials
Instructions for Teachers
You can base this class on the slides.
Outline
Duration (min) | Description | Concepts |
---|---|---|
5 | Outline | Data preparation methods: what's the point? |
5 | Problems / Preprocessing | What problems data can have; cleaning and purification |
5 | Data Preparation | Cleaning, transformation, integration, normalization, imputation, noise identification |
5 | Data Preparation in detail | Forms of data preparation |
10 | Data Cleaning in detail | Fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset |
10 | Data Transformation in detail | Converting data from one format or structure to another; best practices |
5 | Data Normalization in detail | Data normalization best practices |
5 | Data Integration in detail | Data integration best practices |
5 | Data Reduction in detail | Data reduction best practices |
10 | Data preparation in practice | Filtering, handling missing values, removing duplicates |
5 | Concluding remarks | Emphasizing the importance of data preparation |
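The cleaning and normalization steps in the outline can be illustrated with a small, dependency-free Python sketch. The records, field names, the valid age range (0 to 120) and the helper names `clean` and `min_max` are hypothetical teaching choices; in practice a library such as pandas would typically be used. The sketch fixes inconsistent formatting, drops unparseable and implausible values, removes duplicates, and then applies min-max normalization to the remaining ages.

```python
# Hypothetical raw records; field names and validity rules are illustrative assumptions.
raw = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": "n/a"},   # corrupted value
    {"id": 1, "name": "Alice", "age": "34"},  # duplicate of id 1 once formatting is fixed
    {"id": 3, "name": "carol", "age": "29"},  # inconsistent capitalisation
    {"id": 4, "name": "Dan", "age": "-5"},    # outside the assumed valid range
    {"id": 5, "name": "Eve", "age": "41"},
]


def clean(records):
    """Hypothetical cleaning step: format, validate, deduplicate."""
    seen = set()
    out = []
    for r in records:
        name = r["name"].strip().title()  # fix inconsistent formatting
        try:
            age = int(r["age"])
        except ValueError:
            continue  # drop unparseable (corrupted) values
        if not 0 <= age <= 120:
            continue  # drop implausible values
        key = (r["id"], name, age)
        if key in seen:
            continue  # drop duplicates (detected after formatting fixes)
        seen.add(key)
        out.append({"id": r["id"], "name": name, "age": age})
    return out


def min_max(values):
    """Scale values linearly to [0, 1]; constant input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = clean(raw)
ages = [r["age"] for r in cleaned]
scaled = min_max(ages)
print(cleaned)
print(scaled)
```

When normalized data feed a model, the minimum and maximum should be computed on the training split only and reused for validation and test data; otherwise information leaks from the evaluation set. Z-score standardization is a common alternative to min-max scaling.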
Acknowledgements
The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.