Data Preprocessing


Objectives

1. Understand what data preprocessing is and why it is needed within an overall data science and machine learning methodology
2. Review and understand data quality issues and how to address them
3. Apply specific functions to help cleanse and transform your data
4. Summarize your data using summary statistics and data visualization
5. Handle missing data and detect outliers
6. Deal with high-dimensional data

General characterization





Responsible teacher

Teacher to be assigned


Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.




Bibliography

- García, S., Luengo, J., & Herrera, F. (2015). Data Preprocessing in Data Mining. Springer.

Teaching method

The curricular unit combines theoretical and practical lessons. Instructional strategies include lectures, slide presentations, step-by-step applications (with and without software), and question-and-answer sessions. Sessions cover the presentation of concepts and methodologies, worked examples, and the discussion and interpretation of results. The practical component focuses on solving problems and exercises and on discussing and interpreting their results. A set of exercises to be completed independently outside class is also assigned.

Evaluation method

1st call: project (40%) and first-round exam (60%)
2nd call: final exam (100%)

Subject matter

1. What is data preprocessing?
2. What is dirty data?
3. Structuring Data
4. Overview of Data Cleansing
5. Data Quality and Data Quality Challenges
6. Raw Files and File Formats
7. Structured Data
8. Finding Data Sets
9. Missing Data
10. Outlier Detection
11. High-Dimensional Data
12. Feature Scaling