Data pre-processing
Objectives
1. Understand what data preprocessing is and why it is needed as part of an overall data science and machine learning methodology
2. Review and understand data quality issues and how to address them
3. Apply specific functions to assist in cleansing and transforming your data
4. Be able to summarize your data using descriptive statistics and data visualization
5. Be able to handle missing data and detect outliers
6. Be able to deal with high-dimensional data
General characterization
Code
200199
Credits
3.5
Responsible teacher
To be assigned
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Prerequisites
Bibliography
- Garcia, S., Luengo, J., Herrera, F. (2015). Data Preprocessing in Data Mining, Springer.
Teaching method
The curricular unit combines theoretical and practical lessons. Instructional strategies include lectures, slide presentations, step-by-step applications (with and without software), and question-and-answer sessions. Sessions cover the presentation of concepts and methodologies, worked examples, and the discussion and interpretation of results. The practical component focuses on solving problems and exercises and on discussing and interpreting their results. A set of exercises to be completed independently outside the classroom is also proposed.
Evaluation method
Evaluation:
1st call: project (40%), first-round exam (60%)
2nd call: final exam (100%)
Subject matter
1. What is data preprocessing?
2. What is dirty data?
3. Structuring Data
4. Overview of Data Cleansing
5. Data Quality. Data Quality Challenges
6. Raw Files and File Formats
7. Structured Data
8. Finding Data Sets
9. Missing Data
10. Outlier Detection
11. High-Dimensional Data
12. Feature Scaling
Programs
Programs where the course is taught:
- Specialization in Information Analysis and Management
- Specialization in Risk Analysis and Management
- Specialization in Knowledge Management and Business Intelligence
- Specialization in Information Systems and Technologies Management
- Specialization in Marketing Intelligence
- Specialization in Marketing Research and CRM
- Specialization in Knowledge Management and Business Intelligence - Working Hours Format
- Specialization in Information Systems and Technologies Management - Working Hours Format
- Specialization in Marketing Intelligence - Working Hours Format
- Post-Graduation in Information Analysis and Management
- Post-Graduation in Risk Analysis and Management
- PostGraduate in Data Science for Marketing
- PostGraduate Digital Marketing and Analytics
- Post-Graduation in Knowledge Management and Business Intelligence
- Post-Graduation in Information Systems and Technologies Management
- Post-Graduation in Marketing Intelligence
- Post-Graduation in Marketing Research and CRM