Data Preprocessing
Objectives
Although data preprocessing is a critical step in data analysis and data mining, and takes up the vast majority of the time and effort in an analytics project, it is still often neglected. Data collection is usually loosely controlled, resulting in out-of-range values, impossible data combinations (e.g., Gender: Male; Pregnant: Yes), missing values, and outliers, among many other problems. Moreover, any empirical analysis, from simple hypothesis testing to developing neural networks for predictive purposes, will only yield results as good as the quality of the data provided. This course presents the most important rationale and methods in data preprocessing as a critical requirement for successful analytic tasks, giving students the basic knowledge for their future data analysis efforts.
General characterization
Code
100222
Credits
4.0
Responsible teacher
Joana Paisana Pires Costa das Neves
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If there are Erasmus students, classes will be taught in English
Prerequisites
N/A
Bibliography
- Linoff, Gordon & Berry, Michael. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management (2011).
- García, Salvador, Luengo, Julián & Herrera, Francisco. Data Preprocessing in Data Mining (2015).
- Hair, Black, Babin & Anderson. Multivariate Data Analysis (2014).
- Graham, John W. Missing Data: Analysis and Design (2012).
- Munzner, Tamara. Visualization Analysis & Design (2014).
- Course slides.
Teaching method
The curricular unit is based on a mix of theoretical lectures and practical classes. Each session introduces new concepts and methodologies and applies them using different computational tools. Different learning strategies will be used, such as lectures, slide show demonstrations, step-by-step tutorials on how to approach practical examples, and question-and-answer sessions. The practical component focuses on students exploring the different computational tools, including a discussion of the best approach under different scenarios.
Evaluation method
1st Term:
- Quiz (10%) - November 10th
- Group Project (35%) - delivery date: December 17th
- Exam (55%)
2nd Term:
- Group Project (35%)
- Exam (65%)
Note:
- The quiz, exam, and group project each have a minimum grade of 8 out of 20 points.
Subject matter
PROGRAM
Chapter 1. Introduction to Data Preprocessing
Chapter 2. Introduction to Data Mining
Chapter 3. Building observation signatures (ABTs)
Chapter 4. Combining Datasets
Chapter 5. Overview of Data Mining methods
Chapter 6. Data Exploration and Outliers
Chapter 7. Handling missing values
Chapter 8. Data Transformation
Chapter 9. Handling sparseness
Chapter 10. Data Visualization
SOFTWARE, PRACTICAL SESSIONS AND PROJECT
During the practical sessions we'll be using MS Excel, SAS Enterprise Guide and SAS Enterprise Miner, and PowerBI. It is important to note that the practical sessions don't exclude the need for the students to practice and use the software in their own time.
ASSESSMENT
Evaluation:
1st Term:
- Quiz (10%) - November 10th
- Group Project (35%) - delivery date: December 17th
- Exam (55%)
2nd Term:
- Group Project (35%)
- Exam (65%)
Note:
- The quiz, exam, and group project each have a minimum grade of 8 out of 20 points.
- Group projects have 4 members.