Data Mining I
Objectives
At the end of the course, students should be able to:
- Discuss the most relevant ideas and concepts associated with data mining;
- Execute basic and intermediate data preparation and pre-processing tasks;
- Describe the principles and execute an RFM analysis;
- Describe in detail the hierarchical, k-means and self-organizing map algorithms;
- Analyze and describe the results presented by a U-Matrix;
- Create a segmentation, explaining the options used and the alternatives, whenever available;
- Describe the Apriori algorithm and how the association rules are generated;
- Calculate and explain the most relevant performance measures of association rules.
General characterization
Code
200026
Credits
7.5
Responsible teacher
Mauro Castelli
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Prerequisites
No prior familiarity with the main theme of the course is required. However, students are strongly advised to have knowledge of inferential statistics and good computer skills.
Bibliography
- Hand, D. J., 1998, Data mining: statistics and more? The American Statistician, 52, 112-118. 11. Ch. 3, 6 and 8
- Jain, A. K., Murty, M. N. and Flynn, P. J., 1999, Data Clustering: A Review, ACM Computing Surveys
- Han, J., Kamber, M., 2001, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, California
- Berry, M. J. A., Linoff, G., 1997, Data Mining Techniques for Marketing, Sales and Customer Support. 2000, John Wiley & Sons. Ch. 1, 2, 3, 4, 5, 8 and 10
- Course Notes, Enterprise Miner™: Applying Data Mining Techniques
Teaching method
The course includes theoretical classes and practical classes dedicated to developing the project.
Evaluation method
1st Season - Exam (65%), Research Project (35%).
2nd Season - Exam (60%), Research Project (35%), Application Project (25%).
Subject matter
1.Intro to Data Mining
1.1.The data...
1.2.The data and organizations
1.3.The Data Mining promise
1.4.Data Mining definition
1.5.The managerial perspective
1.6.The fundamental Data Mining tasks
1.6.1.Knowledge discovery (Clustering and Summary)
1.6.2.Predictive Modeling (Classification and Regression)
1.7.Additional topics
1.7.1.Different types of learning
1.7.2.The curse of dimensionality
1.7.3.The separability problem
1.8.Application examples
2.Data mining methodological aspects
2.1.Problem definition
2.2.Data gathering
2.3.SEMMA methodology
3.Data visualization
3.1.The role of visualization
3.2.The Lie Factor
3.3.1D analysis tools
3.4.2D and 3D analysis tools
3.5.4D and higher-dimensional analysis tools
4.Data preparation and pre-processing
4.1.Noise vs signal
4.2.Missing data
4.3.Inconsistent data
4.4.Detection and removal of Outliers
4.5.Temporal Data
4.6.Data normalization
4.7.Dimensionality reduction
4.8.Data integration
4.9.Data transformation
4.10.Data discretization
5.Cluster analysis
5.1.Introduction
5.2.Choosing the variables
5.3.Similarity criteria
5.4.RFM Analysis
5.4.1.Exact quintiles
5.4.2.Hard coding
5.5.Clustering Algorithms
5.5.1.Hierarchical algorithm
5.5.2.K-means algorithm
5.6.The number of clusters
5.7.Analysis and profiling of the clustering solution
5.8.Validity of the solution
6.Self-Organizing Maps
6.1.SOM algorithm
6.2.Training parameters
6.3.Batch
6.4.Online
6.5.Analysis and profiling of the results
6.6.U-Matrices
7.Association rules
7.1.Objective
7.2.Types of rules
7.3.Apriori algorithm
7.4.Quality measures
7.5.Additional facts about the implementation
7.6.Temporal extension