Data Mining I
At the end of the course, students should be able to:
- Discuss the most relevant ideas and concepts associated with data mining;
- Execute basic and intermediate data preparation and pre-processing tasks;
- Describe the principles of RFM analysis and carry one out;
- Describe in detail the hierarchical, k-means and self-organizing map algorithms;
- Analyze and describe the results presented by a U-Matrix;
- Create a segmentation, explaining the options chosen and the alternatives, whenever available;
- Describe the Apriori algorithm and how association rules are generated;
- Calculate and explain the most relevant performance measures of association rules.
Weekly - Available soon
Total - Available soon
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
No prior familiarity with the main theme of the course is required. However, students are strongly advised to have a working knowledge of inferential statistics and good general computer skills.
- Hand, D. J., 1998, Data mining: statistics and more? The American Statistician, 52, 112-118. 11. Ch. 3, 6 and 8;
- Jain, A. K., Murty, M. N. and Flynn, P. J., 1999, Data Clustering: A Review, ACM Computing Surveys;
- Han, J., Kamber, M., 2001, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, California;
- Berry, M. J. A., Linoff, G., 1997, Data Mining Techniques for Marketing, Sales and Customer Support. 2000, John Wiley & Sons. Ch. 1, 2, 3, 4, 5, 8 and 10;
- Course Notes Enterprise Miner™: Applying Data Mining Techniques.
This course will include theoretical and practical classes for the development of the project.
1st Season - Exam (65%), Research Project (35%).
2nd Season - Exam (60%), Research Project (35%), Application Project (25%).
1.Intro to Data Mining
1.1.The data...
1.2.The data and organizations
1.3.The Data Mining promise
1.4.Data Mining definition
1.5.The managerial perspective
1.6.The fundamental Data Mining tasks
1.6.1.Knowledge discovery (Clustering and Summary)
1.6.2.Predictive Modeling (Classification and Regression)
1.7.1.Different types of learning
1.7.2.The curse of dimensionality
1.7.3.The separability problem
2.Data mining methodological aspects
3.1.The role of visualization
3.2.The Lie Factor
3.3. 1D analysis tools
3.4. 2D and 3D analysis tools
3.5. 4D and more analysis tools
4.Data preparation and pre-processing
4.1.Noise vs signal
4.4.Detection and removal of Outliers
5.2.Choosing the variables
5.6.The number of clusters
5.7.Analysis and profiling of the clustering solution
5.8.Validity of the solution
6.5.Analysis and profiling of the results
7.2.Types of rules
7.5. Additional facts about the implementation