Descriptive Methods of Data Mining
Objectives
At the end of the course, students should be able to:
- Discuss the most relevant ideas and concepts associated with data mining;
- Execute basic and intermediate data preparation and pre-processing tasks;
- Describe in detail the hierarchical and k-means algorithms;
- Analyze and describe the results presented by a U-Matrix;
- Create a segmentation, explaining the options used and the alternatives, whenever available;
- Describe the apriori algorithm and how association rules are generated;
- Calculate and explain the most relevant performance measures of association rules;
- Calculate the similarity between textual documents.
General characterization
Code
200165
Credits
7.5
Responsible teacher
Mauro Castelli
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If there are Erasmus students, classes will be taught in English.
Prerequisites
Familiarity with the main theme of the course is not required, but it is highly recommended that students have knowledge of Inferential Statistics as well as good computer skills.
Bibliography
Hand, D. J., 1998, Data mining: statistics and more? The American Statistician, 52, 112-118. 11. Chap. 3, 6 and 8;
Jain, A. K., Murty, M. N., Flynn, P. J., 1999, Data Clustering: A Review, ACM Computing Surveys;
Han, J., Kamber, M., 2001, Data Mining - Concepts and Techniques, Morgan Kaufmann, San Francisco, California;
Berry, M. J. A., Linoff, G., 1997, Data Mining Techniques for Marketing, Sales and Customer Support. 2000, John Wiley & Sons. Chap. 1, 2, 3, 4, 5, 8 and 10;
Course Notes, Enterprise Miner™: Applying Data Mining Techniques.
Teaching method
This course will include theoretical and practical classes for the development of the project.
Evaluation method
1st Season - Exam (60%), Project (40%).
2nd Season - Exam (60%), Project (40%).
Subject matter
1.Intro to Data Mining
1.1.The data...
1.2.The data and organizations
1.3.The Data Mining promise
1.4.Data Mining definition
1.5.The managerial perspective
1.6.The fundamental Data Mining tasks
1.6.1.Knowledge discovery (Clustering and Summary)
1.6.2.Predictive Modeling (Classification and Regression)
1.7.Additional topics
1.7.1.Different types of learning
1.7.2.The curse of dimensionality
1.7.3.The separability problem
1.8.Application examples
2.Data mining methodological aspects
2.1.Problem definition
2.2.Data gathering
2.3.SEMMA methodology
3.Data visualization
3.1.The role of visualization
3.2.The Lie Factor
3.3. 1D analysis tools
3.4. 2D and 3D analysis tools
3.5. 4D and higher-dimensional analysis tools
4.Data preparation and pre-processing
4.1.Noise vs signal
4.2.Missing data
4.3.Inconsistent data
4.4.Detection and removal of Outliers
4.5.Temporal Data
4.6.Data normalization
4.7.Dimensionality reduction
4.8.Data integration
4.9.Data transformation
4.10.Data discretization
5.Cluster analysis
5.1.Introduction
5.2.Choosing the variables
5.3.Similarity criteria
5.4.RFM Analysis
5.4.1.Exact quintiles
5.4.2.Hard coding
5.5.Clustering Algorithms
5.5.1.Hierarchical algorithm
5.5.2.K-means algorithm
5.6.The number of clusters
5.7.Analysis and profiling of the clustering solution
5.8.Validity of the solution
6.Self-Organizing Maps
6.1.SOM algorithm
6.2.Training parameters
6.3.Batch
6.4.Online
6.5.Analysis and profiling of the results
6.6.U-Matrices
7.Association rules
7.1.Objective
7.2.Types of rules
7.3.Apriori algorithm
7.4.Quality measures
7.5. Additional facts about the implementation
7.6.Temporal extension
Programs
Programs where the course is taught:
- Specialization in Information Analysis and Management
- Specialization in Risk Analysis and Management
- Specialization in Knowledge Management and Business Intelligence
- Specialization in Information Systems and Technologies Management
- Specialization in Marketing Intelligence
- Specialization in Marketing Research and CRM
- Specialization in Knowledge Management and Business Intelligence - Working Hours Format
- Specialization in Information Systems and Technologies Management - Working Hours Format
- Specialization in Marketing Intelligence - Working Hours Format
- Post-Graduation in Information Analysis and Management
- Post-Graduation in Risk Analysis and Management
- PostGraduate in Smart Cities
- PostGraduate in Data Science for Marketing
- PostGraduate in Digital Enterprise Management
- PostGraduate in Digital Marketing and Analytics
- PostGraduate in Information Management and Business Intelligence in Healthcare
- Post-Graduation in Knowledge Management and Business Intelligence
- Post-Graduation in Information Systems and Technologies Management
- Post-Graduation in Marketing Intelligence
- Post-Graduation in Marketing Research and CRM
- PostGraduate in Enterprise Information Systems