Descriptive Methods of Data Mining

Objectives

At the end of the course, students should be able to:

- Discuss the most relevant ideas and concepts associated with data mining;

- Execute basic and intermediate data preparation and pre-processing tasks;

- Describe in detail the hierarchical and k-means clustering algorithms;

- Analyze and describe the results presented by a U-Matrix;

- Create a segmentation, explaining the options used and the alternatives, whenever available;

- Describe the Apriori algorithm and how association rules are generated;

- Calculate and explain the most relevant performance measures of association rules;

- Calculate the similarity between textual documents.
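One common way to calculate the similarity between textual documents is the cosine similarity of their term-frequency vectors. The programme does not say which measure the course uses, so the following is only a minimal sketch. It assumes lower-cased, whitespace-separated tokens and raw term counts, with no stemming, stop-word removal or TF-IDF weighting.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lower-case the text and count whitespace-separated tokens (a deliberately simple tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two documents."""
    a, b = term_counts(doc_a), term_counts(doc_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Three shared terms out of four in each document: 3 / (2 * 2)
    print(cosine_similarity("data mining finds patterns", "data mining finds clusters"))  # prints 0.75
```

Because raw counts are never negative, the score lies between 0 (no shared terms) and 1 (identical term proportions). Replacing the counts with TF-IDF weights would reduce the influence of very frequent words.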

General characterization

Code

200165

Credits

7.5

Responsible teacher

Mauro Castelli

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If there are Erasmus students, classes will be taught in English.

Prerequisites

Prior familiarity with the main topic of the course is not required. However, students are strongly advised to have knowledge of inferential statistics and good general computer skills.

Bibliography

Hand, D. J., 1998, Data mining: statistics and more? The American Statistician, 52, 112-118. Chapters 3, 6 and 8;

Jain, A. K., Murty, M. N., Flynn, P. J., 1999, Data Clustering: A Review, ACM Computing Surveys;

Han, J., Kamber, M., 2001, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, California;

Berry, M. J. A., Linoff, G., 1997, Data Mining Techniques for Marketing, Sales and Customer Support, John Wiley & Sons. Chapters 1, 2, 3, 4, 5, 8 and 10;

Course Notes, Enterprise Miner™: Applying Data Mining Techniques.

Teaching method

The course combines theoretical classes with practical classes dedicated to developing the project.

Evaluation method

1st Season - Exam (60%), Project (40%).

2nd Season - Exam (60%), Project (40%).

Subject matter

1.Intro to Data Mining

1.1.The data...

1.2.The data and organizations

1.3.The Data Mining promise

1.4.Data Mining definition

1.5.The managerial perspective

1.6.The fundamental Data Mining tasks

1.6.1.Knowledge discovery (Clustering and Summary)

1.6.2.Predictive Modeling (Classification and Regression)

1.7.Additional topics

1.7.1.Different types of learning

1.7.2.The curse of dimensionality

1.7.3.The separability problem

1.8.Application examples

2.Data mining methodological aspects

2.1.Problem definition

2.2.Data gathering

2.3.SEMMA methodology

3.Data visualization

3.1.The role of visualization

3.2.The Lie Factor

3.3.1D analysis tools

3.4.2D and 3D analysis tools

3.5.Analysis tools for 4D and higher

4.Data preparation and pre-processing

4.1.Noise vs signal

4.2.Missing data

4.3.Inconsistent data

4.4.Detection and removal of Outliers

4.5.Temporal Data

4.6.Data normalization

4.7.Dimensionality reduction

4.8.Data integration

4.9.Data transformation

4.10.Data discretization

5.Cluster analysis

5.1.Introduction

5.2.Choosing the variables

5.3.Similarity criteria

5.4.RFM Analysis

5.4.1.Exact quintiles

5.4.2.Hard coding

5.5.Clustering Algorithms

5.5.1.Hierarchical algorithm

5.5.2.K-means algorithm

5.6.The number of clusters

5.7.Analysis and profiling of the clustering solution

5.8.Validity of the solution

6.Self-Organizing Maps

6.1.SOM algorithm

6.2.Training parameters

6.3.Batch

6.4.Online

6.5.Analysis and profiling of the results

6.6.U-Matrices

7.Association rules

7.1.Objective

7.2.Types of rules

7.3.Apriori algorithm

7.4.Quality measures

7.5.Additional implementation details

7.6.Temporal extension
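The programme does not name the quality measures covered in topic 7.4. Support, confidence and lift are the measures most commonly taught alongside Apriori, so this sketch assumes those three. Transactions are modelled as Python sets of item names, and the shopping baskets in the usage example are hypothetical toy data.

```python
from typing import FrozenSet, List, Set

Transaction = Set[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(antecedent | consequent, transactions) / s_a


def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str],
         transactions: List[Transaction]) -> float:
    """confidence(A -> C) / support(C); above 1 suggests A and C co-occur more than by chance."""
    s_c = support(consequent, transactions)
    if s_c == 0:
        raise ValueError("consequent never occurs; lift is undefined")
    return confidence(antecedent, consequent, transactions) / s_c


if __name__ == "__main__":
    # Hypothetical baskets, for illustration only.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    a, c = frozenset({"butter"}), frozenset({"bread"})
    print(f"support    {support(a | c, baskets):.3f}")    # prints support    0.500
    print(f"confidence {confidence(a, c, baskets):.3f}")  # prints confidence 1.000
    print(f"lift       {lift(a, c, baskets):.3f}")        # prints lift       1.333
```

Apriori applies a minimum-support threshold to find frequent itemsets first. Measures such as confidence and lift are then used to filter the rules derived from those itemsets.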