Descriptive Methods of Data Mining

Objectives

Data Mining draws on techniques from several disciplines, such as statistics, data visualization, database systems, and machine learning, to identify novel, useful, and understandable patterns in data.
This course familiarizes students with Data Mining applications and the lifecycle of Data Mining projects. Students will learn techniques for understanding and preparing data before building descriptive models, such as clustering or association rules (e.g., market basket analysis).

General characterization

Code

200165

Credits

7.5

Responsible teacher

Roberto André Pereira Henriques

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

No prior familiarity with the main topic of the course is required. However, students are strongly advised to have knowledge of Inferential Statistics and good general computer skills.

Bibliography

Keller, G. and Gaciu, N. (2020). Statistics for Management and Economics (2nd edition), Cengage Learning

Han, J., Kamber, M., and Pei, J. (2012). Data Mining: Concepts and Techniques (3rd edition), Morgan Kaufmann

Jain, A.K., Murty, M.N., and Flynn, P.J. (1999). Data Clustering: A Review, ACM Computing Surveys

Linoff, G.S. and Berry, M.J.A. (2011). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management (3rd edition), Wiley Publishing, Inc.

SAS (2014). Course Notes, Enterprise Miner™: Applying Data Mining Techniques. Available from https://documents.pub/document/sas-notes-sas-enterprise-miner-software-applying-data-mining-techniques.html

Teaching method

The course is based on theoretical and practical classes. Several teaching strategies are applied, including slide presentations, step-by-step walkthroughs of practical examples, and question-and-answer sessions. The practical component focuses on exploring the tools introduced to students (Microsoft Excel and SAS Enterprise Miner) and on developing the project.

Evaluation method

1st Season: Exam (60%), Project (40%)

2nd Season: Exam (60%), Project (40%)

Rules:

  • A minimum grade of 8.0 (out of 20) is required in both the exam and the project
  • Projects not submitted on Moodle by the deadline will be rejected
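
As an illustration only, the weighting and minimum-grade rule above can be sketched in Python. The function and constant names are hypothetical, and the sketch assumes grades on the 0 to 20 scale. It does not model rounding or the overall pass mark, which are not stated here.

```python
# Illustrative sketch only; names are hypothetical, values come from the rules above.
EXAM_WEIGHT = 0.60     # exam share of the final grade
PROJECT_WEIGHT = 0.40  # project share of the final grade
MINIMUM_GRADE = 8.0    # minimum required in each component (out of 20)


def final_grade(exam, project):
    """Return (weighted grade, whether both component minimums are met)."""
    weighted = EXAM_WEIGHT * exam + PROJECT_WEIGHT * project
    meets_minimums = exam >= MINIMUM_GRADE and project >= MINIMUM_GRADE
    return weighted, meets_minimums


grade, ok = final_grade(exam=12.0, project=15.0)
print(round(grade, 2), ok)
```

A student with a high project grade but an exam grade below 8.0 would still fail the minimum check, regardless of the weighted result.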

Subject matter

LU1. Introduction to Data Mining

LU2. Methodological aspects (KDD, SEMMA, CRISP-DM)

LU3. Data visualization

LU4. Data understanding

LU5. Data preparation

LU6. Clustering

LU7. Self-Organizing Maps

LU8. RFM model

LU9. Association rules and the Apriori algorithm

LU10. Data similarity and dissimilarity measures
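
To give a flavour of the association rules topic in LU9, the sketch below computes the two basic measures used in market basket analysis: support and confidence. It is a minimal illustration, not course material. The transactions are invented, and the function names are hypothetical. The course tools are Microsoft Excel and SAS Enterprise Miner, so Python is used here only for compactness.

```python
# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent.

    Assumes the antecedent appears in at least one transaction.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

The Apriori algorithm builds on the same support measure: it prunes candidate itemsets whose support falls below a chosen threshold before rules are generated.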