Data Mining
Objectives
The Data Mining course studies the main methods and tools used in data mining (knowledge discovery in databases), focusing on the subset of tools usually called descriptive models (or unsupervised learning). The course assumes no prior familiarity with the subject, but students are strongly advised to have a background in inferential statistics and basic computer skills.
The course strikes a balance between courses devoted to in-depth analysis of the algorithms and courses for managers that aim to raise awareness of the tools' importance. It is a technical course for everyone who already works, or wants to work, on developing descriptive models and exploring large databases. Accordingly, students will carry out the activities of a typical data scientist, especially in the project, which is a central component of the course.
The main focus of the course is to present the algorithms clearly and accessibly to a broad audience with diverse academic backgrounds. The goal is for students to understand the fundamentals of how the different algorithms work internally, because only then will they be able to apply them judiciously.
The syllabus covers the main methodological aspects, data preparation and preprocessing tasks, and the most popular descriptive models, including various clustering algorithms and association rules, among others. The course also gives students the opportunity to learn and use Python to implement these algorithms and apply them to real-world problems.
General information
Code
200175
Credits
7.5
Professor in charge
João Pedro Martins Ribeiro da Fonseca
Hours
Weekly - To be announced soon
Total - To be announced soon
Language of instruction
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Prerequisites
Familiarity with the main subject of the course is not required, but students are strongly advised to have knowledge of inferential statistics and good computer skills. Students with no previous training or experience in Python should complete the following two DataCamp online courses before the third week of the course (the first practical class): Introduction to Python and Intermediate Python. Students who wish may also, optionally, complete the course Data Manipulation with pandas. The instructor will explain how to obtain free access to the DataCamp platform.
Bibliography
Teaching method
The course consists of theoretical and practical classes. Several teaching strategies are used, including slide presentations, step-by-step walkthroughs of practical examples, and question-and-answer sessions. The practical component focuses on exploring the tools introduced to students (Microsoft Excel and Python) and on developing the project. Software used: Microsoft Excel, Python, Jupyter Notebook, and Microsoft Visual Studio Code.
Assessment method
1st session: Exam (65%), Project (35%)
2nd session: Exam (65%), Project (35%)
Both assessment components (project and exam) are mandatory. There are two opportunities to take the exam. Late project submissions are penalized by 10% of the grade for each day of delay. The project is developed in groups of no more than 3 members. To pass the course, students must obtain at least 8 out of 20 (40%) in the exam.
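The weighting and penalty rules above can be expressed as a short calculation. The sketch below is a minimal illustration, not the official grading procedure: it assumes a 0 to 20 grading scale (implied by 8 corresponding to 40%) and assumes that the 10% daily penalty is deducted from the project grade itself. The function name `final_grade` is hypothetical.

```python
def final_grade(exam: float, project: float, days_late: int = 0) -> tuple[float, bool]:
    """Return (weighted final grade, whether the exam minimum is met).

    Hypothetical helper for illustration only. Assumes grades on a 0-20 scale
    and that the late penalty is 10% of the project grade per day of delay.
    """
    # Assumption: each day of delay removes 10% of the project grade,
    # and the penalty cannot push the project grade below zero.
    penalty_factor = max(0.0, 1.0 - 0.10 * days_late)
    adjusted_project = project * penalty_factor

    # Weights stated in the assessment rules: exam 65%, project 35%.
    weighted = 0.65 * exam + 0.35 * adjusted_project

    # The exam grade must be at least 8 out of 20 (40%) to pass.
    exam_minimum_met = exam >= 8
    return weighted, exam_minimum_met


if __name__ == "__main__":
    # Example: exam 12, project 16 delivered two days late.
    grade, ok = final_grade(12, 16, days_late=2)
    print(f"Final grade: {grade:.2f}, exam minimum met: {ok}")
```

Under these assumptions, an exam grade of 12 and a project grade of 16 delivered two days late give 0.65 × 12 + 0.35 × (16 × 0.8) = 12.28. The sketch checks only the exam minimum, since no separate pass threshold for the final grade is stated here.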
Contents
LU01. Introduction to Data Science
LU02. The canonical tasks in Data Mining and the work process
LU03. Exploratory Data Analysis
LU04. Data Preparation and Preprocessing
LU05. RFM analysis
LU06. Hierarchical algorithms
LU07. Partitional algorithms (k-means and k-medoids)
LU08. Density-based algorithms (DBSCAN and Mean-Shift)
LU09. Self-organizing maps
LU10. Semi-Supervised Classification
LU11. Multidimensional Visualization Methods
LU12. Association Rules
Programmes
Programmes in which this course unit is taught: