Advanced Data Analysis


This course introduces advanced data analysis techniques, covering both the algorithms used to process data
and the distributed execution of those algorithms. Students will acquire the knowledge and skills needed to perform
advanced data analysis, including selecting the appropriate tools and algorithms.

General characterization





Responsible teacher

Nuno Manuel Ribeiro Preguiça


Weekly - Available soon

Total - Available soon

Teaching language





Moreira, João, Andre Carvalho, and Tomás Horvath. A General Introduction to Data Analytics. John Wiley & Sons, 2018.

Teaching method

Lectures will cover the fundamental topics of the course, illustrated with relevant real-world data analysis problems and coding examples. The
lectures will include some time for questions and discussion.
Real datasets will be provided to students and will be used systematically as examples and training scenarios. Students are expected to practice
and solve the proposed exercises autonomously, but part of the contact time will be devoted to discussing any practical problems they were
unable to solve on their own.

Evaluation method

The evaluation of this curricular unit consists of small hands-on quizzes/assignments (25% of the final grade) and a larger group project (25%
of the final grade), in which students put into practice the techniques introduced in the lectures, plus a midterm test (20% of the final grade) and a final exam (30% of the final grade).

Regular Exam Period:
4 quizzes/small assignments (25%);
a practical teamwork assignment (25%);
midterm test (20%);
final exam (30%).

Subject matter

A. Introduction to Big Data
Data analytics models
B. Generic processing frameworks
Programming models
Processing framework
C. Programming systems for data analysis
Big Data infrastructures: e.g. Azure HDInsight
Models and programming environments: e.g. Jupyter
D. Data cleaning
Data quality
E. Dealing with multidimensional data
Descriptive statistics and visualization
Feature selection and extraction for dimensionality reduction
F. Clustering
Clustering types
Distance measures
Clustering validation
G. Advanced processing systems
Domain-specific systems: graph processing and machine learning
Realtime processing


Programs where the course is taught: