Advanced Data Analysis

Objectives

This course introduces advanced data analysis techniques, covering both the algorithms for processing data and the distributed execution
of those algorithms. Students will acquire the knowledge and skills needed to perform advanced data analysis, including selecting the
appropriate tools and algorithms.

General characterization

Code

2597

Credits

3.5

Responsible teacher

Nuno Ribeiro Preguiça | Ricardo Almeida E Silva

Hours

Weekly - Available soon

Total - Available soon

Teaching language

English

Prerequisites

n/a 


Bibliography

Moreira, João, Andre Carvalho, and Tomás Horvath. A General Introduction to Data Analytics. John Wiley & Sons, 2018.

Teaching method

Lectures will cover the fundamental topics of the course, illustrated with relevant real-world data analysis problems and coding examples. The
lectures will include some time for questions and discussion.
Real datasets will be provided to students and will be used systematically as examples and training scenarios. Students are expected to practice
and solve the proposed exercises autonomously, but part of the contact time will be devoted to discussing any practical problems they were
unable to solve on their own.

Evaluation method

Regular Exam Period:

4 quizzes/small assignments (25%);
practical teamwork assignment (25%);
midterm test (20%);
final exam (30%).

Subject matter

A. Introduction to Big Data
Challenges
Data analytics models
Applicability
B. Generic processing frameworks
Programming models
Processing framework
C. Programming systems for data analysis
Big Data infrastructures: e.g. Azure HDInsight
Models and programming environments: e.g. Jupyter
D. Data cleaning
Pre-processing
Rescaling
Data quality
E. Dealing with multidimensional data
Descriptive statistics and visualization
Feature selection and extraction for dimensionality reduction
F. Clustering
Clustering types
Distance measures
Clustering validation
G. Advanced processing systems
Domain-specific systems: graph processing and machine learning
Real-time processing

Programs

Programs where the course is taught: