Advanced Data Analysis
Objectives
This course introduces advanced data analysis techniques, covering both the algorithms for processing data
and the distributed execution of those algorithms. Students will acquire the knowledge and competences needed to perform
advanced data analysis, including selecting the appropriate tools and algorithms.
General characterization
Code
67890
Credits
3.5
Responsible teacher
Nuno Manuel Ribeiro Preguiça
Hours
Weekly - Available soon
Total - Available soon
Teaching language
English
Prerequisites
N/A
Bibliography
Moreira, João, Andre Carvalho, and Tomás Horvath. A General Introduction to Data Analytics. John Wiley & Sons, 2018.
Teaching method
Lectures will cover the fundamental topics of the course, illustrated with relevant real-world data analysis problems and coding examples. The
lectures will include some time for questions and discussion.
Real datasets will be provided to students and will be used systematically as examples and training scenarios. Students are expected to practice
and solve the proposed exercises autonomously, but part of the contact time will be devoted to discussing any practical problems they were
unable to solve on their own.
Evaluation method
The evaluation of this curricular unit consists of small hands-on quizzes/assignments (25% of the final grade) and a larger group project (25%
of the final grade), in which students put into practice the techniques introduced in the lectures, plus a midterm test (20% of the final grade) and a final exam (30% of the final grade).
Regular Exam Period
4 quizzes/small assignments (25%);
a practical teamwork assignment (25%);
midterm test (20%);
final exam (30%).
Subject matter
A. Introduction to Big Data
Challenges
Data analytics models
Applicability
B. Generic processing frameworks
Programming models
Processing framework
C. Programming systems for data analysis
Big Data infrastructures: e.g. Azure HDInsight
Models and programming environments: e.g. Jupyter
D. Data cleaning
Pre-processing
Rescaling
Data quality
E. Dealing with multidimensional data
Descriptive statistics and visualization
Feature selection and extraction for dimensionality reduction
F. Clustering
Clustering types
Distance measures
Clustering validation
G. Advanced processing systems
Domain-specific systems: graph processing and machine learning
Realtime processing