Advanced Programming for Data Science and Engineering
- Understand and be able to carry out the processing and transformation of experimental or sensor data for later exploratory data analysis.
- Be able to express computations using an imperative model or functional operators.
- Know the main data visualizations and be able to choose the most appropriate ones for the data and analysis at hand.
- Understand the relational model and be able to express questions using relational operators to obtain data from a relational database.
- Understand the basic principles and algorithms of machine learning.
- Know and be able to express computations on complex and spatio-temporal data.
Carlos Augusto Isaac Piló Viegas Damásio, João Carlos Gomes Moura Pires
Weekly - 4
Total - 56
Previous knowledge of the Python language is advised.
Students lacking proficiency in Python may follow some of the online tutorials, such as those available at the official website of the Python language (https://www.python.org).
For more information, consult https://www.python.org/about/gettingstarted/
Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition
by Stefanie Molin
Packt | April, 2021
Reference books for pandas library:
Python for Data Analysis, 4th Edition, 2018.
by Wes McKinney
Publisher(s): O'Reilly Media, Inc.
Theoretical classes will introduce and frame the main topics of the course.
Theoretical teaching will draw on many examples of existing datasets to illustrate typical problems encountered when dealing with real data, together with good practices, solutions and computational methodologies to tackle these problems.
Labs will be based primarily on the Python language and its ecosystem for data analysis and visualization, one of the most widely used solutions in academia and industry. The Python environment will be integrated with a set of external tools and services, illustrating a real data processing environment.
The evaluation consists of a theoretical component (50% of the grade) and a practical component (50% of the grade).
The theoretical component (CT) can be completed through:
- Continuous assessment, with two tests each worth 50% of the theoretical component (i.e., 25% of the final grade);
- An oral examination, which may replace the theoretical component for students designated by the teachers.
The practical component (CP) is a group project carried out by 2 or 3 members. The project is evaluated through a report, a demonstration and a discussion of the work with the teachers. Group members can have different grades.
The final grade is obtained by the following formula: 50% CT + 50% CP, rounded to the nearest unit.
Theoretical component >= 9.5 points
Practical component >= 9.5 points
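As a worked illustration of the grading formula, the following minimal sketch computes a final grade from two tests and a project. It assumes that the 9.5 thresholds listed above are minimum requirements for each component and that rounding to units means rounding half up; the function name `final_grade` and both of these interpretations are assumptions, not official regulations.

```python
import math

MIN_COMPONENT = 9.5  # threshold listed for each component


def final_grade(test1, test2, project):
    """Return the rounded final grade, or None if a minimum is not met.

    Assumption: a component below 9.5 means the course is not passed.
    """
    ct = 0.5 * test1 + 0.5 * test2  # theoretical component: two equally weighted tests
    cp = project                    # practical component: group project grade
    if ct < MIN_COMPONENT or cp < MIN_COMPONENT:
        return None
    raw = 0.5 * ct + 0.5 * cp       # 50% CT + 50% CP
    return math.floor(raw + 0.5)    # round half up to units (assumed rule)


print(final_grade(12.0, 15.0, 16.0))  # CT = 13.5, CP = 16.0, raw = 14.75 -> 15
print(final_grade(8.0, 10.0, 16.0))   # CT = 9.0 is below 9.5 -> None
```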
NOTE: Plagiarism implies automatic failure in the course.
Introduction to Data Analysis
- What "data" are and how we characterize them
- Univariate, bivariate and multivariate data analysis
- Exploratory Data Analysis
Create, read, view and select tabular data (pandas Series and DataFrames and SQL tables)
- Creation of tables and dataframes
- Reading tabular data from files and the Web
- Tabular data filtering and projection operations
- Indexing and sorting
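The operations in this topic can be sketched with pandas as follows. This is only an illustrative example: the sensor readings and column names are hypothetical, and the CSV is inlined as text so that the snippet runs without external files.

```python
import io

import pandas as pd

# Hypothetical sensor readings, inlined as CSV text to keep the example self-contained.
csv_text = """sensor,temperature,humidity
s1,21.5,40
s2,19.0,55
s3,23.1,35
"""

df = pd.read_csv(io.StringIO(csv_text))        # read tabular data
warm = df[df["temperature"] > 20.0]            # filtering: keep rows matching a condition
projected = warm[["sensor", "temperature"]]    # projection: keep a subset of columns
result = (
    projected.sort_values("temperature", ascending=False)  # sorting
    .set_index("sensor")                                    # indexing by sensor name
)
print(result)
```

Filtering and projection correspond to the WHERE clause and the column list of a SQL SELECT statement, which is why the same questions can usually be expressed in either setting.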
Processing of tabular data
- Data cleaning
- Formatting data
- Null values, duplicates and incorrect or invalid data
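A minimal sketch of these cleaning steps in pandas is shown below. The data are invented for illustration, and treating values below -90 as a "no reading" sentinel is an assumption made only for this example.

```python
import pandas as pd

# Hypothetical raw data with stray whitespace, a duplicate row and invalid readings.
raw = pd.DataFrame({
    "city": [" Lisbon", "Porto ", "Porto ", "Faro", "Braga"],
    "temp_c": ["18.5", "16.0", "16.0", "n/a", "-999"],
})

clean = raw.copy()
clean["city"] = clean["city"].str.strip()                           # formatting: trim whitespace
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")   # unparseable text becomes NaN
clean["temp_c"] = clean["temp_c"].where(clean["temp_c"] > -90)      # assumed sentinel becomes NaN
clean = (
    clean.drop_duplicates()           # remove repeated rows
    .dropna(subset=["temp_c"])        # drop rows without a valid reading
    .reset_index(drop=True)
)
print(clean)
```

Whether invalid rows should be dropped, as here, or imputed depends on the analysis; dropping is simply the shortest option to show.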
Tabular data crossing
- Join operations between tables
- Pivot tables and cross-tabulations (crosstabs)
- Summarize data
- Aggregation functions and operations
- Window operations over the data
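The following sketch illustrates, under invented data, how the operations in this topic might look in pandas: a join, an aggregation, a pivot table and a window computation (here a running total per store). Table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records and a lookup table of stores.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "amount": [100, 150, 80, 120],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

joined = sales.merge(stores, on="store", how="inner")   # join on the shared key
totals = joined.groupby("region")["amount"].sum()       # aggregation per group
pivot = joined.pivot_table(
    index="region", columns="month", values="amount", aggfunc="sum"
)                                                       # pivot table: regions x months

# Window: running total of sales within each store, ordered by month.
joined = joined.sort_values(["store", "month"])
joined["running"] = joined.groupby("store")["amount"].cumsum()

print(totals)
print(pivot)
print(joined)
```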
Time series and spatial data
- Typical operations for the treatment of time series
- Fundamentals of interactive data visualization
- Main data visualization tools for exploratory data analysis
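For the time series operations mentioned in this topic, a short pandas sketch is given below. The daily series is synthetic, and the choice of resampling, a moving average and a lag as "typical operations" is an assumption made for illustration.

```python
import pandas as pd

# Synthetic daily series with six observations.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

two_day_mean = series.resample("2D").mean()      # downsample into 2-day bins
rolling_mean = series.rolling(window=3).mean()   # 3-day moving average (first two are NaN)
lagged = series.shift(1)                         # previous day's value

print(two_day_mean)
print(rolling_mean)
print(lagged)
```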
Introduction to Machine Learning
- Overview of Machine Learning
- General techniques for supervised ML
- Autoregression
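As a rough illustration of autoregression, the sketch below fits a first-order model x[t] = c + a * x[t-1] by ordinary least squares using NumPy. The series is synthetic and noise-free, and the use of a first-order model fitted with `numpy.linalg.lstsq` is an assumption chosen for brevity, not necessarily the method used in the course.

```python
import numpy as np

# Synthetic series that follows x[t] = 1.0 + 0.5 * x[t-1] exactly (no noise).
values = [0.0]
for _ in range(20):
    values.append(1.0 + 0.5 * values[-1])
x = np.array(values)

# Supervised framing: predict x[t] (target) from x[t-1] (feature) plus an intercept.
X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
y = x[1:]
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)

next_value = intercept + slope * x[-1]  # one-step-ahead forecast
print(intercept, slope, next_value)
```

Because the data were generated without noise, the fitted coefficients should be very close to 1.0 and 0.5; with real data the fit would only approximate the underlying dynamics.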
Programs where the course is taught: