Advanced Programming for Data Science and Engineering
Objectives
- Understand, and be able to carry out, the processing and transformation of experimental or sensor data for subsequent exploratory data analysis.
- Be able to express computations using an imperative model or functional operators.
- Know the main data visualizations and be able to choose the most appropriate ones for the data and analysis at hand.
- Understand the relational model and be able to express queries using relational operators to obtain data from a relational database.
- Understand the basic principles and algorithms of machine learning.
- Know and be able to express computations on complex and spatio-temporal data.
General characterization
Code
12529
Credits
6.0
Responsible teacher
Carlos Augusto Isaac Piló Viegas Damásio, João Carlos Gomes Moura Pires
Hours
Weekly - 4
Total - 56
Teaching language
English
Prerequisites
Prior knowledge of the Python language is advised.
Students lacking proficiency in Python may follow online tutorials, such as those available on the official Python website (https://www.python.org).
For more information, see https://www.python.org/about/gettingstarted/
Bibliography
Adopted textbook:
Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition
by Stefanie Molin
Publisher: Packt, April 2021
ISBN-13: 978-1800563452
Reference book for the pandas library:
Python for Data Analysis, 2nd Edition, 2017
by Wes McKinney
Publisher: O'Reilly Media, Inc.
ISBN: 9781491957660
Teaching method
The course is supported by lectures that introduce and frame the main topics.
Lectures draw on many examples from existing datasets to illustrate typical problems encountered when dealing with real data, and present good practices, solutions and computational methodologies to tackle them.
Labs are based on the Python language and its ecosystem for data analysis and visualization, one of the most widely used solutions in academia and industry. The Python environment is integrated with a set of external tools and services, illustrating a realistic data processing environment.
Evaluation method
Assessment consists of a theoretical component (50% of the grade) and a practical component (50% of the grade).
The theoretical component (CT) can be completed through:
- Continuous assessment, consisting of two tests, each worth 50% of the theoretical component (i.e., 25% of the final grade);
- An exam;
- An oral examination replacing the written theoretical component, for students designated by the teachers.
The practical component (CP) consists of a group project carried out by 2 or 3 students. The project is assessed through a report, a demonstration, and a discussion of the work with the teachers. Members of the same group may receive different grades.
The final grade is computed as 50% CT + 50% CP, rounded to the nearest unit.
Pass requirements:
Theoretical component >= 9.5 (out of 20)
Practical component >= 9.5 (out of 20)
NOTE: Plagiarism results in automatic failure of the course.
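The grading rules of this section can be sketched as a small Python function. This is a minimal illustration, not an official grade calculator: it assumes the usual Portuguese 0 to 20 scale and that "rounding to units" sends halves upward (Python's built-in round() rounds halves to the nearest even number, so the sketch rounds explicitly). The function names are hypothetical.

```python
import math
from typing import Optional

PASS_MARK = 9.5  # minimum for each component (assumed 0-20 scale)


def theoretical_from_tests(test1: float, test2: float) -> float:
    """Continuous assessment: each of the two tests is worth 50% of CT."""
    return 0.5 * test1 + 0.5 * test2


def round_half_up(x: float) -> int:
    """Round to units with halves going up (assumed convention, unlike round())."""
    return math.floor(x + 0.5)


def final_grade(ct: float, cp: float) -> Optional[int]:
    """Return the rounded final grade, or None if a component is below the pass mark."""
    if ct < PASS_MARK or cp < PASS_MARK:
        return None
    return round_half_up(0.5 * ct + 0.5 * cp)


if __name__ == "__main__":
    ct = theoretical_from_tests(10.0, 12.0)  # 0.5*10 + 0.5*12 = 11.0
    print(final_grade(ct, 12.0))   # 0.5*11 + 0.5*12 = 11.5, rounded up to 12
    print(final_grade(9.0, 18.0))  # CT below 9.5, so None (not approved)
```

Checking each component against the pass mark before combining them reflects the rule that both CT and CP must reach 9.5 independently; a high practical grade cannot compensate for a failed theoretical component.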
Subject matter
Introduction to Data Analysis
- What "data" are and how to characterize them
- Univariate, bivariate and multivariate data analysis
- Exploratory Data Analysis
Create, read, view and select tabular data (pandas Series and DataFrames, and SQL tables)
- Creation of tables and DataFrames
- Reading tabular data from files and the Web
- Tabular data filtering and projection operations
- Indexing and sorting
Processing of tabular data
- Data cleaning
- Formatting data
- Null values, duplicates and incorrect or invalid data
Combining tabular data
- Join operations between tables
- Pivot tables and cross-tabulations (crosstabs)
Aggregate data
- Summarize data
- Aggregation functions and operations
- Window operations over the data
Time series and spatial data
- Typical operations for processing time series
Data Visualization
- Fundamentals of interactive data visualization
- Main data visualization tools for exploratory data analysis
Introduction to Machine Learning
- Overview of Machine Learning
- General techniques for supervised ML
- Autoregression
- Classification
- Regression
Programs
Programs where the course is taught: