Advanced Programming for Data Science and Engineering

Objectives

  • Understand and be able to carry out the processing and transformation of experimental or sensor data for later exploratory data analysis.
  • Be able to express computations using an imperative model or functional operators.
  • Know the main data visualizations and how to choose the most appropriate ones for the data and analysis at hand.
  • Understand the relational model and be able to express questions using relational operators to obtain data from a relational database.
  • Understand the basic principles and algorithms of machine learning.
  • Know and be able to express computations on complex and spatio-temporal data.

General characterization

Code

12529

Credits

6.0

Responsible teacher

Carlos Augusto Isaac Piló Viegas Damásio

Hours

Weekly - 4

Total - 56

Teaching language

Portuguese

Prerequisites

Prior knowledge of the Python language is advised.

Students lacking proficiency in Python may follow online tutorials such as those available on the official Python website (https://www.python.org).

For more information consult https://www.python.org/about/gettingstarted/

Bibliography

Adopted textbook:

Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition
by Stefanie Molin
Packt | April, 2021
ISBN-13: 978-1800563452

Reference books for pandas library:

Python for Data Analysis, 4th Edition, 2018.
by Wes McKinney
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781491957660 

Teaching method

The course is supported by theoretical classes that frame the main topics to be addressed.

Theoretical teaching will draw on many examples of existing datasets to illustrate typical problems
encountered when dealing with real data, together with good practices, solutions and computational methodologies to tackle these problems.

Labs will be based mainly on the Python language and its ecosystem for data analysis and visualization, one of the solutions most widely used in academia and industry. The Python environment will be integrated with a set of external tools and services, illustrating a real data processing environment.

Evaluation method

The evaluation consists of a theoretical component (50% of the grade) and a practical component (50% of the grade).

The theoretical component (CT) can be assessed through:

  • Continuous assessment, through two tests each worth 50% of the theoretical component (i.e., 25% of the final grade);
  • An exam;
  • An oral examination, which can replace the theoretical component for students designated by the teachers.


The practical component (CP) is a group project carried out by groups of 2 or 3 students. The project is evaluated through a report, a demonstration and a discussion of the work with the teachers. Group members can receive different grades.

The final grade is obtained by the following formula: 50% CT + 50% CP, rounded to the nearest unit.

Approval conditions:

Theoretical component >= 9.5 points
Practical component >= 9.5 points
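
As an illustration only, the grading rules above can be sketched in Python. The function names are hypothetical, and two details are assumptions not stated in this syllabus: halves are rounded up, and the minimum of 9.5 is checked on the unrounded component grades.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest unit; .5 rounds up (an assumption, not stated in the syllabus)."""
    return math.floor(x + 0.5)


def continuous_ct(test1: float, test2: float) -> float:
    """Theoretical component under continuous assessment: two tests worth 50% each."""
    return 0.5 * test1 + 0.5 * test2


def final_grade(ct: float, cp: float, minimum: float = 9.5):
    """Return the rounded final grade, or None when an approval condition fails.

    Checking the minimum before rounding is an assumption of this sketch.
    """
    if ct < minimum or cp < minimum:
        return None
    return round_half_up(0.5 * ct + 0.5 * cp)


# Hypothetical student: tests of 10 and 14, project graded 13.
ct = continuous_ct(10.0, 14.0)
grade = final_grade(ct, 13.0)
```

Python's built-in round() rounds halves to the nearest even number, which is why the sketch uses an explicit helper instead.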

NOTE: Plagiarism implies automatic failure in the course.

Subject matter

Introduction to Data Analysis

  • What "data" are and how we characterize them
  • Univariate, bivariate and multivariate data analysis
  • Exploratory Data Analysis

Create, read, view and select tabular data (pandas Series and DataFrames and SQL tables)

  • Creation of tables and dataframes
  • Reading tabular data from files and the Web
  • Tabular data filtering and projection operations
  • Indexing and sorting
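
A minimal sketch of the operations in this topic, using pandas. The CSV content, column names and threshold are invented for illustration, and an in-memory string stands in for a real file or URL.

```python
import io

import pandas as pd

# Hypothetical sensor readings; io.StringIO stands in for a file on disk or a URL.
csv_text = """sensor,temperature,humidity
s1,21.5,40
s2,19.0,55
s1,23.1,38
s3,18.2,60
"""
readings = pd.read_csv(io.StringIO(csv_text))

# Filtering (selection): keep only the rows that satisfy a condition.
warm = readings[readings["temperature"] > 20.0]

# Projection: keep only some of the columns.
projected = warm[["sensor", "temperature"]]

# Sorting by a column, from highest to lowest temperature.
by_temp = readings.sort_values("temperature", ascending=False)

# Indexing: use a column as the row index, then look up a value by label.
indexed = readings.set_index("sensor")
s3_humidity = indexed.loc["s3", "humidity"]
```

The same filter and projection correspond to a WHERE clause and a column list in an SQL SELECT statement.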

Processing of tabular data

  • Data cleaning
  • Formatting data
  • Null values, duplicates and incorrect or invalid data
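
The cleaning steps listed above might look as follows in pandas. This is a sketch on invented data: the column names, the "n/a" and -999 placeholders, and the plausible range of -50 to 60 are all assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical raw data with untidy labels, a duplicate row and bad values.
raw = pd.DataFrame(
    {
        "sensor": [" S1", "s2 ", "s2 ", "s3", "s4"],
        "temperature": ["21.5", "19.0", "19.0", "n/a", "-999"],
    }
)

clean = raw.copy()

# Formatting: normalise the text labels and convert strings to numbers.
clean["sensor"] = clean["sensor"].str.strip().str.lower()
# errors="coerce" turns values that cannot be parsed, such as "n/a", into NaN.
clean["temperature"] = pd.to_numeric(clean["temperature"], errors="coerce")

# Invalid data: mark values outside an assumed plausible range as missing.
out_of_range = ~clean["temperature"].between(-50, 60)
clean.loc[out_of_range, "temperature"] = float("nan")

# Duplicates: drop rows that repeat an earlier row exactly.
clean = clean.drop_duplicates()

# Null values: here the rows without a usable temperature are simply dropped.
clean = clean.dropna(subset=["temperature"]).reset_index(drop=True)
```

Dropping rows is only one option; depending on the analysis, missing values could instead be filled in with fillna or interpolate.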

Combining tabular data

  • Join operations between tables
  • Pivot tables and cross tabulations (crosstabs)
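
As a sketch of these operations in pandas, the example below joins two small tables and then reshapes the result. Both tables and all their values are invented for illustration.

```python
import pandas as pd

# Hypothetical readings and a lookup table describing where each sensor is.
readings = pd.DataFrame(
    {
        "sensor": ["s1", "s1", "s2", "s3"],
        "day": ["mon", "tue", "mon", "mon"],
        "temperature": [21.0, 23.0, 19.0, 18.0],
    }
)
sensors = pd.DataFrame({"sensor": ["s1", "s2"], "room": ["lab", "office"]})

# Inner join: keeps only sensors present in both tables (s3 is dropped).
inner = readings.merge(sensors, on="sensor", how="inner")

# Left join: keeps every reading; sensors with no known room get NaN.
left = readings.merge(sensors, on="sensor", how="left")

# Pivot table: one row per room, one column per day, mean temperature in each cell.
pivot = inner.pivot_table(
    index="room", columns="day", values="temperature", aggfunc="mean"
)

# Crosstab: number of readings for each (room, day) combination.
counts = pd.crosstab(inner["room"], inner["day"])
```

The pivot table leaves a missing value where a combination has no data (office on "tue"), while the crosstab reports a count of 0 there.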

Aggregate data

  • Summarize data
  • Aggregation functions and operations
  • Window operations over the data
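
The three bullets above can be illustrated with a short pandas sketch. The data are invented, and the window size of 2 rows is an arbitrary choice for the example.

```python
import pandas as pd

# Hypothetical readings, already ordered in time within each sensor.
readings = pd.DataFrame(
    {
        "sensor": ["s1", "s1", "s1", "s2", "s2"],
        "temperature": [20.0, 22.0, 24.0, 18.0, 19.0],
    }
)

# Summarize: count, mean, standard deviation, quartiles, min and max.
summary = readings["temperature"].describe()

# Aggregation: several aggregation functions computed per group.
per_sensor = readings.groupby("sensor")["temperature"].agg(["mean", "max", "count"])

# Window: rolling mean over 2 consecutive rows, computed separately per sensor.
# The first row of each sensor has no full window, so it is NaN.
readings["rolling_mean"] = readings.groupby("sensor")["temperature"].transform(
    lambda s: s.rolling(window=2).mean()
)
```

Unlike an aggregation, which returns one row per group, the window operation returns one value per original row.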

Time series and spatial data

  • Typical operations for the treatment of time series
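
A few operations commonly applied to time series, sketched with pandas on an invented series of half-hourly readings. Which operations the course treats as typical is an assumption of this example.

```python
import pandas as pd

# Hypothetical temperature readings taken every 30 minutes.
index = pd.date_range("2024-01-01 00:00", periods=6, freq="30min")
temperature = pd.Series([20.0, 21.0, 22.0, 23.0, 24.0, 25.0], index=index)

# Resampling: aggregate the half-hourly readings into hourly means.
hourly = temperature.resample("60min").mean()

# Differencing: change relative to the previous reading (the first value is NaN).
change = temperature.diff()

# Selecting a time interval by label; both endpoints are included.
window = temperature.loc["2024-01-01 01:00":"2024-01-01 01:30"]
```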

Data Visualization

  • Fundamentals of interactive data visualization
  • Main data visualization tools for exploratory data analysis

Introduction to Machine Learning

  • Overview of Machine Learning 
  • General techniques for supervised ML
  • Autoregression
  • Classification 
  • Regression