Programming for Data Science and Engineering
Objectives
By the end of this first-cycle course, the student will have acquired the knowledge, skills and competences to:
- Understand and be able to carry out the processing and transformation of experimental or sensor data for subsequent exploratory data analysis.
- Understand the relational model and be able to express queries using relational operators to obtain data from a relational database.
- Understand the challenges associated with processing large amounts of data.
- Be able to express computations using an imperative model or functional operators.
- Know how to express computations over complex and spatio-temporal data.
- Know the main data visualizations and be able to choose those most appropriate to the data and the analyses required.
General characterization
Code
12570
Credits
6.0
Responsible teacher
Carlos Augusto Isaac Piló Viegas Damásio, João Carlos Gomes Moura Pires
Hours
Weekly - 4
Total - 56
Teaching language
Portuguese
Prerequisites
Prior knowledge of the Python language is advised.
Students lacking proficiency in Python may follow online tutorials such as those available on the official Python website (https://www.python.org).
For more information, consult https://www.python.org/about/gettingstarted/
Bibliography
Adopted textbook:
Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition (Chapters 1 to 8)
by Stefanie Molin
Packt | April, 2021
ISBN-13: 978-1800563452
Complementary textbook:
A General Introduction to Data Analytics
by João Moreira, Andre Carvalho, et al.
John Wiley & Sons, Inc. | June 25, 2018
(Chapters 1 to 4)
Reference books for pandas library:
Python for Data Analysis, 2nd Edition, 2017.
by Wes McKinney
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781491957660
Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization (Treading on Python Book 3), 2016.
by Matt Harrison, Michael Prentiss
Teaching method
The course is supported by lectures that frame the main topics to be addressed.
Lectures will draw on many examples of existing datasets to illustrate typical problems encountered when dealing with real data, along with good practices, solutions and computational methodologies to tackle these problems.
Labs will be based primarily on the Python language and its ecosystem for data analysis and visualization, one of the solutions most widely used in academia and industry. The Python environment will be integrated with a set of external tools and services, illustrating a real data processing environment.
Evaluation method
The evaluation consists of a theoretical component (50% of the grade) and a practical component (50% of the grade).
The theoretical component (CT) can be completed through:
- Continuous assessment, through two tests each worth 50% of the theoretical component (i.e., 25% of the final grade);
- Exam;
- The theoretical component may be replaced by an oral examination for students designated by the teachers.
The practical component (CP) consists of a project carried out in groups of 3 students. The project is evaluated through a report, a demonstration and a discussion of the work with the teachers. Group members may receive different grades.
The final grade is obtained by the following formula: 50% CT + 50% CP, rounded to the nearest integer.
Approval conditions:
Theoretical component >= 9.5 points
Practical component >= 9.5 points
NOTE: Plagiarism results in automatic failure in the course.
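As an informal illustration only, the grading rules above can be sketched in Python. The function names are hypothetical, returning None to signal failure is an illustrative choice, and the sketch assumes that halves round up, since the rounding rule is not specified beyond "rounding to units".

```python
import math

PASS_THRESHOLD = 9.5  # minimum required in each component (approval conditions)


def theoretical_grade(test1: float, test2: float) -> float:
    """Continuous assessment: two tests, each worth 50% of CT."""
    return 0.5 * test1 + 0.5 * test2


def final_grade(ct: float, cp: float):
    """Return the rounded final grade, or None if a component is below the threshold."""
    if ct < PASS_THRESHOLD or cp < PASS_THRESHOLD:
        return None  # illustrative way of signalling failure
    raw = 0.5 * ct + 0.5 * cp
    # Assumption: halves round up. Python's built-in round() rounds halves to even.
    return math.floor(raw + 0.5)


ct = theoretical_grade(10.0, 14.0)  # 12.0
print(final_grade(ct, 15.0))        # 0.5 * 12.0 + 0.5 * 15.0 = 13.5, rounded up to 14
```

A student with CT below 9.5 fails regardless of CP, which is why the threshold check comes before the weighted average.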
Subject matter
1. Introduction to Programming for Data Analysis:
a) Data Science
b) CRISP-DM Methodology
2. Software structuring and organization.
a) Modules and API Usage
b) Functional data processing (operators such as map, flatmap and reduce).
c) Program deployment models (e.g. libraries, Jupyter Notebooks)
3. Data processing and querying.
a) Methods for data access. Spatio-temporal and complex data.
b) Relational Data Query Language: SQL. Projections, selections, joins and aggregations.
c) Manipulation of data series and tabular data using pandas
4. Data Visualization.
a) Fundamentals of interactive data visualization
b) Main data visualization tools for exploratory data analysis
c) Using Python libraries for data visualization and small interactive dashboard design.
5. Scalability and Cloud Services.
a) Challenges and approaches
b) Parallel computing frameworks (e.g. Spark)
Programs
Programs where the course is taught: