Programming for Data Science

Objectives

The Programming for Data Science curricular unit is aimed at students without prior programming experience. In this unit, students will learn the Python programming fundamentals needed for a successful career in data science. Starting from the very basics of programming, we will progress quickly to the more advanced computing techniques and concepts needed to develop a data science project. Along the way, students will gain experience with the core stack of libraries (Pandas, NumPy, SciPy, Seaborn, Scikit-learn, Statsmodels, NetworkX) that make Python the language of choice among data scientists.

By the end of the curricular unit, students are expected to be able to develop a data science project independently and to feel comfortable with the programming activities in other curricular units. The curricular unit has a strong active learning component, so students are expected to participate during classes and to read the recommended weekly materials.

General characterization

Code

400090

Credits

7.5

Responsible teacher

Flávio Luís Portas Pinheiro

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

None

Bibliography

Lubanovic, Bill. Introducing Python: Modern Computing in Simple Packages. O'Reilly Media, Inc., 2014.

VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc., 2016.

McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012.

Grus, Joel. Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc., 2015.

Additionally, students will find rich online documentation for each of the libraries covered during the course, and suggested readings will be shared on the Moodle page.

Teaching method

The curricular unit is based on a mix of theoretical and practical lessons with a strong active learning component. During each session, students are introduced to new concepts and methodologies, case studies, and worked examples. Active learning activities (debates, quizzes, mud cards, compare and contrast) will place students at the center of the classroom, promoting peer teaching and encouraging constructive discussion. Computer activities will take place weekly during the practical lessons.

Evaluation Elements:

EE1 - Participation in classroom activities (60%)

EE2 - Practical Exam (40%).

Evaluation method

  1. Practical exam (40%): consists of an exercise to be solved during the last class. Students will have two hours to analyze, on their own computers, a data set provided by the instructors and to answer a few analytical questions.
  2. Final project (60%): consists of a report that details the process of acquisition, transformation, and analysis of a data set. The project is to be developed in groups of three or four students. More details about the project will be shared on the Moodle page during the first couple of weeks.

Subject matter

The curricular unit is organized in three Learning Units (LU):

LU0. Introduction to programming fundamentals using Python.

LU1. Exploration of the most relevant libraries in the Python data science stack.

LU2. Use of the entire stack and its different parts to develop a data science project.

Programs

Programs where the course is taught: