Programming for Data Science

Objectives

The Programming for Data Science course unit introduces the basics of programming in Python for data scientists. The course is aimed at students with no prior experience in computer programming and starts from the very basics of computation. However, it will quickly progress towards advanced programming techniques and concepts. By the end of this course unit, students will be able to approach complex problems effectively, typically problems involving vast amounts of data, by programming efficient strategies to extract information and support decision-making processes.

Classes will combine lectures and practical exercises. The course also has a strong active-learning component, so students are expected to participate actively in class and to read the recommended materials before each class. A short introduction to Python will be given in the first weeks of the course so that students can explore and practice many of the theoretical concepts taught in class.

Intended Learning Outcomes

  • Explain why Python is the preferred programming language for data scientists;
  • Understand the basics of the Python programming language;
  • Use the most appropriate libraries for your needs;
  • Perform the extraction, manipulation, analysis, modeling, and reporting of data using Python;
  • Feel comfortable using Python as a tool for your data science projects!

General Information

Code

400090

Credits

3.5

Course Coordinator

Flávio Luís Portas Pinheiro

Hours

Weekly: to be announced

Total: to be announced

Language of Instruction

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

None

Bibliography

Lubanovic, Bill. Introducing Python: Modern Computing in Simple Packages. O'Reilly Media, Inc., 2014.

VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc., 2016.

McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012.

Grus, Joel. Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc., 2015.

Additionally, students will find extensive online documentation for each of the libraries covered in the course, and suggested readings will be shared on the Moodle page.

Teaching Method

Theoretical and practical classes

Assessment Method

  1. Practical exam (40%): an exercise to be solved during the last class. Students will have two hours to analyze, on their own computers, a dataset provided by the instructors and to answer a few analytical questions;
  2. Final project (60%): a report detailing the acquisition, transformation, and analysis of a dataset. The project is developed in groups of three to four students. Further details about the project will be shared on the Moodle page during the first couple of weeks.
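
The weighting above combines the two components into a single final grade. A minimal sketch of that arithmetic, assuming both components are graded on the same scale (the 0 to 20 values in the example are an assumption, not stated on this page) and using a hypothetical helper `final_grade`:

```python
def final_grade(exam: float, project: float) -> float:
    """Weighted final grade: 40% practical exam + 60% final project.

    Assumes both inputs use the same scale (e.g. 0-20); the scale is an
    assumption made for illustration, not part of the rules above.
    """
    exam_weight = 0.40
    project_weight = 0.60
    return exam_weight * exam + project_weight * project


# Example: 14 in the practical exam and 16 in the final project
print(round(final_grade(14, 16), 2))  # 15.2
```

Because the project carries the larger weight, a one-point change in the project moves the final grade by 0.6 points, versus 0.4 points for the exam.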

Course Content

Week · Date · Instructor, followed by the topics covered and the recommended chapters

Week 1 · February 12th · FLP

  • Course Overview
  • What is Python and Why are we learning it?
  • Setting up Python (Anaconda and Jupyter Notebooks)
  • "Hello World", your first program in Python
  • Variables & Data Structures (Arrays, Lists, Dictionaries, Tuples)

Chapters B1, A0 and C
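
As an illustration of the week 1 topics, a minimal sketch of a first program and of the core built-in data structures. Variable names and values are hypothetical. Only built-in types are used, so arrays are not shown here (NumPy arrays are covered in week 5):

```python
# Your first program in Python
print("Hello World")

# Variables: names bound to values; the type comes from the value
course = "Programming for Data Science"
course_credits = 3.5

# Core built-in data structures
weeks = [1, 2, 3]                        # list: ordered and mutable
weeks.append(4)                          # lists can grow in place
instructor = {"code": "FLP", "week": 1}  # dict: maps keys to values
date = ("February", 12)                  # tuple: ordered and immutable

print(type(course_credits).__name__, len(weeks), instructor["code"], date[0])
# float 4 FLP February
```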

Week 2 · February 19th · FLP

  • Reading and Writing to Files (I/O operations)
  • Flow Control in Python
  • Loops (for, while, iterators)
  • If Statements

Chapters C2 and C6
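
The week 2 topics fit together in a few lines. A minimal sketch that writes a file, reads it back line by line, classifies each value with an if statement, and sums the results with a while loop. The file name `values.txt` and its contents are hypothetical, and the file lives in a temporary directory that is deleted afterwards:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "values.txt")

    # Writing: one number per line
    with open(path, "w", encoding="utf-8") as f:
        for value in [3, 8, 1, 12, 5]:  # for loop over a list
            f.write(f"{value}\n")

    # Reading: a file object is an iterator over its lines
    large, small = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            number = int(line)  # int() ignores the trailing newline
            if number >= 5:     # the if/else decides where each value goes
                large.append(number)
            else:
                small.append(number)

# A while loop repeats until its condition becomes False
numbers = large + small
total, i = 0, 0
while i < len(numbers):
    total += numbers[i]
    i += 1

print(large, small, total)  # [8, 12, 5] [3, 1] 29
```

In practice `sum(numbers)` does the same job as the while loop; the explicit loop is there only to show the construct.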

Week 3 · February 26th · FLP

  • Functions (def and lambda)
  • The Python Standard Library
  • Implementing recursion with functions

Chapters A5 and C
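
A minimal sketch of the week 3 topics: a recursive function defined with `def`, an anonymous `lambda` passed as a sorting key, and a check against the standard library's `math.factorial`. The function and the sample values are chosen only for illustration:

```python
import math  # a module from the Python Standard Library


def factorial(n: int) -> int:
    """Recursive factorial: n! = n * (n - 1)!, with 0! = 1 as the base case."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 1  # base case stops the recursion
    return n * factorial(n - 1)  # recursive case


print(factorial(5))                       # 120
print(factorial(5) == math.factorial(5))  # True

# A lambda is a single-expression function, handy as a sorting key
print(sorted([-3, 1, -2], key=lambda x: x * x))  # [1, -2, -3]
```

The base case is what guarantees termination: every call reduces `n` by one until it reaches 0.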

Week 4 · March 12th · FLP

  • Object-Oriented Programming
  • Objects and Classes
  • Modules, Lists, and Dictionaries

Chapter A6
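
To make "objects and classes" concrete, a minimal sketch of a class that bundles data (attributes) with behavior (methods). The class name `Dataset` and its methods are hypothetical, chosen for illustration only:

```python
class Dataset:
    """A tiny, hypothetical class: a named collection of numeric values."""

    def __init__(self, name, values):
        # Instance attributes: each object keeps its own copy
        self.name = name
        self.values = list(values)

    def mean(self):
        """Arithmetic mean of the stored values."""
        return sum(self.values) / len(self.values)

    def __repr__(self):
        # Controls how the object is displayed by print() and repr()
        return f"Dataset({self.name!r}, n={len(self.values)})"


scores = Dataset("scores", [12, 15, 18])  # create an object (an instance)
print(scores)         # Dataset('scores', n=3)
print(scores.mean())  # 15.0
```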

Week 5 · March 19th · FLP

  • Importing Libraries
  • Introduction to NumPy
  • NumPy Data Types
  • Basics of NumPy Arrays
  • Aggregations and Sorting Arrays
  • Introduction to SciPy

Chapters B2, C4, and C14a

Week 6 · March 26th · FLP

  • Introduction to Pandas
  • Pandas data structures: Series and DataFrames
  • Data exploration with Pandas
  • Reading Data

Chapters B3, C5, and C6

Week 7 · April 2nd · FLP

  • Pandas Advanced Concepts
  • Analyze data with Pandas

Chapters C12 and C10

Week 8 · April 9th · FLP

  • Introduction to statsmodels
  • Some Notes on Statistics
  • Performing simple statistical analysis in Python

Chapter C13
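
The week 8 topics use statsmodels. The sketch below deliberately swaps in the standard-library `statistics` module instead, so it runs without extra packages; it is not statsmodels code. It computes basic descriptive statistics on a hypothetical sample (invented values, not course data):

```python
import statistics

# Hypothetical sample: hours studied per week by six students
hours = [2, 4, 4, 5, 7, 8]

print(statistics.fmean(hours))     # 5.0 (mean, always returned as a float)
print(statistics.median(hours))    # 4.5 (average of the two middle values)
print(statistics.variance(hours))  # sample variance: sum of squared deviations / (n - 1)
print(round(statistics.stdev(hours), 3))  # 2.191 (square root of the variance)
```

Here the squared deviations from the mean sum to 24, so the sample variance is 24 / 5 = 4.8.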

Week 9 · April 23rd · JA

  • Introduction to Matplotlib and Seaborn
  • Using visualization to drive your data exploration

Chapters B9 and C4

Week 10 · April 30th · JA

  • Reporting your Findings

Chapters B9 and C4

Week 11 · May 7th · JA

  • Case Studies
  • Example of a full-stack project using Python
  • Worked exercise

Chapter C14

Week 12 · May 14th · JA

  • Final Project support

Week 13 · May 21st · JA

  • Project Oral Presentations

Week 14 · May 28th · JA

  • Practical Exam