Data Science I

Objectives

None

General characterization

Code

400112

Credits

7.5

Responsible teacher

To be assigned

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If there are Erasmus students, classes will be taught in English

Prerequisites

Week

Instructor

Content

1

Flávio

  • Introduction
  • Course Overview
  • The importance of Data Science in organizations
  • Sources of data and their nature
  • What is Python and why are we learning it?
  • Setting up Python and IPython (Anaconda and Jupyter Notebooks)
  • A simple Python program: "Hello World!"

Chapters C1, A0 and D

2

Flávio

  • Introduction to Python and NumPy
  • Variables & Data Structures (Arrays, Lists, Dictionaries, Tuples)
  • Control flow statements
  • Functions (def and lambda)
  • Python Standard Library

Chapters A1 and A2

3

Flávio

  • Introduction to Pandas
  • Loading data with Pandas
  • Manipulating information
  • Exporting data with Pandas

Chapters A3, B6, and B7

4

Flávio

  • Statistics and Probability
  • Describing a single set of data
  • Correlation
  • Simpson's Paradox
  • Correlation versus Causation
  • Normal Distribution and the Central Limit Theorem
  • Hypothesis and Inference

Chapters C5, C6, and C7

5

Flávio

  • Sources of Data, getting data
  • Working with data
  • Data preparation
  • Exploring n-dimensional data
  • Cleaning, Rescaling, and Dimensionality reduction

Chapters B7, C9, and C10

6

Flávio

  • Multiple Linear Regression
  • Clustering
  • K-means algorithm

Chapters C12, C15, and C19

7

Flávio

  • Social Network Analysis
  • What is a network and its characterization
  • Identify the central individuals in a social network
  • Recommender Systems

Chapters C21 and C22

8

Francisco

  • Data Storing with Relational Database Management Systems
  • Operational versus Analytical storing architectures
  • CRUD Operations

Chapter C23

9

Francisco

  • What Big Data is and its implications for Data Science
  • Understand the role of Big Data in data analytics
  • Distributed systems and parallel programming

Chapter C24

10

Francisco

  • Business Intelligence
  • Reporting your results

Chapters B9 and C4

11

Francisco

  • Example of a Data Science project
  • Discussion of the benefits of Data Science for the support of decision making

12

Francisco

  • Final Project discussion
  • Brainstorming ideas and possible data sources
  • Formation of Groups

13

Francisco

  • Final Project support
  • Bring your questions and discuss them with the instructor

14

Francisco

  • Project Report Delivery
  • Oral Presentations

Bibliography

[A] VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc., 2016.

[B] McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012.

[C] Grus, Joel. Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc., 2015.

[D] Additionally, students will find rich online documentation for each of the libraries covered during the course, and suggested readings will be shared on the Moodle page.

Teaching method

To successfully complete this course, students need a combined score of at least 9.5 points from the following components:

1) Practical Exam (40%): an analysis of a data set provided by the teaching staff, to be completed within the two hours of one class;

2) Final Project (60%): a report that details the process of acquisition, transformation, and analysis of a dataset. The project is to be developed in groups of up to two students. More details about the project will be shared on the Moodle page during the first couple of weeks.
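As an illustration of how the two components combine, the weighting can be sketched in Python. This is only a sketch: the function names `final_grade` and `passes` are hypothetical, and it assumes both components are graded on the same scale as the 9.5-point threshold (presumably 0 to 20, which the text above does not state).

```python
# Hypothetical helpers illustrating the 40% / 60% weighting described above.
# Assumption: both components are graded on the same scale as the threshold.

EXAM_WEIGHT = 0.4      # Practical Exam (40%)
PROJECT_WEIGHT = 0.6   # Final Project (60%)
MINIMUM_GRADE = 9.5    # combined minimum needed to complete the course


def final_grade(practical_exam: float, final_project: float) -> float:
    """Return the weighted combination of the two components."""
    return EXAM_WEIGHT * practical_exam + PROJECT_WEIGHT * final_project


def passes(practical_exam: float, final_project: float) -> bool:
    """Return True when the combined grade reaches the minimum."""
    return final_grade(practical_exam, final_project) >= MINIMUM_GRADE


if __name__ == "__main__":
    # Example: a weaker exam can be offset by a stronger project.
    grade = final_grade(8.0, 11.0)
    print(f"Combined grade: {grade:.1f}, passes: {passes(8.0, 11.0)}")
```

Because the project carries the larger weight, a project grade below the threshold is harder to compensate with the exam than the other way around.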

Evaluation method

English

Subject matter

Theoretical and practical classes