Data Science I

Objectives

The world's most valuable resource is no longer oil, but data. Hence, it is not surprising that organizations increasingly seek to employ data science experts who can identify the right sources of data, raise relevant business-oriented questions, and extract useful knowledge and insights for the organization. In that sense, the Data Science for Hospitality & Tourism course unit explores different techniques for performing descriptive analysis of data using Python. It also provides an understanding of how data science benefits from Big Data and from different data storage technologies.

Classes will involve a mix of lectures and practical exercises. Moreover, the course will have a strong active learning component. As such, students are expected to participate actively in class and to read the recommended materials before each class. A short introduction to Python will be delivered in the first weeks of the course so that students can explore and practice on their own many of the theoretical concepts taught in class.

Intended Learning Outcomes:

  • Understand the different natures of data and their sources;
  • Explain why Python is the preferred programming language for Data Scientists;
  • Perform the extraction, exploration, transformation, and analysis of data using Python;
  • Perform descriptive analytics that help you understand your data;
  • Report your analysis using meaningful visualizations and simple models;
  • Understand the role of Big Data technologies in Data Science;
  • Understand the different technologies and architectures for storing data for both transactional and analytical operations.

General characterization

Code

400112

Credits

7.5

Responsible teacher

Flávio Luís Portas Pinheiro

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If there are Erasmus students, classes will be taught in English.

Prerequisites

None

Bibliography

[A] VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc., 2016.

[B] McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012.

[C] Grus, Joel. Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc., 2015.

[D] Additionally, students will find rich online documentation for each of the libraries covered during the course, and suggested readings will be shared on the Moodle page.

Teaching method

Theoretical and practical classes

Evaluation method

To pass this course, students need a combined score of at least 9.5 points from the following components:

1) Practical Exam (40%): the analysis of a dataset provided by the teaching staff, to be completed within the two hours of one class;

2) Final Project (60%): a report detailing the process of acquiring, transforming, and analysing a dataset. The project is to be developed in groups of up to two students. More details about the project will be shared on the Moodle page during the first couple of weeks.
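As a worked illustration of the weighting above, the combined score is 0.4 × exam + 0.6 × project, and it must reach 9.5. The sketch below assumes that both components are graded on a 0 to 20 scale, as is usual in Portuguese higher education, although the syllabus does not state this. The helper names are hypothetical.

```python
# Hypothetical helpers: combined grade under the 40% / 60% weighting described above.
# Assumption (not stated in the syllabus): both components are graded from 0 to 20.
EXAM_WEIGHT = 0.40
PROJECT_WEIGHT = 0.60
PASS_MARK = 9.5


def combined_grade(exam: float, project: float) -> float:
    """Weighted average of the practical exam and the final project."""
    return EXAM_WEIGHT * exam + PROJECT_WEIGHT * project


def passes(exam: float, project: float) -> bool:
    """True if the combined grade reaches the minimum of 9.5 points."""
    return combined_grade(exam, project) >= PASS_MARK


# 8 in the exam and 11 in the project: 0.4*8 + 0.6*11 = 3.2 + 6.6 = 9.8
print(round(combined_grade(8, 11), 2))  # 9.8
print(passes(8, 11))                    # True
# 7 in the exam and 11 in the project: 2.8 + 6.6 = 9.4
print(passes(7, 11))                    # False
```

Because the project carries more weight, a strong project can compensate for a weaker exam, but only up to the threshold shown.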

Subject matter

Week

Instructor

Content

1

Flávio

  • Introduction
  • Course Overview
  • The importance of Data Science in organizations
  • Sources of data and their nature
  • What is Python and Why are we learning it?
  • Setting up Python and IPython (Anaconda and Jupyter Notebooks)
  • A simple Python program: "Hello World!"

Chapters C1, A0 and D

2

Flávio

  • Introduction to Python and NumPy
  • Variables & Data Structures (Arrays, Lists, Dictionaries, Tuples)
  • Control flow statements
  • Functions (Def and Lambda)
  • Python Standard Library

Chapters A1 and A2
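The following is a minimal sketch of the Week 2 topics. It uses hypothetical hotel-booking values: the prices, room types, and the choice of the standard-library `statistics` module are illustrative assumptions, not course material.

```python
import statistics  # one of many Python Standard Library modules

import numpy as np

# Variables and built-in data structures (hypothetical values)
nights = [2, 3, 1, 4]                            # list: ordered and mutable
price_per_night = {"single": 80, "double": 120}  # dict: key -> value
stay = ("double", 3)                             # tuple: ordered and immutable

# Control flow: loop over a list and branch on a condition
long_stays = []
for n in nights:
    if n >= 3:
        long_stays.append(n)


# Functions defined with def ...
def total_price(room_type, n_nights):
    """Price of a stay, looked up in the price_per_night dictionary."""
    return price_per_night[room_type] * n_nights


room, n_nights = stay                 # tuple unpacking
stay_cost = total_price(room, n_nights)

# ... and anonymous functions with lambda, here used as a sort key
cheapest_first = sorted(price_per_night, key=lambda r: price_per_night[r])

# A standard-library function
average_nights = statistics.mean(nights)

# NumPy arrays apply arithmetic to every element at once (vectorisation)
revenue = np.array(nights) * price_per_night["double"]

print(long_stays)         # [3, 4]
print(stay_cost)          # 360
print(cheapest_first)     # ['single', 'double']
print(average_nights)     # 2.5
print(revenue.tolist())   # [240, 360, 120, 480]
```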

3

Flávio

  • Introduction to Pandas
  • Loading data with Pandas
  • Manipulating data
  • Exporting data with Pandas

Chapters A3, B6, and B7
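The load, manipulate, and export cycle covered in Week 3 can be sketched with pandas as follows. The booking data is hypothetical, and it is read from an in-memory string so the example is self-contained. In practice, `pd.read_csv` would normally receive a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """booking_id,country,nights,price
1,PT,2,160
2,ES,3,270
3,PT,1,90
4,FR,4,400
"""

# Loading: read the CSV into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Manipulating: derive a new column, filter rows, and aggregate by group
df["price_per_night"] = df["price"] / df["nights"]
long_stays = df[df["nights"] >= 3]
revenue_by_country = df.groupby("country")["price"].sum()

print(long_stays)
print(revenue_by_country)

# Exporting: write the aggregated result as CSV (to a string buffer here, not a file)
buffer = io.StringIO()
revenue_by_country.to_csv(buffer)
print(buffer.getvalue())
```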

4

Flávio

  • Statistics and Probability
  • Describing a single set of data
  • Correlation
  • Simpson's Paradox
  • Correlation versus Causation
  • Normal Distribution and the Central Limit Theorem
  • Hypothesis and Inference

Chapters C5, C6, and C7
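The following sketch illustrates three of the Week 4 ideas using only the standard library: summary statistics for a single set of data, Pearson correlation, and Simpson's paradox. All numbers are hypothetical and were chosen to make the paradox visible. Hotel A has the higher satisfaction rate in every guest segment but the lower rate overall, because most of its guests fall in the segment where satisfaction is lower for both hotels.

```python
import math
import statistics

# Describing a single set of data (hypothetical nightly rates in euros)
rates = [80, 95, 100, 120, 150, 300]
print(statistics.mean(rates), statistics.median(rates), statistics.stdev(rates))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Simpson's paradox with hypothetical counts: segment -> (satisfied guests, total guests)
data = {
    "Hotel A": {"business": (90, 100), "leisure": (300, 500)},
    "Hotel B": {"business": (170, 200), "leisure": (50, 100)},
}


def satisfaction_rates(segments):
    """Return the per-segment satisfaction rates and the overall (pooled) rate."""
    per_segment = {s: ok / n for s, (ok, n) in segments.items()}
    satisfied = sum(ok for ok, _ in segments.values())
    total = sum(n for _, n in segments.values())
    return per_segment, satisfied / total


for hotel, segments in data.items():
    per_segment, overall = satisfaction_rates(segments)
    print(hotel, per_segment, round(overall, 3))
# Hotel A {'business': 0.9, 'leisure': 0.6} 0.65
# Hotel B {'business': 0.85, 'leisure': 0.5} 0.733
```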

5

Flávio

  • Sources of Data, getting data
  • Working with data
  • Data preparation
  • Exploring n-dimensional data
  • Cleaning, Rescaling, and Dimensionality reduction

Chapters B7, C9, and C10
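The following NumPy sketch walks through the Week 5 cleaning, rescaling, and dimensionality-reduction steps. It makes three assumptions for illustration. The data is hypothetical. Missing values are filled with the column mean, which is only one of several possible cleaning strategies. Principal component analysis (PCA) is computed through the singular value decomposition of standardised data.

```python
import numpy as np

# Hypothetical data: rows are hotels, columns are (price in euros, rating 0-10, distance in km)
X = np.array([
    [80.0, 7.5, 2.0],
    [120.0, 8.0, np.nan],  # missing value to be cleaned
    [200.0, 9.0, 0.5],
    [60.0, 6.5, 5.0],
])

# Cleaning: replace each missing value with the mean of its column (ignoring NaNs)
col_means = np.nanmean(X, axis=0)
X_clean = np.where(np.isnan(X), col_means, X)

# Rescaling: standardise each column to mean 0 and standard deviation 1 (z-scores),
# so that price in euros does not dominate rating and distance
X_std = (X_clean - X_clean.mean(axis=0)) / X_clean.std(axis=0)

# Dimensionality reduction: PCA via the SVD of the standardised data
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
X_2d = X_std @ Vt[:2].T          # coordinates on the first two principal components
explained = S**2 / np.sum(S**2)  # share of variance captured by each component

print(X_2d)
print(explained)
```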

6

Flávio

  • Multiple Linear Regression
  • Clustering
  • K-means algorithm

Chapters C12, C15, and C19
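The following sketch covers both Week 6 techniques with NumPy. First, it fits a multiple linear regression by ordinary least squares. The data is hypothetical and is generated exactly from y = 1 + 2·x1 + 3·x2, so the fit should recover those coefficients. Second, it implements k-means using the standard Lloyd iteration, with initial centroids drawn at random from the data points. That initialisation is an illustrative choice. Library implementations such as scikit-learn's `KMeans` use more robust schemes.

```python
import numpy as np

# --- Multiple linear regression by ordinary least squares ---
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2                         # hypothetical, noise-free target
A = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta, 6))                         # approximately [1. 2. 3.]


# --- k-means clustering (Lloyd's algorithm) ---
def kmeans(X, k, n_iter=100, seed=0):
    """Return (labels, centroids) after alternating assignment and update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical points forming two well-separated groups
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```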

7

Flávio

  • Social Network Analysis
  • What is a network and its characterization
  • Identify the central individuals in a social network
  • Recommender Systems

Chapters C21 and C22
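The following is a minimal sketch of two Week 7 ideas on a small hypothetical friendship network. It uses degree centrality as one simple way to identify central individuals. It uses "friends of friends" suggestions, ranked by number of mutual friends, as one elementary form of recommendation. Other centrality measures, such as betweenness or eigenvector centrality, and other recommender approaches exist and may be what the course covers in depth.

```python
from collections import Counter

# Hypothetical undirected network, given as (person, person) edges
edges = [("Ana", "Bruno"), ("Ana", "Carla"), ("Ana", "Diogo"),
         ("Bruno", "Carla"), ("Diogo", "Eva")]

# Adjacency list: each person maps to the set of their friends
friends = {}
for a, b in edges:
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)

# Degree centrality: number of connections divided by the maximum possible (n - 1)
n = len(friends)
degree_centrality = {p: len(f) / (n - 1) for p, f in friends.items()}
most_central = max(degree_centrality, key=degree_centrality.get)


def suggest_friends(person):
    """Friends of friends who are not yet friends, ranked by number of mutual friends."""
    counts = Counter(
        fof
        for friend in friends[person]
        for fof in friends[friend]
        if fof != person and fof not in friends[person]
    )
    return counts.most_common()


print(most_central)             # Ana
print(suggest_friends("Eva"))   # [('Ana', 1)]
```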

8

Francisco

  • Data Storing with Relational Database Management Systems
  • Operational versus Analytical storage architectures
  • CRUD Operations

Chapter C23
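The four CRUD operations (Create, Read, Update, Delete) can be sketched against a relational database using Python's built-in `sqlite3` module and an in-memory database. The `bookings` table and its columns are hypothetical. The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, guest TEXT, nights INTEGER)")

# Create
cur.executemany("INSERT INTO bookings (guest, nights) VALUES (?, ?)",
                [("Ana", 2), ("Bruno", 5)])

# Read
rows = cur.execute("SELECT guest, nights FROM bookings ORDER BY id").fetchall()

# Update
cur.execute("UPDATE bookings SET nights = ? WHERE guest = ?", (3, "Ana"))

# Delete
cur.execute("DELETE FROM bookings WHERE guest = ?", ("Bruno",))
conn.commit()

remaining = cur.execute("SELECT guest, nights FROM bookings").fetchall()
print(rows)       # [('Ana', 2), ('Bruno', 5)]
print(remaining)  # [('Ana', 3)]
conn.close()
```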

9

Francisco

  • What is Big Data and its implications for Data Science
  • Understand the role of Big Data in Data Analytics
  • Distributed systems and parallel programming

Chapter C24
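MapReduce is one common model for processing data in parallel on distributed systems. It runs in three phases: a map step applied independently to each piece of input, a shuffle that groups intermediate results by key, and a reduce step that combines each group. The sketch below is a word count that runs all three phases sequentially in one process. In a real distributed system, such as a Hadoop or Spark cluster, each partition would be mapped on a different machine. The input text and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into partitions, as if stored on different machines
partitions = [
    ["great hotel great location"],
    ["small room great staff"],
]


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Each document could be mapped on a different worker; here everything runs sequentially
mapped = chain.from_iterable(map_phase(doc) for part in partitions for doc in part)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

Because each map call depends only on its own document and each reduce call only on one key's values, both phases can be spread across workers without coordination. Only the shuffle requires moving data between machines.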

10

Francisco

  • Business Intelligence
  • Reporting your results

Chapters B9 and C4

11

Francisco

  • Example of a Data Science project
  • Discussion of the benefits of Data Science for the support of decision making

12

Francisco

  • Final Project discussion
  • Brainstorming ideas and possible data sources
  • Formation of Groups

13

Francisco

  • Final Project support
  • Bring your questions and discuss them with the instructor

14

Francisco

  • Project Report Delivery
  • Oral Presentations