Analyzing Big Data

Objectives

The Big Data landscape is continuously evolving as new technologies emerge and existing technologies mature. This comprehensive course covers Hadoop and the key elements of the Spark libraries used to develop applications that process Big Data efficiently.

Students who complete this course will understand key Spark concepts and will be able to use Spark for Big Data processing and analysis to solve the kinds of problems that enterprises and research institutions face today.

General information

Code

400083

Credits

7.5

Lecturer in charge

Rui Manuel Simões Rosa

Hours

Weekly - To be made available soon

Total - To be made available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

Students should have basic programming experience in Python.

Basic familiarity with the Linux command line is helpful.

Basic knowledge of SQL and of data modeling concepts is also helpful.

Prior knowledge of Hadoop is not required.

Bibliography

  • Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-fast Data Analysis. O'Reilly, 2015.
  • Bill Chambers, Matei Zaharia. Spark: The Definitive Guide: Big Data Processing Made Simple. O'Reilly, 2018.
  • Tom White. Hadoop: The Definitive Guide. 4th ed. O'Reilly, 2015.

Teaching method

Weekly classes combining theoretical content and practical exercises.

Evaluation method

The evaluation considers the following components:

  • A Final Exam (50%): A test mixing multiple-choice and open questions that covers all the course material.
  • A Group Project (50%): The implementation of a Big Data use case using Hadoop and Spark.

A minimum of 9.5 points (out of 20) is required to pass the course.

Content

1. Introduction to Big Data Processing

2. Working with Databricks Community Edition

3. Introduction to Hadoop - Data Loading and Processing 

4. Spark Basics

5. Working with RDDs

6. Spark Programming

7. Spark SQL and DataFrames

8. Spark Streaming

9. Spark Graph Analysis

10. Spark Algorithms

11. Introduction to Spark ML

12. Data Science Examples with Spark

Courses

Courses in which this curricular unit is taught: