Analyzing Big Data
Objectives
The Big Data landscape is continuously evolving as new technologies emerge and existing technologies mature. This is a comprehensive course covering Hadoop and key elements of the Spark libraries used in developing applications for processing Big Data efficiently.
Students who complete this course will understand key Spark concepts and will learn to use Spark for Big Data processing and analysis, in order to solve the types of problems faced by enterprises and research institutions today.
General information
Code
400083
Credits
7.5
Lecturer in charge
Rui Manuel Simões Rosa
Hours
Weekly - To be made available soon
Total - To be made available soon
Teaching language
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Prerequisites
Students should have basic programming experience in Python.
Basic familiarity with the Linux command line is helpful.
Basic knowledge of SQL and of data modeling concepts is also helpful.
Prior knowledge of Hadoop is not required.
Bibliography
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly, 2015.
- Bill Chambers, Matei Zaharia. Spark: The Definitive Guide: Big Data Processing Made Simple. O'Reilly, 2018.
- Tom White. Hadoop: The Definitive Guide. 4th ed. O'Reilly, 2015.
Teaching method
Weekly classes combining theoretical content and practical exercises.
Evaluation method
The evaluation considers the following components:
- A Final Exam (50%): A test combining multiple-choice and open questions, covering all the course material.
- A Group Project (50%): The implementation of a Big Data use case using Hadoop and Spark.
A minimum grade of 9.5 (out of 20) is required to pass the course.
Content
1. Introduction to Big Data Processing
2. Working with Databricks Community Edition
3. Introduction to Hadoop - Data Loading and Processing
4. Spark Basics
5. Working with RDDs
6. Spark Programming
7. Spark SQL and DataFrames
8. Spark Streaming
9. Spark Graph Analysis
10. Spark Algorithms
11. Spark ML Intro
12. Data Science Examples with Spark
Programs
Programs in which this course unit is taught: