Analyzing Big Data
Objectives
The Big Data landscape is continuously evolving as new technologies emerge and existing technologies mature. This comprehensive course covers Hadoop and key elements of the Spark libraries used to develop applications that process Big Data efficiently.
Students who complete this course will understand key Spark concepts and will learn to use Spark for Big Data processing and analysis to solve the types of problems that enterprises and research institutions face today.
General characterization
Code
400083
Credits
7.5
Responsible teacher
Rui Manuel Simões Rosa
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If there are Erasmus students, classes will be taught in English.
Prerequisites
Students should have basic programming experience in Python.
Basic familiarity with the Linux command line is helpful.
Basic knowledge of SQL and of data modeling concepts is also helpful.
Prior knowledge of Hadoop is not required.
Bibliography
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-fast Data Analysis. O'Reilly, 2015.
- Bill Chambers, Matei Zaharia. Spark: The Definitive Guide: Big Data Processing Made Simple. O'Reilly, 2018.
- Tom White. Hadoop: The Definitive Guide. 4th ed. O'Reilly, 2015.
Teaching method
Weekly classes with theoretical and practical content.
Evaluation method
The evaluation considers the following components:
- A Final Exam (50%): A test combining multiple-choice and open questions that covers all the course material.
- A Group Project (50%): The implementation of a Big Data use case using Hadoop and Spark.
A minimum of 9.5 points (out of 20) is required to pass the course.
Subject matter
1. Introduction to Big Data Processing
2. Working with Databricks Community Edition
3. Introduction to Hadoop - Data Loading and Processing
4. Spark Basics
5. Working with RDDs
6. Spark Programming
7. Spark SQL and DataFrames
8. Spark Streaming
9. Spark Graph Analysis
10. Spark Algorithms
11. Spark ML Intro
12. Data Science Examples with Spark