Analyzing Big Data
The Big Data landscape is continuously evolving as new technologies emerge and existing ones mature. This comprehensive course covers Apache Spark and the key elements of the Hadoop ecosystem used to develop applications that process Big Data.
Students who complete this course will understand key Spark concepts and will be able to use Spark for Big Data processing and analysis, solving the kinds of problems that enterprises and research institutions face today.
Rui Manuel Simões Rosa
Weekly - Available soon
Total - Available soon
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Students should have basic programming experience in Python.
Basic familiarity with the Linux command line is helpful.
Basic knowledge of SQL and of data modeling concepts is also helpful.
Prior knowledge of Hadoop is not required.
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly, 2015.
- Bill Chambers, Matei Zaharia. Spark: The Definitive Guide: Big Data Processing Made Simple. O'Reilly, 2018.
- Tom White. Hadoop: The Definitive Guide, 4th Edition. O'Reilly, 2015.
Weekly classes with theoretical and practical content.
The evaluation considers the following components:
- A Final Exam (60%): A test combining multiple-choice and open-ended questions, covering all of the course material.
- A Group Project (40%): The implementation of a Big Data use case using Hadoop and Spark.
A minimum of 9.5 points (out of 20) is required to pass the course.
1. Introduction to the course and Big Data Processing concepts
2. Working with Databricks Community Edition
3. Hadoop and Spark: Data Loading and Processing
4. Spark Basics and RDDs
5. Spark Programming concepts
6. Spark Algorithms
7. Spark SQL and DataFrames
8. Spark Streaming
9. Spark Graph Analysis
10. Spark ML introduction
11. Data Science examples with Spark
12. Spark programming examples and exercises
Programs where the course is taught: