Analyzing Big Data

Objectives

The Big Data landscape is continuously evolving as new technologies emerge and existing technologies mature. This is a comprehensive course covering Apache Spark and key elements of the Hadoop Ecosystem used in developing applications for processing Big Data.

Students who complete this course will understand key Spark concepts and will learn to use Spark for Big Data processing and analysis to solve the types of problems that enterprises and research institutions face today.

General characterization

Code

400083

Credits

7.5

Responsible teacher

Rui Manuel Simões Rosa

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

Students should have basic programming experience in Python.

Basic familiarity with the Linux command line is helpful.

Basic knowledge of SQL and data modeling concepts is also helpful.

Prior knowledge of Hadoop is not required.

Bibliography

  • Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. Learning Spark: Lightning-fast Data Analysis. O'Reilly, 2015.
  • Bill Chambers, Matei Zaharia. Spark: The Definitive Guide: Big Data Processing Made Simple. O'Reilly, 2018.
  • Tom White. Hadoop: The Definitive Guide. 4th ed. O'Reilly, 2015.

Teaching method

Weekly classes with theoretical and practical content.

Evaluation method

The evaluation considers the following components:

  • A Final Exam (60%): A test combining multiple-choice and open questions, covering all the course material.
  • A Group Project (40%): The implementation of a Big Data use case using Hadoop and Spark.

A minimum grade of 9.5 points (out of 20) is required to pass the course.

Subject matter

1. Introduction to the course and Big Data Processing concepts

2. Working with Databricks Community Edition

3. Hadoop and Spark - Data Loading and processing 

4. Spark Basics and RDDs

5. Spark Programming concepts

6. Spark Algorithms

7. Spark SQL and DataFrames

8. Spark Streaming

9. Spark Graph Analysis

10. Spark ML introduction

11. Data Science examples with Spark

12. Spark programming examples and exercises 

Programs

Programs where the course is taught: