Big Data Applications

Objectives

Basic programming experience in Python and basic familiarity with the Linux command line are preferable. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.

General characterization

Code

200145

Credits

7.5

Responsible teacher

Hours

Weekly - To be made available soon

Total - To be made available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

CUC1. Introduction to Hadoop

  • Introduction to Hadoop and the Hadoop Ecosystem
  • Hadoop Architecture and HDFS

CUC2. Importing and Modeling Structured Data

  • Importing Relational Data with Apache Sqoop
  • Introduction to Impala and Hive
  • Modeling and Managing Data with Impala and Hive
  • Data Formats
  • Data File Partitioning

CUC3. Ingesting Streaming Data

  • Capturing Data with Apache Flume

CUC4. Distributed Data Processing with Spark

  • Spark Basics
  • Working with RDDs in Spark
  • Aggregating Data with Pair RDDs
  • Writing and Deploying Spark Applications
  • Parallel Processing in Spark
  • Spark RDD Persistence
  • Common Patterns in Spark Data Processing
  • Spark SQL and DataFrames

Bibliography

Hadoop: The Definitive Guide. Tom White. O'Reilly, 2014.
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset. Michael Frampton.

Teaching method

1st term and 2nd term
 - elective group project (40%)
 - exam (60%)

Evaluation method

N/A

Content

The course is based mainly on theoretical and practical classes. The practical sessions include presentation of concepts and methodologies, working through sample problems, and discussion and interpretation of results.