Big Data Applications

Objectives

The recent explosion of data has produced large volumes of mostly unstructured data: web logs, videos, speech recordings, photographs, e-mails, tweets, and similar content.
Storage, retrieval, analysis, and knowledge discovery using Big Data have also made significant inroads in several domains across industry, research, and academia.
In this course, we will look at the dominant software systems and algorithms for coping with Big Data. The course will allow students to combine principles from multiple domains to analyze large volumes of unstructured data.
The course involves hands-on programming assignments using real-world datasets and services on an Azure cluster, made possible by the Microsoft Educator Grant Program.

General characterization

Code

200145

Credits

7.5

Responsible teacher

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

None

Bibliography

Hadoop: The Definitive Guide. Tom White. O'Reilly, 2014; Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset. Michael Frampton.

Teaching method

The course is mainly based on lectures and practical classes. The practical sessions include presentation of concepts and methodologies, worked examples, and discussion and interpretation of results. Practical work, which carries significant weight in this course, is done by students outside the classroom but is evaluated.

Evaluation method

1st term and 2nd term
• elective group project (40%)
• exam (60%)

Subject matter

The course topics are:
• Course overview. Introduction to Big Data
• Introduction to Hadoop
o Understanding the Hadoop Architecture
o Setting Up a Pseudo-Distributed Environment
o The Distributed File System (HDFS)
• Understanding MapReduce
o Understanding MapReduce 2.0 (YARN)
• MapReduce development
• Introduction to Hive
o Interacting with Data via the Hive Console
o Creating Databases, Tables, and Schemas for Hive
o Loading Data into Hive from HDFS
o Querying Data and Performing Aggregations with Hive
• MapReduce development: advanced
• Understanding Pig
• Using Hadoop in the Cloud: HDInsight
• Big data analytics demos