Big Data Applications
Objectives
The recent explosion of data has produced large volumes of mostly unstructured content: web logs, videos, speech recordings, photographs, e-mails, tweets, and similar items.
Storage, retrieval, analysis, and knowledge discovery using Big Data have also made significant inroads in several domains across industry, research, and academia.
In this course, we will look at the dominant software systems and algorithms for coping with Big Data. The course will allow students to combine principles from multiple domains to analyze large volumes of unstructured data.
The course will involve hands-on programming assignments using real-world datasets and services on an Azure cluster, made possible by the Microsoft Educator Grant Program.
General characterization
Code
200145
Credits
7.5
Responsible teacher
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If Erasmus students are enrolled, classes will be taught in English.
Prerequisites
None
Bibliography
Hadoop: The Definitive Guide. Tom White. O'Reilly 2014; Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset. Michael Frampton
Teaching method
The course is based mainly on lectures and practical classes. The practical sessions include presentation of concepts and methodologies, worked examples, and discussion and interpretation of results. Practical work, which carries significant weight in this course, is done by students outside the classroom but is evaluated.
Evaluation method
1st term and 2nd term
• elective group project (40%)
• exam (60%)
Subject matter
The course topics are:
• Course overview. Introduction to Big Data
• Introduction to Hadoop
o Understanding the Hadoop Architecture
o Setting Up a Pseudo-Distributed Environment
o The Distributed File System (HDFS)
• Understanding MapReduce
o Understanding MapReduce 2.0 (YARN)
• MapReduce development
• Introduction to Hive
o Interacting with Data via the Hive Console
o Creating Databases, Tables, and Schemas for Hive
o Loading Data into Hive from HDFS
o Querying Data and Performing Aggregations with Hive
• MapReduce development: advanced
• Understanding Pig
• Using Hadoop in the Cloud: HDInsight
• Big data analytics demos