Big Data for Marketing

Objectives

Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and obtain insights from large datasets. In this course we will discuss the challenges created by Big Data and the state-of-the-art approaches to deal with them, with a focus on marketing applications.

During lectures we will give an overview of the complex and heterogeneous Big Data ecosystem and of the privacy and societal implications brought by these technologies. Particular emphasis will be placed on understanding the components that make up the popular Hadoop ecosystem (Hadoop, Hive, Kafka, Sqoop, and Spark) as well as the latest approaches to storing big data (NoSQL databases). During the labs, students will obtain hands-on experience with Spark in the Databricks notebook environment.

General characterization

Code

200202

Credits

7.5

Responsible teacher

Dhruv Akshay Pandit

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If there are Erasmus students, classes will be taught in English

Prerequisites

It is strongly recommended that students be familiar with the Python programming language. Familiarity with SQL is also recommended, although some background will be provided in the course.

Classes will be delivered in English. As such, students are expected to have a good level of comprehension and communication in English.

Bibliography

  1. Karau, Holden, et al. Learning Spark, 2nd ed. O'Reilly Media, Inc., 2020.
  2. Sullivan, Dan. NoSQL for Mere Mortals. Addison-Wesley Professional, 2015.
  3. White, Tom. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.
  4. Additionally, selected book chapters and articles will be shared on the Moodle page of the course.

 

Teaching method

The curricular unit is based on a mix of theoretical and practical lessons with a strong active learning component. During each session, students are exposed to new concepts and methodologies, case studies, and worked examples.

Evaluation Elements:

EE1 – Practical assessment (50%)

EE2 – Exam (50%)

Evaluation method

To successfully finish this curricular unit, students need to score a minimum of 9.5 points on the exam and 9.5 points overall. The grading is divided into two seasons. Attendance in the second season is optional for students who passed the curricular unit in the first season and can be used to improve their grade. Please read the details of the second season carefully below.

Continuous Evaluation (1st Call)

The first season is dedicated to continuous evaluation and replaces the first Exam call. The continuous evaluation includes the following two assessment components:

  1. Exam (50%) – This is an individual assessment activity. The exam consists of 40 multiple-choice questions and lasts 80 minutes. It is taken in person and answered on paper. The exam focuses on the theoretical elements of the course (90%+), with a couple of basic questions on PySpark code. A score of at least 9.5 must be achieved in the exam. NOTE: a handwritten, double-sided A4 cheat sheet of notes is allowed in the exam.
  2. Project (50%) – In groups of 4-5 students, each group must complete a project with Spark, write a report, and deliver a short video presentation. The following must be delivered:
    The report (including references, figures, and tables), set in 11pt Times New Roman with 1.5 line spacing. For the Databricks project, the report should be 5-10 pages. The report should focus on explaining the problem and the students' understanding of approaches to solving it, as well as presenting the results of the work and their implications. The report must include a title and the name and number of each group member;
    All materials, including data and notebooks, needed to recreate every analysis conducted and to demonstrate the work of the project;
    A 10-minute video presentation uploaded by the group. The video should focus on the practical work done by the students: explaining the code in the notebook, why it was chosen, how it works, and all investigation done during the work.
    Interviews may be conducted with students where any of the work is unclear.
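
As an illustration only, the first-call rules above can be sketched in Python. The function name `first_call_result` is hypothetical, and the sketch assumes both components are graded on the same scale and that no rounding is applied before the 9.5 thresholds are checked; neither detail is stated in this document.

```python
def first_call_result(exam, project):
    """Return (overall grade, passed) for the continuous evaluation.

    Assumption: exam and project are graded on the same scale and
    no rounding is applied before comparing against 9.5.
    """
    overall = 0.5 * exam + 0.5 * project      # each component weighs 50%
    passed = exam >= 9.5 and overall >= 9.5   # both minimums must hold
    return overall, passed


# A strong project does not compensate for an exam score below 9.5.
print(first_call_result(exam=9.0, project=16.0))   # (12.5, False)
print(first_call_result(exam=10.0, project=12.0))  # (11.0, True)
```

The first case shows why the exam minimum matters: the overall grade is above 9.5, but the unit is still not passed in the first call.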

Final Exam (2nd Call) (100% of grade)

The second grading season will take place in July and consists of a multiple-choice exam. The exam result replaces all previous assessment and counts for 100% of the grade; a score of at least 9.5 must be achieved to pass. The exam is made up of 40 multiple-choice questions. Please note: because this exam covers the full course (practical and theoretical components), it is typically significantly more difficult than the first exam, which focuses more on the theoretical components. Student averages are typically significantly lower for this exam.

Subject matter

The curricular unit is organized in five Learning Units (LU):

LU0. Introduction to Big Data.

LU1. Storing Big Data with NoSQL.

LU2. Data Analytics with Big Data.

LU3. Data Ingestion and Architectural Concerns.

LU4. Marketing applications of Big Data.