Stream Processing

Objectives

Learn the fundamental concepts, languages, and systems for building applications that process data streams. This course presents and discusses general-purpose systems for real-time stream processing, and focuses on the study of systems for structured data flow-oriented models.

Knowledge

A. Know the main programming models for streaming data processing.

B. Know the languages and understand their fundamental characteristics for solving problems in the stream processing domain.

C. Understand the advantages and disadvantages of stream processing platforms.

Application

A. Be able to choose the most appropriate models, languages and tools to solve a stream processing problem.

B. Be capable of developing and executing stream processing applications using current tools and technologies.

General characterization

Code

11562

Credits

6.0

Responsible teacher

Jorg Matthias Knorr, Nuno Manuel Ribeiro Preguiça

Hours

Weekly - 4

Total - 50

Teaching language

Portuguese

Prerequisites

Adequate programming skills in the Python programming language.

Bibliography

Opher Etzion and Peter Niblett. Event Processing in Action. Manning Publications, 2010.

Lukasz Golab and Tamer Özsu. Data Stream Management. Morgan and Claypool, 2010.

Bifet et al. Machine Learning for Data Streams. MIT Press, 2018.

Several papers will be provided for further reading.

Teaching method

Lectures present the dominant models, languages and platforms for each thematic area of the course syllabus, using paradigmatic examples to frame the discussion and the understanding of the issues.

Labs promote hands-on experience with the covered systems and languages, including application development for particular data streaming scenarios. Classes comprise demos, exercises and support for two programming projects to be developed during the semester. Examples of such applications are the challenges of the main conference in this area (DEBS).

Grading comprises two quizzes (25% each) and two project assignments (25% each).

Evaluation method

Assessment

The assessment consists of a theoretical-practical component and a project component.

Both components are evaluated on a scale from 0 to 20, rounded to one decimal place.

The final grade for the course is the weighted average of the two components, with the project component contributing 40% and the theoretical-practical component contributing 60%.

To pass the course, a student needs to achieve a score equal to or higher than 9.5 points in both components.

Theoretical-practical component

This component is obtained through two individual tests.

Alternatively, this component can be obtained through an exam.

The grade for the theoretical-practical component is obtained as:

  1. The arithmetic average of the grades from the 2 tests, rounded to one decimal place, or
  2. The grade from the make-up exam, rounded to one decimal place.

Project Component

This component consists of the completion of two assignments, done in groups of up to 2 students.

Despite the work being done in a group, the grade for this component will always be individual (and not assigned to the group).

The final grade for the component is the arithmetic average of the individual grades for the two projects, rounded to one decimal place.

The use of AI tools is permitted in the completion of the projects but must be clearly indicated. Excessive use may result in a penalty to the final grade.

To assess the use of AI tools or the individual contribution of each group member to the submitted work, groups may be called for a project discussion.
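
The grade arithmetic described in the Assessment section can be sketched in Python as follows. This is only an illustration, not an official calculator: the function names are hypothetical, half-up rounding is an assumption (the text only says "rounded to one decimal place"), and letting an exam grade replace the test average whenever one is supplied is also an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

PASS_MARK = Decimal("9.5")  # minimum required in each component


def round1(value):
    """Round to one decimal place. Half-up is an assumption; the
    syllabus does not name a rounding mode."""
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def theoretical_practical(test1=None, test2=None, exam=None):
    """Arithmetic average of the two tests, or the exam grade.
    Assumption: an exam grade, when given, replaces the tests."""
    if exam is not None:
        return round1(exam)
    return round1((Decimal(str(test1)) + Decimal(str(test2))) / 2)


def project_component(project1, project2):
    """Arithmetic average of the two individual project grades."""
    return round1((Decimal(str(project1)) + Decimal(str(project2))) / 2)


def final_grade(tp, project):
    """Return (weighted grade, passed) using the 60% / 40% weighting."""
    passed = tp >= PASS_MARK and project >= PASS_MARK
    grade = Decimal("0.6") * tp + Decimal("0.4") * project
    return grade, passed


# Example: tests of 12.3 and 14.0, projects of 15 and 16.
tp = theoretical_practical(12.3, 14.0)   # 13.2
proj = project_component(15, 16)         # 15.5
grade, passed = final_grade(tp, proj)    # 14.12, True
```

Note that a student with a high weighted grade still fails if either component is below 9.5; for instance, components of 18.0 and 9.4 do not pass.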

Subject matter

Distributed Stream Processing Systems.

System models for stream processing: streams as sequences of mini-batches (e.g. Spark Streaming); continuous processing (e.g. Apache Flink, Storm).
Programming models. System aspects: distribution, scalability and fault-tolerance.
Distributed time-series databases. Systems for IoT stream processing.

Data Stream Management Systems (DSMS).

Structured Data Models for Streams. Algebraic operators on streams and relations.
Continuous query languages (extensions to SQL and database management systems to deal with data streams).

Complex Event Processing.
Streams as sequences of events. Production rules, reactive rules, and event-driven computing. Event processing networks, agents and channels. Complex and derived events. Detection of event patterns. Event-processing languages and systems.

Machine Learning for Streams.

Introduction to learning from data. Dimensionality reduction for streams.
Learning under concept drift. Incremental learning.
Learning under imbalance and learning from graphs.