Continuous, Adaptive, Data-driven systems

Objectives

Increasingly, both our lives and those of the people around us are spent online. This is truer than ever, given the current health crisis. We keep up with friends and family via social networks, check the news on our phones, find information on Wikipedia, and consult “Dr. Google” when feeling sick. All of these activities leave traces that we can use to understand ourselves and others better, improving decision-making and different systems in a data-driven way.
In this course we will explore new datasets and discuss how to go from data to models to decisions in this context. In particular, we will use the current COVID-19 pandemic as a case study to work with real data on a real problem. Who shares fake news? Can we predict epidemics from online searches? Which product is going to sell?
This course aims to create a broader understanding of the power of analysing (mostly) online data and of the full pipeline from data extraction to decisions, and back again, in a continuous, feedback-dependent system. It will cover the basics of web scraping, data curation, natural language processing, sentiment analysis, and implementation.
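As a rough illustration of the pipeline named above (extraction, curation, language processing, sentiment), the following minimal sketch may help. It is not course material: the sample page, the helper names, and the tiny word lists are all hypothetical, and a simple lexicon count stands in for whatever sentiment method is actually taught.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Extraction + curation: strip markup and normalise whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    """Very basic language processing: lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical toy lexicons, chosen only for this example.
POSITIVE = {"good", "hopeful", "recovering"}
NEGATIVE = {"bad", "worried", "sick"}


def sentiment_score(tokens):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical page content; a real project would download it first.
SAMPLE_HTML = (
    "<html><body><p>Feeling sick and worried today.</p>"
    "<script>var x = 1;</script><p>Still hopeful.</p></body></html>"
)

text = extract_text(SAMPLE_HTML)
tokens = tokenize(text)
print(text)
print(sentiment_score(tokens))
```

In a continuous system, a score like this would feed a decision, and the outcome of that decision would in turn shape which data are collected next.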

General characterization

Code

2495

Credits

3.5

Responsible teacher

Joana Gonçalves de Sá

Hours

Weekly - Available soon

Total - Available soon

Teaching language

English

Prerequisites

Available soon

Bibliography

This is an experimental course at the intersection of theory and practice, and there will be no required textbook. State-of-the-art papers and educational materials (articles, online resources) will be provided to prepare for each class. General references in Data Mining, Data Analytics and Data Systems can be suggested.

Teaching method

This course will be very fast-paced, hands-on, using a learn-by-doing framework. This is an experimental setup, given the current circumstances, and strong motivation is required.
We will combine online lectures with flipped-classroom practicals.
Students will self-organize in groups and choose from a list of problems (e.g., real-time identification of new COVID-19 cases, media and social media coverage of the pandemic, economic impact(s)). Each group will be provided with educational materials, datasets and online instructions. Every week, there will be a 1h to 2h common lecture, followed by hands-on technical support in a group-instructor format. During the lecture, we will present current theory and knowledge; during the practicals, students will try different hands-on tools to tackle “their” problem. At the end of each class, students will be given a new challenge, to be prepared during the week and discussed in the following lecture. This gives the course a very stimulating and fast-paced environment, permanently linking theory to applications and back again, on a very timely topic.

Evaluation method

Attendance at lectures is not compulsory, but weekly instructor-group meetings (30 min each) are required.

Students are expected to tackle real challenges (4 in total) and present a final report. Creativity is encouraged, but proper code annotation and thoughtful approaches are fundamental. Students are expected to work together and discuss the problems with each other, but the write-ups must be individual. The final assessment will have two steps: 1) the write-up; 2) an oral defence of that report.

The final grade in the CADD course considers:
-    Individual and Group Work (40%)
-    Final Report (40%)
-    Oral Discussion (20%)
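
The weighting above can be read as a simple weighted average. The sketch below only illustrates that arithmetic: the weights come from the list, while the component scores and the 0 to 20 scale are assumptions made for this example, not official values.

```python
# Weights taken from the list above.
WEIGHTS = {
    "individual_and_group_work": 0.40,
    "final_report": 0.40,
    "oral_discussion": 0.20,
}


def final_grade(scores):
    """Return the weighted average of the three component scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the three graded components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 0-20 scale.
example = {
    "individual_and_group_work": 16.0,
    "final_report": 14.0,
    "oral_discussion": 18.0,
}
print(round(final_grade(example), 2))
```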

Subject matter

This course is incremental, with ethics discussed throughout. It will be very hands-on and cover the basics of data systems analytics.
During the term we will use real data from the current pandemic, focus on different aspects of it, and discuss tools to tackle them.