Knowledge Discovery

Objectives

Understand and develop data processing and analysis processes.

Learn how the data analysis algorithms used to discover previously unknown patterns in data work: classification, supervised and unsupervised learning, and association rules (“if you like X, you will probably be interested in Y, Z and W”).

Big Data processing and analysis concepts. Tools for handling Big Data in the cloud (based on Google technologies), processing terabytes in seconds and petabytes in minutes.

Video: Machine Learning: Making Sense of a Messy World

Tools for Artificial Intelligence.

Ethics in data processing, explored through debates between groups of students.

The course also develops non-technical skills:

Oral and written communication skills

Conducting a demonstration

Report on the analysis, design and implementation of a solution

Work organization, time management and meeting deadlines

Team work and collaboration

Research capacity and autonomy

General characterization

Code

12793

Credits

6.0

Responsible teacher

João Paulo Branquinho Pimentão, Pedro Alexandre da Costa Sousa

Hours

Weekly - 4

Total - 56

Teaching language

Portuguese

Prerequisites

None

Bibliography

Data Mining, by Eibe Frank, Christopher Pal, Mark Hall and Ian H. Witten, ISBN: 9780128042915, Elsevier Science & Technology

Handbook of Data Mining and Knowledge Discovery, 1st Edition, edited by Willi Klösgen and Jan M. Zytkow, ISBN-13: 978-0195118315

Big Data Analytics: Systems, Algorithms, Applications, 1st ed. 2019, by C.S.R. Prabhu et al., ISBN-13: 978-9811500930

Information theory, inference, and learning algorithms - MacKay, David, Cambridge University Press, ISBN: 978-0521642989

Principles of data mining - Hand, David; Smyth, Padhraic; Mannila, Heikki, MIT Press, ISBN: 978-0262082907

Pattern recognition and machine learning - Bishop, Christopher M., Springer, ISBN: 978-0387310732

Visualize This: The FlowingData Guide to Design, Visualization, and Statistics - Yau, Nathan, John Wiley & Sons, ISBN: 978-0470944882

Teaching method

The course is divided into theoretical-practical and practical classes.
The theoretical-practical classes cover the more theoretical topics and introduce the problems that students will have to solve, which form the basis of their practical work.
In the practical classes, students implement solutions to those problems.
The practical work must be submitted by the set deadlines, together with a report covering analysis, design and implementation.

There is also a series of debates on ethical topics related to data analysis, in which students participate in groups and are evaluated by teachers and peers.

Evaluation method

Theoretical-Practical component (weight of 34%) - NTP:
=========================================================
Assessed through one test or an exam.
A grade of at least 9.5 (test or exam) is required.

Practical component (66% weight) - NP:
========================================
First assignment (T1): 30%, final assignment (T2): 50%, Qwiklabs: 10%, debate: 10%
Assignments: submitted via Moodle and assessed on the features implemented.

An average grade of at least 9.5 is required.

NOTE: passing grades obtained in the previous academic year can be used this semester.

Final Grade Calculation - NF:
====================
NF = 34%*NTP + 66%*NP
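
As a worked illustration of the formula above, the sketch below computes NP from the practical weights (T1 30%, T2 50%, Qwiklabs 10%, debate 10%) and then NF. The marks are made up, the function names are hypothetical, a 0 to 20 grading scale is assumed, and returning None when a component is below 9.5 is only one possible way to represent a failed minimum.

```python
def practical_grade(t1, t2, qwiklabs, debate):
    """NP: weighted average of the practical items, using the syllabus weights."""
    return 0.30 * t1 + 0.50 * t2 + 0.10 * qwiklabs + 0.10 * debate


def final_grade(ntp, np_grade, minimum=9.5):
    """NF = 34% * NTP + 66% * NP.

    Returns None when either component is below the minimum
    (an assumption about how a failed minimum is represented).
    """
    if ntp < minimum or np_grade < minimum:
        return None
    return 0.34 * ntp + 0.66 * np_grade


# Made-up marks on an assumed 0-20 scale.
np_grade = practical_grade(t1=14.0, t2=16.0, qwiklabs=18.0, debate=12.0)
nf = final_grade(ntp=12.0, np_grade=np_grade)
print(round(np_grade, 2), round(nf, 2))  # 15.2 14.11
```

With these marks, NP = 15.2 and NF = 0.34 * 12.0 + 0.66 * 15.2 = 14.112.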

Subject matter

Introduction

  • Intelligent systems

  • Data «warehouse»

  • Knowledge discovery

 

Managing Knowledge discovery projects

  • CRISP-DM, SEMMA

 

Data Warehouse and OLAP 

  • Data Warehouse and DBMS 

  • Multidimensional data model 

  • OLAP

 

Data preprocessing 

  • Data cleaning 

  • Data transformation 

  • Data reduction 

  • Concept hierarchies

  • Data Quality

 

Data mining knowledge representation 

  • Interestingness measures 

  • Input data

  • Models 

  • Visualization techniques 

 

Learning

  • Classification/regression

  • Segmentation

  • Instance-based methods (nearest neighbor) 

  • Association

  • Clustering
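
To make the Association topic concrete (the "if you like X you will probably be interested in Y" idea from the objectives), here is a minimal sketch of the two standard measures for a rule X -> Y: support and confidence. The basket data and function names are invented for illustration and are not course material.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets: each set is one customer's items.
baskets = [{"X", "Y"}, {"X", "Y", "Z"}, {"X"}, {"Y", "Z"}]

sup_xy = support(baskets, {"X", "Y"})       # 2 of 4 baskets contain both
conf_x_to_y = confidence(baskets, {"X"}, {"Y"})  # 2 of the 3 baskets with X also have Y
```

A rule is usually kept only when both values exceed chosen thresholds; the thresholds themselves depend on the application.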

 

Evaluating what's been learned 

  • Training and testing 

  • Estimating classifier accuracy (holdout, cross-validation, leave-one-out) 

  • Combining multiple models 
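
The cross-validation and leave-one-out estimates listed above differ only in how the data is split. The sketch below, in plain Python with a hypothetical helper name, generates the train/test index splits for k folds; leave-one-out is the special case where k equals the number of samples. It does not shuffle the data, which a real experiment would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 3-fold cross-validation over 6 samples.
folds = list(k_fold_indices(n_samples=6, k=3))

# Leave-one-out: one fold per sample.
loo = list(k_fold_indices(n_samples=4, k=4))
```

Each sample appears in exactly one test fold, so averaging a classifier's accuracy over the folds uses every sample for testing once.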

 

Mining real data 

 

Dealing with Big Data

  • What makes data Big Data

  • Scalable Data Analytics Framework

  • Large-scale Data Analysis Models

  • Distributed Storage Architecture

  • NoSQL Databases

  • Data Flow Management

Ethics and privacy