Knowledge Discovery

Objectives

By the end of this course unit, students are expected to have acquired the knowledge, skills and competences that allow them to demonstrate the following:

Knowledge:
Identification of knowledge discovery problems
Data mining
ETL
Data warehouse

Do:
Algorithm specification
Specification, development and implementation of data mining

Non-Technical:
Written and oral communication skills
Demonstration skills
Produce reports of analysis, design and implementation of a solution
Work management, time management and delivery deadlines
Teamwork and team participation
Delivery
Honesty

General characterization

Code

12793

Credits

6.0

Responsible teacher

João Paulo Branquinho Pimentão, Pedro Alexandre da Costa Sousa

Hours

Weekly - 4

Total - 56

Teaching language

Portuguese

Prerequisites

Not Defined.

Bibliography

Data Mining - Frank, Eibe; Pal, Christopher; Hall, Mark; Witten, Ian H., Elsevier Science & Technology, ISBN: 978-0128042915

Handbook of Data Mining and Knowledge Discovery, 1st Edition - Klosgen, Willi (Editor); Zytkow, Jan M. (Editor), ISBN: 978-0195118315

Big Data Analytics: Systems, Algorithms, Applications, 1st ed. 2019 - Prabhu, C.S.R. et al., ISBN: 978-9811500930

Information theory, inference, and learning algorithms - MacKay, David, Cambridge University Press, ISBN: 978-0521642989

Principles of data mining - Hand, David; Smyth, Padhraic; Mannila, Heikki, MIT Press, ISBN: 978-0262082907

Pattern recognition and machine learning - Bishop, Christopher M., Springer, ISBN: 978-0387310732

Visualize This: The FlowingData Guide to Design, Visualization, and Statistics - Yau, Nathan, John Wiley & Sons, ISBN: 978-0470944882

Teaching method

The course is divided into theoretical-practical classes and practical classes.
In the theoretical-practical classes, the subjects are introduced and practical problems are formulated for the students to solve in the corresponding practical classes.
In the practical classes, the students implement solutions to those problems.
All the work developed in the practical classes is part of a larger integration project, which students must deliver within a defined time frame together with a report covering analysis, design and implementation.

Evaluation method

Theoretical-Practical component (weight of 34%) - NTP:
===========================================
Assessed through 1 test or exam.
A grade (exam grade, or average of the tests) of at least 9.5 points is required.

Practical component (weight of 66%) - NP:
=========================================
2 projects, delivered through Moodle and evaluated on the functionality implemented.
An average grade of at least 9.5 points is required.

NOTE: Passing grades obtained in the previous year may be carried over to this semester.

Calculation of final grade - NF:
====================
NF = 34% * NTP + 66% * NP
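
As a worked illustration of the weights and minimum grades stated above, the following Python sketch computes a final grade from the theoretical-practical grade (NTP) and the practical grade (NP). The function name `final_grade`, the constant names, and the choice to return `None` when a minimum is not met are assumptions made for illustration; the course description does not say how a failed minimum is recorded.

```python
from typing import Optional

# Weights and minimum taken from the evaluation rules above.
WEIGHT_NTP = 0.34      # theoretical-practical component
WEIGHT_NP = 0.66       # practical component
MIN_COMPONENT = 9.5    # minimum required in each component


def final_grade(ntp: float, np_grade: float) -> Optional[float]:
    """Return the weighted final grade NF.

    Returns None (an illustrative convention, not from the source)
    when either component is below the required minimum.
    """
    if ntp < MIN_COMPONENT or np_grade < MIN_COMPONENT:
        return None
    return WEIGHT_NTP * ntp + WEIGHT_NP * np_grade


if __name__ == "__main__":
    # Example: NTP = 12.0 and NP = 15.0 (hypothetical grades).
    nf = final_grade(12.0, 15.0)
    print(round(nf, 2))            # 0.34 * 12.0 + 0.66 * 15.0 = 13.98
    # A component below 9.5 yields no final grade in this sketch.
    print(final_grade(9.0, 18.0))  # None
```

Because the practical component carries about two thirds of the weight, a one-point change in NP moves NF roughly twice as much as a one-point change in NTP.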

Subject matter

Introduction

  • Intelligent systems

  • Data warehouse

  • Knowledge discovery

 

Managing Knowledge discovery projects

  • CRISP-DM, SEMMA

 

Data Warehouse and OLAP 

  • Data Warehouse and DBMS 

  • Multidimensional data model 

  • OLAP

 

Data preprocessing 

  • Data cleaning 

  • Data transformation 

  • Data reduction 

  • Concept hierarchies

  • Data Quality

 

Data mining knowledge representation 

  • Interestingness measures 

  • Input data

  • Models 

  • Visualization techniques 

 

Learning

  • Classification/regression

  • Segmentation

  • Instance-based methods (nearest neighbor) 

  • Association

  • Clustering

 

Evaluating what's been learned

  • Training and testing 

  • Estimating classifier accuracy (holdout, cross-validation, leave-one-out) 

  • Combining multiple models 

 

Mining real data 

 

Dealing with Big Data

  • What makes data Big Data

  • Scalable Data Analytics Framework

  • Large-scale Data Analysis Models

  • Distributed Storage Architecture

  • NoSQL Databases

  • Data Flow Management

Ethics and privacy

Programs

Programs where the course is taught: