Knowledge Discovery
Objectives
At the end of this course unit the student is expected to have acquired knowledge, skills and competences that will allow him / her to:
Knowledge:
Identification of Knowledge Discovery problems
Data mining
ETL
Data warehouse
Do:
Algorithm specification.
Specification, development and implementation of data mining solutions
Non-Technical:
Written and oral communication skills
Demonstration skills
Produce reports of analysis, design and implementation of a solution
Work management, time management and delivery deadlines
Teamwork and team participation
Delivery
Honesty
General characterization
Code
12793
Credits
6.0
Responsible teacher
João Paulo Branquinho Pimentão, Pedro Alexandre da Costa Sousa
Hours
Weekly - 4
Total - 56
Teaching language
Portuguese
Prerequisites
Not Defined.
Bibliography
Data Mining, by Eibe Frank, Christopher Pal, Mark Hall and Ian H. Witten, ISBN: 9780128042915, ELSEVIER SCIENCE & TECHNOLOGY
Handbook of Data Mining and Knowledge Discovery, 1st Edition, Willi Klösgen and Jan M. Zytkow (Editors), ISBN-13: 978-0195118315
Big Data Analytics: Systems, Algorithms, Applications, 1st ed. 2019 Edition, by C.S.R. Prabhu et al., ISBN-13: 978-9811500930
Information theory, inference, and learning algorithms - MacKay, David, Cambridge University Press, ISBN: 978-0521642989
Principles of data mining - Hand, David; Smyth, Padhraic; Mannila, Heikki, MIT Press, ISBN: 978-0262082907
Pattern recognition and machine learning - Bishop, Christopher M., Springer, ISBN: 978-0387310732
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics - Yau, Nathan, John Wiley & Sons, ISBN: 978-0470944882
Teaching method
The course is divided into theoretical-practical classes and practical classes.
In the theoretical-practical classes, the subjects are introduced and practical problems are formulated, which the students then solve in the corresponding practical classes.
In the practical classes, the students implement solutions to those problems.
All the work developed in the practical classes is part of a larger integration project that students must deliver within a defined time frame, together with a report covering analysis, design and implementation.
Evaluation method
Theoretical-Practical component (weight of 34%) - NTP:
===========================================
Assessed through 1 test or an exam;
A grade (exam grade or average of the tests) of at least 9.5 points is required.
Practical component (weight of 66%) - NP:
=========================================
2 projects. Delivery through Moodle. Evaluation based on implemented functionalities.
An average grade of at least 9.5 points is required.
NOTE: Passing grades obtained in the previous year can be used this semester.
Calculation of final grade - NF:
====================
NF = 34% * NTP + 66% * NP
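As an illustration only, the weighting can be sketched in Python using the component weights stated above (34% theoretical-practical, 66% practical) and the 9.5 minimum for each component. The function name `final_grade`, its parameter names, and the choice to return `None` when a component is below the minimum are assumptions made for this sketch; the syllabus does not say how a failed component is recorded.

```python
from typing import Optional


def final_grade(ntp: float, np_: float, minimum: float = 9.5) -> Optional[float]:
    """Weighted final grade NF, or None if either component is below the minimum.

    ntp: theoretical-practical grade (weight 34%)
    np_: practical grade (weight 66%)
    Returning None for a failed component is an assumption of this sketch.
    """
    if ntp < minimum or np_ < minimum:
        return None
    return 0.34 * ntp + 0.66 * np_


# Example: 12.0 in the theoretical-practical component, 15.0 in the practical one
print(final_grade(12.0, 15.0))
```

With these example grades, the practical component contributes 9.9 points and the theoretical-practical component 4.08 points, so the projects dominate the final result.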
Subject matter
Introduction
- Intelligent systems
- Data warehouse
- Knowledge discovery
Managing Knowledge discovery projects
- CRISP-DM, SEMMA
Data Warehouse and OLAP
- Data Warehouse and DBMS
- Multidimensional data model
- OLAP
Data preprocessing
- Data cleaning
- Data transformation
- Data reduction
- Concept hierarchies
- Data Quality
Data mining knowledge representation
- Interestingness measures
- Input data
- Models
- Visualization techniques
Learning
- Classification/regression
- Segmentation
- Instance-based methods (nearest neighbor)
- Association
- Clustering
Evaluating what's been learned
- Training and testing
- Estimating classifier accuracy (holdout, cross-validation, leave-one-out)
- Combining multiple models
Mining real data
Dealing with Big Data
- What makes data Big Data
- Scalable Data Analytics Framework
- Large-scale Data Analysis Models
- Distributed Storage Architecture
- NoSQL Databases
- Data Flow Management
Ethics and privacy