Knowledge Discovery
Objectives
Understand and develop data processing and analysis processes.
Learn how data analysis algorithms work, covering the classification of previously unseen data, supervised and unsupervised learning, and association rules (“if you like X, you will probably be interested in Y, Z and W”).
Big Data processing and analysis concepts. Tools for handling Big Data in the cloud (based on Google technologies): process terabytes in seconds and petabytes in minutes.
Video: Machine Learning: Making Sense of a Messy World
Tools for Artificial Intelligence.
Ethics in data processing, explored through debates between groups of students.
Students also develop non-technical skills:
Oral and written communication skills
Conducting a demonstration
Report on the analysis, design and implementation of a solution
Work organization, time management and meeting deadlines
Teamwork and collaboration
Research capacity and autonomy
General characterization
Code
12793
Credits
6.0
Responsible teacher
João Paulo Branquinho Pimentão, Pedro Alexandre da Costa Sousa
Hours
Weekly - 4
Total - 56
Teaching language
Portuguese
Prerequisites
None
Bibliography
Data Mining, by Eibe Frank, Christopher Pal, Mark Hall and Ian H. Witten, ISBN: 9780128042915, Elsevier Science & Technology
Handbook of Data Mining and Knowledge Discovery, 1st Edition, Willi Klösgen (Editor), Jan M. Zytkow (Editor), ISBN-13: 978-0195118315
Big Data Analytics: Systems, Algorithms, Applications, 1st ed. 2019, by C.S.R. Prabhu et al., ISBN-13: 978-9811500930
Information theory, inference, and learning algorithms - Mackay, David, Cambridge University Press, ISBN: 978-0521642989
Principles of data mining - Hand, David; Smyth, Padhraic; Mannila, Heikki, MIT Press, ISBN: 978-0262082907
Pattern recognition and machine learning - Bishop, Christopher M., Springer, ISBN: 978-0387310732
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics - Yau, Nathan, John Wiley & Sons, ISBN: 978-0470944882
Teaching method
The course is divided into theoretical-practical and practical classes.
The theoretical-practical classes cover the more theoretical topics. They introduce the problems that students will have to solve and form the basis for the work implemented in the practical classes.
In the practical classes, students work through the problems (implementation).
The work developed in the practical classes must be submitted by the defined deadlines, together with a report covering analysis, design and implementation.
There is also a set of debates on ethical topics related to data analysis, in which students participate in groups; their performance is evaluated by teachers and peers.
Evaluation method
Theoretical-Practical component (weight of 34%) - NTP:
=========================================================
Assessed through one test or an exam.
A minimum grade of 9.5 (on the test or the exam) is required.
Practical component (66% weight) - NP:
========================================
First assignment (T1): 30%; final assignment (T2): 50%; Qwiklabs: 10%; debate: 10%.
Assignments: submitted via Moodle and assessed on the features implemented.
A minimum average grade of 9.5 is required.
NOTE: passing grades from the previous academic year can be carried over to this semester.
Final Grade Calculation - NF:
====================
NF = 34%*NTP + 66%*NP
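As an illustration only, the calculation above can be sketched in a few lines of Python. The function names and the sample grades are hypothetical. The sketch assumes that the practical weights (T1, T2, Qwiklabs, debate) combine linearly into NP and that the 9.5 minimum applies to NTP and to NP as a whole; the official rules take precedence over this reading.

```python
# Illustrative sketch of the grade formulas; names and sample values are hypothetical.

MIN_GRADE = 9.5  # minimum required in each component, as stated in the rules


def practical_grade(t1, t2, qwiklabs, debate):
    """NP, assuming the practical weights combine linearly."""
    return 0.30 * t1 + 0.50 * t2 + 0.10 * qwiklabs + 0.10 * debate


def final_grade(ntp, np_):
    """NF = 34% * NTP + 66% * NP."""
    return 0.34 * ntp + 0.66 * np_


def meets_minimums(ntp, np_):
    """Assumption: both components must reach the minimum grade."""
    return ntp >= MIN_GRADE and np_ >= MIN_GRADE


if __name__ == "__main__":
    np_ = practical_grade(t1=14, t2=16, qwiklabs=18, debate=15)
    nf = final_grade(ntp=12, np_=np_)
    print("NP:", round(np_, 2))
    print("NF:", round(nf, 2))
    print("Minimums met:", meets_minimums(12, np_))
```

With these sample grades, a student with NTP = 12 and the practical grades shown would meet both minimums; lowering NTP below 9.5 would fail the check regardless of NF.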
Subject matter
Introduction
- Intelligent systems
- Data warehouse
- Knowledge discovery
Managing knowledge discovery projects
- CRISP-DM, SEMMA
Data Warehouse and OLAP
- Data Warehouse and DBMS
- Multidimensional data model
- OLAP
Data preprocessing
- Data cleaning
- Data transformation
- Data reduction
- Concept hierarchies
- Data quality
Data mining knowledge representation
- Interestingness measures
- Input data
- Models
- Visualization techniques
Learning
- Classification/regression
- Segmentation
- Instance-based methods (nearest neighbor)
- Association
- Clustering
Evaluating what's been learned
- Training and testing
- Estimating classifier accuracy (holdout, cross-validation, leave-one-out)
- Combining multiple models
Mining real data
Dealing with Big Data
- What makes data Big Data
- Scalable Data Analytics Framework
- Large-scale Data Analysis Models
- Distributed Storage Architecture
- NoSQL Databases
- Data Flow Management
Ethics and privacy
Programs
Programs where the course is taught: