Text Mining
Objectives
The main goal of the Text Mining curricular unit is to give students a fundamental understanding of the challenges that arise when automatically processing text written in a natural language, and to equip them with tools to address those challenges. Topics covered include the overall text mining methodology and the steps that comprise it, such as preprocessing, text representations, and learning models.
General characterization
Code
200181
Credits
4.0
Responsible teacher
João Bruno Morais de Sousa Jardim
Hours
Weekly - Available soon
Total - Available soon
Teaching language
Portuguese. If there are Erasmus students, classes will be taught in English
Prerequisites
n/a
Bibliography
Jurafsky, Daniel and Martin, James H. "Speech and Language Processing", Prentice Hall.
Teaching method
Classes will involve a mix of lectures and practical exercises. The course also has a strong active learning component: students are expected to participate actively in class, and they are advised to read the assigned materials before each class. All programming activities will be done in Python, in the Jupyter Notebook environment.
Evaluation method
1st and 2nd call
Continuous version:
- Exam: 50% (min grade: 9.5)
- Continuous evaluation: 10%
- Project: 40% (min grade: 9.5)
Exam version:
- Exam: 60% (min grade: 9.5)
- Project: 40% (min grade: 9.5)
Subject matter
- LU1: Introduction to Text Mining: overview, challenges, and methodology
- LU2: Text pre-processing
- LU3: Text representations: bag-of-words, n-grams, features, and word embeddings
- LU4: Distance-based methods: lexical distance and TF-IDF
- LU5: Learning methods: language modeling, structured prediction models, and classification models
- LU6: Classification project to be developed in Python
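As an informal illustration of how the topics in LU2 to LU4 fit together, the sketch below tokenizes two short texts, builds a bag-of-words count for each, and weights the terms with TF-IDF. It is not official course material: the function names, the sample sentences, and the specific weighting formula (term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency) are assumptions chosen for brevity, and other variants are common.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(tokens):
    """Count how often each token occurs (bag-of-words representation)."""
    return Counter(tokens)


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / document frequency). Other variants exist.
    """
    bags = [bag_of_words(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical sample texts, for illustration only.
docs = ["The cat sat on the mat.", "The dog sat on the log."]
vectors = tf_idf(docs)
```

Under this weighting, terms that occur in every document (such as "the") receive a weight of zero, while terms unique to one document (such as "cat") receive a positive weight.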
Programs
Programs where the course is taught: