Text Mining

Objectives

The main goal of the Text Mining curricular unit is to give students a fundamental understanding of the challenges of automatically processing text written in a natural language, along with the tools to address those challenges. Topics include the overall methodology and the steps that comprise it, such as preprocessing, text representations, and learning models.

General characterization

Code

200181

Credits

4.0

Responsible teacher

João Bruno Morais de Sousa Jardim

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If Erasmus students are enrolled, classes will be taught in English.

Prerequisites

n/a

Bibliography

Jurafsky, Daniel and Martin, James H., "Speech and Language Processing", Prentice Hall.

Teaching method

Classes will combine lectures and practical exercises. The course has a strong proactive learning component: students are expected to participate actively in class and are advised to read the assigned materials before each class. All programming activities will be done in Python, in the Jupyter Notebook environment.

Evaluation method

1st and 2nd call

Continuous version:

  • Exam: 50% (min grade: 9.5)
  • Continuous evaluation: 10%
  • Project: 40% (min grade: 9.5)

Exam version:

  • Exam: 60% (min grade: 9.5)
  • Project: 40% (min grade: 9.5)

Subject matter

  • LU1: Introduction to Text Mining: overview, challenges, and methodology
  • LU2: Text preprocessing
  • LU3: Text representations: bag-of-words, n-grams, features, and word embeddings
  • LU4: Distance-based methods: lexical distance and TF-IDF
  • LU5: Learning methods: language modeling, structured prediction models, and classification models
  • LU6: Classification project to be developed in Python
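
As a rough illustration of how the representation and weighting topics listed in LU2 to LU4 fit together, the sketch below builds a bag-of-words count and a TF-IDF weighting in plain Python. It is not course material: the function names (`tokenize`, `bag_of_words`, `tf_idf`), the whitespace tokenizer, the toy corpus, and the specific TF-IDF variant (relative term frequency times log(N / document frequency)) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive preprocessing (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def bag_of_words(tokens):
    """Bag-of-words representation: map each token to its count."""
    return Counter(tokens)


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other variants (smoothing, log-scaled tf) are equally common.
    """
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical toy corpus, for illustration only.
corpus = ["the cat sat", "the dog barked"]
weights = tf_idf(corpus)
```

In this toy corpus, "the" appears in every document, so its weight is zero, while terms unique to one document receive a positive weight. That is the intuition behind using TF-IDF to downweight uninformative words.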