Text Analytics

Objectives

None

General characterization

Code

200168

Credits

7.5

Responsible teacher

Ricardo Costa Dias Rei

Hours

Weekly - Available soon

Total - Available soon

Teaching language

Portuguese. If there are Erasmus students, classes will be taught in English

Prerequisites

Week

Instructor

Content

1
February 14th

FLP & RR

  • -  Course Overview

  • -  What is Natural Language Processing/Text Analytics and why does it matter?

  • -  Python Refresher

  • -  Setting up Python (Anaconda and Jupyter Notebooks)

2
February 21st

RR

  • -  Methodology, corpora and evaluation

  • -  Introduction to the NLTK

  • -  Corpus cleaning and creation of train/val/test sets.

3
February 28th

RR

- Bag-of-words models.
- Tokenization.
- Distance metrics.
- Comparison of documents.

4
March 14th

RR

  • -  N-grams

  • -  TF-IDF features

  • -  Feature engineering

  • -  Stemming

  • -  POS filtering

5
March 21st

RR

  • -  Distance-based classification (KNN)

  • -  N-gram Counting

  • -  Simple document classifier.

6
March 28th

RR

- Information Retrieval
- Question Answering
- Building a simple customer support chatbot

7
April 4th

RR

- First Test & Presentation about the project.

8
April 11th

RR

  • -  Sequence models

  • -  Markov models and hidden Markov models

  • -  Dynamic programming algorithms

9
April 25th

 
  • -  Machine Learning: Naive Bayes and Perceptron

  • -  Document classification revisited I

10
May 2nd

RR

  • -  Multi-layer Perceptron (Overview)

  • -  Representation Learning (introduction to word embeddings)

  • -  Document classification revisited II

11
May 9th

RR

- Word Embeddings
- Sentiment Analysis

12
May 16th

RR

  • -  Invited talk.

  • -  NLP industry applications.

  • -  Project deadline.

13
May 23rd

RR

- Sequence modelling (Deep Learning overview)

14
May 30th

RR

- Second Test
- Sequence modelling (Deep Learning overview)

Bibliography

[A] Sarkar, Dipanjan. "Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data". Apress; 1st edition (December 1, 2016).
[B] Jurafsky, Daniel and Martin, James H. "Speech and Language Processing". Prentice Hall; 2nd edition (May 16, 2008).

Teaching method

To successfully finish this course, students need a minimum combined score of 9.5 points across the following components:

  1. Theoretical Tests (25%): two mini-tests to be solved during classes. Students will have one hour to answer a few theoretical questions;

  2. Continuous Evaluation (15%): consists of simple quizzes that will be done during classes;

  3. Final Project (60%): a report detailing the process of transformation, manipulation, analysis and application of the learned techniques to a specific NLP task. The project is to be developed in groups of two or three members. More details about the project will be shared during the first couple of weeks on the Moodle page.
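
The weighting above can be illustrated with a minimal Python sketch. It assumes that all three components are graded on the same 0-20 scale, which this page does not state; the function names `final_grade` and `passes` are hypothetical and only serve the illustration.

```python
# Component weights and pass mark as listed in the evaluation components above.
WEIGHTS = {"tests": 0.25, "continuous": 0.15, "project": 0.60}
PASS_MARK = 9.5


def final_grade(tests, continuous, project):
    """Return the weighted sum of the three component scores.

    Assumption (not stated on this page): every component is graded
    on the same 0-20 scale, so the result is also on a 0-20 scale.
    """
    return (WEIGHTS["tests"] * tests
            + WEIGHTS["continuous"] * continuous
            + WEIGHTS["project"] * project)


def passes(tests, continuous, project):
    """Return True if the combined score reaches the 9.5 minimum."""
    return final_grade(tests, continuous, project) >= PASS_MARK


if __name__ == "__main__":
    # Hypothetical scores, chosen only for illustration.
    grade = final_grade(tests=8, continuous=12, project=10)
    print(f"combined score: {grade:.2f}, passes: {passes(8, 12, 10)}")
```

Because the project carries 60% of the weight, under this assumption a strong project can compensate for a weak test score, but not the other way around.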

Evaluation method

English

Subject matter

Theoretical and Practical classes