Information Retrieval
- Learn the concept of information relevance.
- Analyze text data.
- Learn how to rank information by relevance.
- Understand evaluation protocols.
- Implement information retrieval models.
- Adapt and improve the components of a search engine.
- Deploy search engines with large-scale datasets.
- Design evaluation protocols and evaluate search engines.
- Select the right IR techniques for particular problems.
- Design information retrieval systems.
- Think critically about retrieval results.
General characterization
Responsible teacher
João Miguel da Costa Magalhães
Hours
Weekly - 4
Total - 2
Teaching language
Prerequisites
Programming skills, preferably in Python.
Courses in linear algebra and probability.
Bibliography
Main reference: D. Jurafsky and J. H. Martin, “Speech and Language Processing”, 3rd ed. draft.
Complementary reference: C. D. Manning, P. Raghavan and H. Schütze, “Introduction to Information Retrieval”, Cambridge University Press, 2008.
Teaching method
In the lectures, the course material is presented with examples and a careful discussion of the most important concepts. The laboratory classes are dedicated to carrying out one project, with three submissions over the semester.
A course web page will be maintained with up-to-date information on how the course runs. The lecture slides and the project guide will be available on the course web page.
Assessment consists of one individual written test, taken at the end of the semester, and the laboratory work.
Evaluation method
Grading is divided into the theoretical part and a laboratory project:
Theoretical test/exam: 40% of the final grade (minimum grade is 9.0). Students may use a calculator and one A4 page with their own notes. The notes must be handwritten by the student, and the page must be handed in at the end of the exam/test.
Laboratory work: 60% of the final grade (minimum grade is 9.0). The laboratory work consists of an introductory project (20%) and a consolidation project submitted in two phases (20% for each phase).
Each lab submission must include a report and the code.
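The weighting above can be checked with a short calculation. The Python sketch below is a hypothetical helper, not an official course tool. It assumes grades on the 0-20 scale and assumes that the laboratory minimum of 9.0 applies to the average of the three lab submissions; the syllabus does not state either point explicitly.

```python
from typing import Optional

# Hypothetical grade calculator for the evaluation scheme above.
# Assumptions: 0-20 grading scale; the lab minimum applies to the
# average of the three (equally weighted) lab submissions.
TEST_WEIGHT = 0.40
LAB_WEIGHTS = (0.20, 0.20, 0.20)  # introductory project, phase 1, phase 2
MIN_GRADE = 9.0


def lab_grade(intro: float, phase1: float, phase2: float) -> float:
    """Lab component grade: the three submissions weigh 20% each, so a plain mean."""
    return (intro + phase1 + phase2) / 3


def final_grade(test: float, intro: float, phase1: float, phase2: float) -> Optional[float]:
    """Weighted final grade, or None if either component is below the minimum."""
    if test < MIN_GRADE or lab_grade(intro, phase1, phase2) < MIN_GRADE:
        return None  # a minimum-grade requirement was not met
    lab_parts = (intro, phase1, phase2)
    return TEST_WEIGHT * test + sum(w * g for w, g in zip(LAB_WEIGHTS, lab_parts))


# Example: test 12.0, lab submissions 15.0, 14.0 and 16.0
print(round(final_grade(12.0, 15.0, 14.0, 16.0), 2))  # prints 13.8
```

For the example, 0.40 × 12.0 + 0.20 × (15.0 + 14.0 + 16.0) = 4.8 + 9.0 = 13.8.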
Subject matter
1. Introduction
2. Text processing, n-grams, cosine distance
3. Language models
4. Evaluation
5. Pseudo relevance models
6. Classification tasks: sentiment, category, spam
7. Learning to rank
8. Word embeddings
9. Contextual embeddings
10. Information extraction
11. Question answering
12. Ethics in Computational NLP