Computational Methods and Tools for Exploring Texts

Objectives


  • To get acquainted with, to understand and to evaluate the methods and tools for analysing and extracting information from large sets of linguistic data;

  • To know how to organize and use large sets of linguistic data to extract useful information relevant to research questions specific to the Arts and Humanities;

  • To know methods for analysing and detecting linguistic cues and features, and to determine their relevance for specific information extraction and text mining tasks with non-linguistic purposes;

  • To develop skills to build and use textual corpora in an analytical and informed way, according to tested methodologies and using available tools for corpus treatment and analysis;

  • To develop skills and strategies for detecting and using linguistic cues and features for research purposes in the field of Arts and Humanities.

General characterization

Code

02111034

Credits

10.0

Responsible teacher

Raquel Fonseca Amaro

Hours

Weekly - 3

Total - 280

Teaching language

Portuguese

Prerequisites

N/A

Bibliography

  • Beloso, B. S. (2015). Designing, Describing and Compiling a Corpus for English Architecture. In Procedia - Social and Behavioral Sciences 198. Elsevier. 459-464;
  • Ebensgaard Jensen, K. (2014). Linguistics and the digital humanities: (Computational) corpus linguistics. MedieKultur: Journal of Media and Communication Research, 30, pp. 117-136;
  • McEnery, T. and A. Hardie (2012). Corpus Linguistics: Method, theory and practice. Cambridge University Press;
  • Odebrecht, C., Belz, M., Zeldes, A., Lüdeling, A. & Krause, T. (2017). RIDGES Herbology: Designing a Diachronic MultiLayer Corpus. In: Language Resources and Evaluation 51.3, pp. 695–725;
  • O'Keeffe, A. and McCarthy, M. (eds) (2010). The Routledge Handbook of Corpus Linguistics (Routledge Handbooks in Applied Linguistics). London & New York: Routledge;
  • Sinclair, J. (2004). Trust the Text: Language, Corpus and Discourse. London and New York: Routledge.

Teaching method

Theoretical-practical classes and tutorial guidance, using case studies and practical application of acquired knowledge, including: (i) presentation of contents by the teacher; (ii) discussion and critical analysis of the bibliography on topics of the syllabus; (iii) practical application of knowledge acquired in individual and group work in specific tasks, using computational tools.

Evaluation method

Continuous Assessment - Active participation in seminar activities (30%), Project work (70%)

Subject matter

1. Corpus Linguistics
1.1. Introduction and theoretical framework;
1.2. Corpus constitution: criteria, parameters and representativeness;
1.3. Corpus tools and procedures: overview.

2. From linguistic data to specific information extraction
2.1. Linguistic units, features and cues;
2.2. Textual analysis: macro vs. micro level; syntagmatic vs. paradigmatic analysis;
2.3. Lexical statistics, concordances and collocations.

3. Applying Corpus Linguistics and text mining strategies
3.1. Research question, data selection and corpus compilation;
3.2. Determining relevant linguistic features and cues;
3.3. Results extraction and analysis.
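
As an illustration of topic 2.3 (lexical statistics, concordances and collocations), the following is a minimal sketch in Python, not a tool prescribed by the course. The function names (tokenize, word_frequencies, concordance, collocates), the naive tokenization rule and the sample text are all assumptions made for this example; a real project would typically rely on a dedicated corpus tool and a larger, properly compiled corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Lexical statistics: count how often each word form occurs."""
    return Counter(tokens)


def concordance(tokens, keyword, window=3):
    """Keyword-in-context lines: each hit with `window` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(context)
    return counts


# Hypothetical sample text, invented for this sketch.
sample = "The corpus is large. A large corpus helps research. The corpus grows."
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
for line in concordance(tokens, "corpus", window=2):
    print(line)
print(collocates(tokens, "corpus", window=1).most_common(2))
```

Raw co-occurrence counts are only a starting point: collocation studies usually apply an association measure to separate meaningful collocates from words that are simply frequent everywhere.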