Computational Methods and Tools for Exploring Texts
Objectives
- To become acquainted with, understand and evaluate methods and tools for analysing and extracting information from large sets of linguistic data;
- To know how to organize and use large sets of linguistic data to extract useful information relevant to research questions specific to the Arts and Humanities;
- To know methods for analysing and detecting linguistic cues and features, and to determine their relevance for specific information extraction and text mining tasks with non-linguistic purposes;
- To develop skills to build and use textual corpora in an analytical and informed way, according to tested methodologies and using available tools for corpus treatment and analysis;
- To develop skills and strategies for detecting and using linguistic cues and features for research purposes in the field of Arts and Humanities.
General characterization
Code
02111034
Credits
10.0
Responsible teacher
Raquel Fonseca Amaro
Hours
Weekly - 3
Total - 280
Teaching language
Portuguese
Prerequisites
N/A
Bibliography
- Beloso, B. S. (2015). Designing, Describing and Compiling a Corpus for English Architecture. In Procedia - Social and Behavioral Sciences 198. Elsevier. 459-464;
- Ebensgaard Jensen, K. (2014). Linguistics and the digital humanities: (Computational) corpus linguistics. MedieKultur: Journal of Media and Communication Research, 30, pp. 117-136;
- McEnery, T. and A. Hardie (2012). Corpus Linguistics: Method, theory and practice. Cambridge University Press;
- Odebrecht, C., Belz, M., Zeldes, A., Lüdeling, A. & Krause, T. (2017). RIDGES Herbology: Designing a Diachronic MultiLayer Corpus. In: Language Resources and Evaluation 51.3, pp. 695–725;
- O'Keeffe, Anne and McCarthy, Michael (eds) (2010). The Routledge Handbook of Corpus Linguistics (Routledge Handbooks in Applied Linguistics). London & New York: Routledge;
- Sinclair, J. (2004). Trust the Text: Language, Corpus and Discourse. London and New York: Routledge.
Teaching method
Theoretical-practical classes and tutorial guidance, using case studies and practical application of acquired knowledge, including: (i) presentation of contents by the teacher; (ii) discussion and critical analysis of the bibliography on topics of the syllabus; (iii) practical application of knowledge acquired in individual and group work in specific tasks, using computational tools.
Evaluation method
Continuous Assessment - Active participation in seminar activities (30%), Project work (70%)
Subject matter
1. Corpus Linguistics
1.1. Introduction and theoretical framework;
1.2. Corpus constitution: criteria, parameters and representativeness;
1.3. Corpus tools and procedures: overview.
2. From linguistic data to specific information extraction
2.1. Linguistic units, features and cues;
2.2. Textual analysis: macro vs. micro level; syntagmatic vs. paradigmatic analysis;
2.3. Lexical statistics, concordances and collocations.
3. Applying Corpus Linguistics and text mining strategies
3.1. Research question, data selection and corpus compilation;
3.2. Determining relevant linguistic features and cues;
3.3. Results extraction and analysis.
Programs
Programs where the course is taught: