Algorithms and Data Structures for Bioinformatics


- Bioinformatics as the study of information flow in biological systems.
- The most used algorithms, their use and limitations, for the analysis of DNA and protein sequences.
- Modern techniques for pattern recognition and their application to the analysis of biological sequences
Be able to:
- Select appropriate solutions for the bioinformatic analysis of sequences.
- Reason about bioinformatics problems, decomposing them in their basic elements
- Chain existing programs to solve problems requiring several processing steps
- Critically evaluate the results obtained.
- Use modern libraries for sequence analysis and pattern recognition.
- Most used algorithms for sequence analysis bioinformatics problems
- Modern methods for prediction and pattern recognition applied to biological sequences

General characterization





Responsible teacher

João Alexandre Carvalho Pinheiro Leite, Jorge Carlos Ferreira Rodrigues da Cruz


Weekly - 4

Total - 48

Teaching language





Mandatory reading:
Haubold, Bernhard, and Thomas Wiehe. Introduction to computational biology: an evolutionary
approach. Springer Science & Business Media, 2006.
Complementary reading:
Marketa J. Zvelebil, Marketa Zvelebil, Jeremy O. Baum, Understanding Bioinformatics, Garland
Science, 2008
Miguel Rocha, Pedro G. Ferreira, Bioinformatics Algorithms: Design and Implementation in Python,
Academic Press, 2018
Bishop, Pattern Recognition and Machine Learning, Springer, 2006

Teaching method

Lectures and tutorials, introductory exercises and applications of the tools and services covered in class.
Evaluation consists on tests and essays, with a theoretical essay covering a specific technique or tool and a practical assignment in which the student must improve and apply their skills to a practical problem. Each student can choose the theme and subject of their assignments, provided they are relevant and approved by the teacher.

Evaluation method

The assessment consists of two components:

1 Project component, consisting of two practical assignments.

1 theoretical component, consisting of two tests.

The grade of the project component is the simple average of the grades of the two practical assignments, rounded up to one decimal.

The grade of the theoretical component is the simple average of the grades of the two tests, rounded up to one decimal.

The final grade is the simple average of the two components of the assessment, rounded to the nearest integer.

The grade of the recourse exam replaces the grade of the theoretical-practical component.

Subject matter

 Dynamic programming for local and global alignment
 Substitution matrices
 Sequence search algorithms
 Hierarchical and Iterative multiple sequence alignments
 Phylogenetics: distance, parsimony and likelihood methods; bootstrapping
 Supervised learning algorithms for pattern recognition and prediction in biological sequences.
 Unsupervised learning algorithms for clustering and modeling of biological sequences.
 Genome assembly and analysis algorithms
 Introduction to structural biology