High Performance Computing
Knowledge and understanding goals:
- Parallel architectures
- Parallel programming paradigms
- General parallel algorithm design methodologies
- Problem-type specific parallel algorithmic strategies
- Techniques to optimize the performance of parallel algorithms on the parallel architectures studied
Know how to:
- Optimize sequential programs
- Implement parallel algorithms to handle compute-intensive or data-intensive applications on GPU(s) (with CUDA) and/or on distributed memory architectures (with Spark).
- Measure and analyze the performance of a parallel computation.
- Reason about and critically evaluate the algorithmic and technological alternatives available for solving a problem.
- Design a solution for a problem before going into the implementation phase.
- Work in a team.
- Structure and write project reports.
Hervé Miguel Cordeiro Paulino, Pedro Abílio Duarte de Medeiros
Weekly - 4 hours
Total - 52 hours
Students should have knowledge about computer architecture, computer networks, and operating systems, and good programming skills. The exercises will use the C/C++ and Java programming languages.
As usual in HPC courses, there is no required textbook. There are several books that cover the fundamental concepts of HPC. A list follows:
- T Sterling, M Brodowicz, and M Anderson. High Performance Computing: Modern Systems and Practices. Morgan Kaufmann, 2017.
- T Rauber and G Rünger. Parallel Programming for Multicore and Cluster Systems. Springer, 2013.
- P Pacheco. An Introduction to Parallel Programming. Morgan Kaufmann, 2011.
- I Foster. Designing and Building Parallel Programs. Addison-Wesley, 1995.
- NVIDIA documentation.
- J Sanders and E Kandrot. CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley, 2010.
- A Shook and D Miner. MapReduce Design Patterns. O'Reilly Media, 2012.
- High Performance Spark. O'Reilly Media, 2017.
The lectures present the course's concepts and topics and discuss the most relevant questions.
The lab sessions take place in a general-purpose lab with access to (multi-core) PCs, which enable the execution of the shared memory programming exercises. For programming GPUs and distributed memory architectures, students have access to a cluster of multiple machines equipped with multi-core processors and NVIDIA GPUs.
All lab exercises are available from a Git repository and include all software dependencies (other than the required programming languages). They also include automated tests to help students assess the correctness of their implementations.
Two intermediate tests or a final exam (50% of the final grade). The tests have the same relative weight.
Two programming assignments (40% of the final grade). The assignments have the same relative weight.
Individual discussion of the programming assignments (10% of the final grade).
Minimum requirement: average of the grades of the programming assignments >= 8.
NTP = average of the grades of the tests, or grade of the final exam
NP = average of the grades of the programming assignments
ND = grade of the discussion
if NTP < 8 then final grade = NTP
else final grade = NTP*0.5 + NP*0.4 + ND*0.1
- Why Parallel Computing?
- Why High Performance Computing?
- The convergence of the Big Compute and Big Data trends
Fundamentals of Parallel Computing
- Parallel Architectures
- Parallel Performance
- Parallel Programming Paradigms
- Designing Parallel Algorithms
- Parallel Programming Patterns and Strategies
- Shared Memory Processing
- GPU Computing
- Message-passing programming
Data-Centric High Performance Computing
- Apache Spark
Parallel Algorithms (Putting it All Together)
- Graph processing algorithms
- Machine learning algorithms
The Future of High Performance Computing
- Challenges in the industry
- Open research topics
Programs where the course is taught: