High Performance Computing
Objectives
Understand:
- Specificities of parallel vs. sequential execution
- Different types of parallel architectures
- Shared memory and message passing programming models
- Methodologies for developing solutions that utilize parallel computing
- Metrics for evaluating a parallel program
- Programming and execution models of GPUs
- Programming and execution models of data parallel frameworks
Be able to:
- Identify opportunities for parallelization
- Partition a problem for parallel execution
- Implement parallel algorithms
- Reason about the behavior of parallel systems
- Measure, analyze, and optimize the performance of parallel programs
Know:
- Programming languages and libraries for parallel computing in multicore environments, GPUs, and distributed memory
- Algorithmic strategies for several types of problems
- Optimization techniques across different architectures
General characterization
Code
11165
Credits
6.0
Responsible teacher
Hervé Miguel Cordeiro Paulino
Hours
Weekly - 4
Total - 52
Teaching language
Portuguese
Prerequisites
Available soon
Bibliography
As is widespread practice in parallel and high-performance computing courses, there is no single textbook. Several books cover the fundamental concepts of the course. Here is a list:
- Robert Robey, Yuliana Zamora. Parallel and High Performance Computing. Simon and Schuster (2021). ISBN: 1617296465
- Peter Pacheco, Matthew Malensek. An Introduction to Parallel Programming. Morgan Kaufmann (2021). ISBN: 9780128046050
- Mahmoud Parsian. Data Algorithms with Spark. O'Reilly (2022). ISBN-10: 1492082384, ISBN-13: 978-1492082385
- McCool M., Robison A., Reinders J. Structured Parallel Programming: Patterns for Efficient Computation. Morgan Kaufmann (2012). ISBN: 978-0-12-415993-8
Programming guides:
- NVIDIA Corporation. CUDA C++ Programming Guide (2024)
- Apache Spark. Apache Spark Programming Guides (2024)
Teaching method
The lectures introduce the core concepts of the course and foster discussion of the most relevant topics, which are consistently motivated and illustrated with practical examples. The lecture materials (such as slides) are made available in advance, encouraging students to study ahead of time.
The lab sessions are conducted in laboratories featuring computers equipped with multicore processors, which enable the execution of the shared memory programming exercises. For GPU programming and distributed memory architecture exercises, students have access to a cluster of multiple machines, each equipped with more than one multicore processor and many with NVIDIA GPUs (required for GPU programming with CUDA). Several IDEs support the configuration of remote toolchains, enabling students to work on the lab’s (or their own) computers while compiling and executing code remotely on the cluster.
All laboratory exercises are provided through a Git repository, including all necessary software dependencies (except the required programming languages). The exercises also feature automated tests to help students verify the correctness of their implementations.
Evaluation method
Theoretical-Practical Component (CTP)
The theoretical-practical component (CTP) is obtained by completing two individual tests during the semester or by taking an individual exam. Both the tests and the exam will be conducted in-person and without consultation. The tests and the exam may include questions about the project assignments completed during the current academic year.
Minimum requirement: a CTP grade of ≥ 8.50 is required.
Laboratory Component (CL)
The laboratory component (CL) is calculated using the following formula: CL = 0.6 * CG + 0.4 * CI, where CG is the group grade and CI is the individual grade.
The group grade (CG) is based on the quality of the development of two programming project assignments and the preparation of two reports that describe, evaluate, and analyze the developed solution. The project is carried out in groups of two students, and the CG is the same for both members of the group.
The individual grade (CI) is determined by each member's individual contribution to the development of the programming project assignments. This component is assessed by considering the distribution of work between group members, as reported by the students in their reports, and by evaluating the quantity, difficulty, and relevance of each student's individual work through a discussion with them.
Minimum requirement for frequency (attendance credit): a CL grade of ≥ 8.50 is required.
Frequency obtained in either of the previous two academic years remains valid for the current year. This does not, however, prevent a student from enrolling in a practical session and attending practical classes. The student may also attempt to improve the previously obtained grade; in that case, the final CL grade is the better of the CL grade obtained this year and the one obtained previously.
Final Grade (CF): CF = 0.6 * CTP + 0.4 * CL
Subject matter
Why Parallel Computing
Fundamentals
- Parallel Architectures
- Parallel Programming Models
- Designing Parallel Programs
- Finding parallelization opportunities
- Parallel decomposition
- Dependency Analysis
- Task Assignment
- Communication/Synchronization
- Load balancing
- Parallel Performance
- Analytical performance measures
- Amdahl’s and Gustafson-Barsis’ laws
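The two performance laws listed above can be sketched as short formulas. This is an illustrative Python sketch (the function names are my own, not part of the course material): Amdahl's law bounds the speedup of a fixed-size problem, while Gustafson-Barsis' law models scaled speedup when the problem grows with the processor count.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n processors when a fraction p of the
    sequential runtime is parallelizable. The serial fraction (1 - p)
    caps the achievable speedup at 1 / (1 - p)."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    """Gustafson-Barsis' law: scaled speedup when the parallel part of
    the workload grows with n, so the serial fraction shrinks in
    relative terms."""
    return (1.0 - p) + p * n

# With p = 0.9 and n = 10: Amdahl gives ~5.26x (never more than 10x),
# while Gustafson gives 9.1x for the scaled problem.
```

The contrast between the two results for the same p and n is exactly the point the syllabus makes: fixed-size problems saturate quickly, scaled problems keep benefiting from more processors.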
Shared memory parallel programming
- Multicores
- Task- and Loop-based parallel programming
- Dealing with shared state
- GPUs
- GPU architecture
- GPU programming
- Performance Optimization
- Work distribution and scheduling
- Locality
- Overlap computation with computation and/or communication
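The loop-based decomposition idea from the shared memory topics above can be sketched in a few lines. In the labs this style of parallelism would typically be written in C with OpenMP-like constructs; the sketch below uses Python threads purely to illustrate the decomposition (note that CPython's GIL limits real speedup for CPU-bound code, so a process pool would be used in practice). All names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each task computes the sum of squares of its own chunk.
    # Tasks share no mutable state, so no synchronization is needed.
    return sum(x * x for x in chunk)

def parallel_sum_squares(data, n_workers=4):
    # Loop-based decomposition: split the iteration space into
    # n_workers contiguous chunks and run one task per chunk.
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # The final reduction over partial results happens sequentially.
        return sum(pool.map(partial_sum, chunks))
```

The pattern (partition the iteration space, compute independent partial results, reduce them) is the same one used with OpenMP parallel loops and with CUDA grids; only the mechanism for spawning the tasks changes.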
Data Parallel Computing
- Data Parallel Thinking
- Data Parallel multicore and GPU programming
- Data Parallel distributed computing frameworks
- MapReduce, Spark
- Distributed File System
- Execution model
- Performance Optimization
- Work distribution and scheduling
- Locality and communication
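The MapReduce execution model listed above can be sketched in plain Python by making its three phases explicit. This is a minimal single-machine sketch with illustrative names, not the distributed implementation covered in the course: map emits key-value pairs, shuffle groups them by key, and reduce combines each group.

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    # map: emit a (word, 1) pair for every word occurrence
    return [(word, 1) for line in lines for word in line.split()]

def shuffle_phase(pairs):
    # shuffle: group all values under their key (done by the
    # framework over the network in a real distributed run)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: combine each key's list of values into one result
    return {k: reduce(lambda a, b: a + b, vs) for k, vs in groups.items()}

def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In Spark the same computation is expressed as a pipeline along the lines of `rdd.flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where the shuffle is implicit in `reduceByKey`.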