When writing parallel code, e.g. using MPI, a common workflow is: one process loads data from disk, distributes the data to the other processes, which perform computational tasks, and then collects the results. This model poses a potential challenge: the input data may be too large to fit into the memory of a single process. Moreover, such jobs are likely to wait longer in the queue because they request more resources (RAM), and as a result they "cost" more on systems where both the number of cores and the amount of memory requested by a job are factored into the cost (e.g. on Graham and Cedar). In this seminar we will introduce techniques that can be used to address these issues.
The audience is expected to know the C/C++ programming language and to have basic knowledge of MPI.