Latest revision as of 10:14, 6 May 2019
With big datasets, disk I/O on large parallel systems can become a significant bottleneck relative to the rest of your code's workflow. Parallel filesystems are optimized to support large, efficient I/O by multiple users on multiple nodes simultaneously; however, contrary to popular belief, they do not provide "supercomputing" disk performance. In this introductory webinar I will cover the basics of parallel filesystems, techniques to optimize your storage, and various methods to organize parallel disk I/O. Due to time constraints, I will not go into the details of every method, but I will give several examples of parallel I/O using MPI-IO (part of MPI-2), and then briefly discuss the strengths and limitations of the most popular higher-level parallel I/O libraries (HDF5, NetCDF, and ADIOS).