
Profiling (measuring where a code spends its time, so that its efficiency can be improved) is critical in High Performance Computing. Sometimes something as simple as placing nested loops in the wrong order can result in a code that runs 10 times slower. This is bad for the researcher who runs the inefficient code (only smaller or more limited problems can be addressed), but it is also very bad for the other researchers who share the cluster, as valuable CPU cycles are wasted. In this webinar I will review the code profiling options available on our national system Graham. The focus will be on profiling parallel codes written in MPI, OpenMP, and CUDA. In particular, I will be showcasing Arm's (formerly Allinea's) profiler MAP, which is great for profiling both serial and parallel codes.
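As a rough illustration of the loop-order point, here is a minimal sketch (not taken from the webinar) that sums the same matrix with the two possible nestings and times each one. The function names, the flat row-by-row storage, and the matrix size are all assumptions made for this example. In pure Python the interpreter overhead hides most of the memory-access cost, so the measured gap is usually much smaller than in compiled C or Fortran, where the effect described above is typically seen.

```python
import time


def make_matrix(n):
    """Return an n x n matrix stored row by row in one flat list (assumed layout)."""
    return [float(k) for k in range(n * n)]


def sum_row_order(a, n):
    """Inner loop walks along a row: consecutive elements of the flat list."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += a[i * n + j]
    return total


def sum_column_order(a, n):
    """Inner loop walks down a column: jumps n elements on every step."""
    total = 0.0
    for j in range(n):
        for i in range(n):
            total += a[i * n + j]
    return total


def time_call(func, a, n):
    """Run func(a, n) once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(a, n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 1000  # illustrative size, chosen only to keep the run short
    a = make_matrix(n)
    for func in (sum_row_order, sum_column_order):
        result, seconds = time_call(func, a, n)
        print(f"{func.__name__}: sum={result:.0f} time={seconds:.3f} s")
```

Both functions compute the same sum; only the order in which memory is visited differs. Timing the two variants by hand like this is the simplest form of profiling, and it only works when you already suspect where the problem is. A profiler is what finds such hot spots when you do not.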