Serial code profiling is done for two main reasons:
- To improve (sometimes dramatically) the efficiency of the serial code, or
- As an important first step when converting a serial code to a parallel (MPI, threads, CUDA, ...) one.
One way to profile a serial code is to use the program "gprof", which ships with GNU binutils and works with the GNU compilers (gcc/g++/gfortran etc.). The following procedure produces a call graph plot annotated with the profiling information.
1. Switch to GNU compilers:
module unload intel
module load gcc
2. Use the GNU compilers (gcc/g++/gfortran) to compile your code with your usual optimization flags (e.g. -O2) plus the profiling flag ("-pg"), at both the compiling and linking stages.
3. Modify your environment:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/syam/lib64:/home/syam/lib64/graphviz
export PATH=$PATH:/home/syam/bin
4. Run your code the usual way, either directly on the login node (if it is short):
/path/to/your/executable arg1 arg2
or with the scheduler:
sqsub -o out -r1h /path/to/your/executable arg1 arg2
5. Once the run is finished, run this command (it is pretty fast):
gprof /path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png
If everything worked, this produces the plot file output.png.
Here is an example of the output.png file:
The numbers inside the coloured rectangles are:
- the percentage of the total CPU time spent in this routine and all the routines below it (first line);
- the percentage of the total CPU time spent in this routine only (second line);
- the number of times this routine was called.
Looking at this example plot, one can tell that the most time-consuming routines are "surface_ray_intersection", "refraction", and "surface_derivative". Optimization and/or parallelization efforts should be aimed at these routines first.
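The same ranking can also be read without the plot, from the text output of gprof. The sketch below assumes that gmon.out from the profiled run is in the current directory unless another file is given; the function name flat_profile is hypothetical. The -b option omits the explanatory text and -p prints only the flat profile, which lists routines in order of decreasing time spent in the routine itself.

```shell
# Hypothetical helper: print the flat profile for a profiled executable.
# Usage: flat_profile /path/to/your/executable [profile-data-file]
flat_profile() {
    exe="$1"
    data="${2:-gmon.out}"   # gprof's default data file name
    # -b: omit the explanatory text; -p: print the flat profile only
    gprof -b -p "$exe" "$data"
}
```

For example, "flat_profile /path/to/your/executable | head -n 15" shows the header and the first few routines.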
If you have issues running the above commands (dot is part of the Graphviz package; gprof2dot.py is a separate Python script), you can compile Graphviz yourself. You can obtain the sources here.