Revision as of 12:56, 20 August 2013 by Hahn (Talk | contribs) (SMP mechines)


This tutorial introduces basic OpenMP concepts and explains how to run OpenMP codes on SHARCNET systems. To learn more about parallel programming with OpenMP, please see the references provided at the end of this page.

Introduction

OpenMP is a portable, scalable programming model for parallel programming on shared memory platforms. The OpenMP Application Program Interface (API) was defined by a group of major computer hardware and software vendors. It supports programming in C/C++ and Fortran, and provides a simple way to parallelize an existing serial program. The newest OpenMP specifications can be found at openmp.org .

Concepts

The goal of OpenMP is to provide a simple but standard solution to parallelize serial programs for shared memory machines. It is based on the existence of multiple threads in the shared memory programming paradigm and uses the fork-join model of parallel execution.

Most OpenMP parallelism is specified through the use of compiler directives. All compiler vendors are expected to follow the standard API specification in their implementations. Although OpenMP relies heavily on compiler directives, it is an explicit (not automatic) programming model that offers the programmer full control over parallelization.

OpenMP Directives

The components of OpenMP include compiler directives, runtime library routines and environment variables. Compiler directives are the major part of OpenMP and can be grouped into the following categories:

  • Parallel regions
  • Worksharing
  • Data Environment
  • Synchronization

Parallel region directive

A parallel region is a block of code that will be executed by multiple threads. It is created with the 'parallel' directive. When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team. The master is a member of the team and has thread number 0 within it. From the beginning of the parallel region, the code is duplicated and all threads execute it.

There is an implied barrier at the end of a parallel section. Only the master thread continues execution past this point. If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.

The parallel region concept can be illustrated via the 'hello world' code in the file hello_world_openmpi.c listed next:

#include <omp.h>
#include <stdio.h>

//  file_name = hello_world_openmpi.c

int main (int argc, char *argv[]) {
  int id, nthreads;
  #pragma omp parallel private(id)
  {
    id = omp_get_thread_num();
    printf("Hello World from thread %d\n", id);
    #pragma omp barrier
    if ( id == 0 ) {
      nthreads = omp_get_num_threads();
      printf("There are %d threads\n",nthreads);
    }
  }
}

or the equivalent 'hello world' in Fortran 90, listed in the file hello_world_openmpi.f90:

program hello90

! hello_world_openmpi.f90
  
use omp_lib
implicit none
integer :: id, nthreads
  !$omp parallel private(id)
  id = omp_get_thread_num()
  write (*,*) 'Hello World from thread', id
  !$omp barrier
  if ( id .eq. 0 ) then
    nthreads = omp_get_num_threads()
    write (*,*) 'There are', nthreads, 'threads'
  end if
  !$omp end parallel
end program

Worksharing directives

A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it. Work-sharing constructs do not launch new threads. There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of one.

Typical applications of the worksharing directives are the parallelization of for-loops in C/C++ and do-loops in Fortran, i.e., sharing the iterations of a loop across threads.

The concept of worksharing in OpenMP can be illustrated via a simple do-loop example in Fortran.

Let's start with the following serial do-loop program in f90:

      program loop
!     file name = serial_do_loop.f90
      implicit none
      integer, parameter :: N = 100000000
      integer :: i, ISTAT
      double precision, allocatable :: x(:)

      print *,'N = ',N
      ALLOCATE (x(N),STAT=ISTAT)
      IF (ISTAT/=0) STOP "ERR: ALLOCATE FAILS FOR x"

      do i = 1, N
        x(i) = 1./DBLE(i)
      end do
 
      print *,N,x(N),1.0D00/x(N)
  
      deallocate(x)

      end program

We modify the above program by simply adding two directives to get a parallel do-loop program in OpenMP:

      program loop
!     file_name = parallel_do_loop.f90
      implicit none 
      integer, parameter :: N = 60000000
      integer :: i, ISTAT
      double precision, allocatable :: x(:)

      print *,'N = ',N
      ALLOCATE (x(N),STAT=ISTAT)
      IF (ISTAT/=0) STOP "ERR: ALLOCATE FAILS FOR x"

!$omp parallel do
      do i = 1, N
        x(i) = 1./DBLE(i)
      end do
!$omp end parallel do

      print *,N,x(N),1.0D00/x(N)
      deallocate(x)
 
      end program

The modified file can be compiled either as a serial job or as an OpenMP job. In the latter case all we need is to add the flag "-openmp" (for the Intel compiler) or the flag "-fopenmp" (for GNU gfortran).

Another example is the numerical integration of pi, i.e., computing pi by approximating the area under the curve f(x) = 4/(1+x*x) between x=0.0 and x=1.0.

The initial serial code in C, as seen in the file serial_pi.c, is as follows:

#include <stdio.h>
#include <stdlib.h>
#define NBIN 100000000

/* file = serial_pi.c */
   
int main(int argc, char *argv[]) {
  int i;
  double step,x,sum=0.0,pi;
  step = 1.0/NBIN;
  for (i=0; i<NBIN; i++) {
    x = (i+0.5)*step;
    sum += 4.0/(1.0+x*x);
  }
  pi = sum*step;
  printf("PI = %f\n",pi);
}

and the openmp code in c, file parallel_pi.c, with data and thread control:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>               
#define NUM_STEPS 100000000
 
/* file = parallel_pi.c */

int main(int argc, char *argv[]) {
   int i;
   double x, pi;
   double sum = 0.0;
   double step = 1.0/(double) NUM_STEPS;
   int nthreads;
   /* do computation -- using all available threads */
   #pragma omp parallel
   {
       #pragma omp master
       {
           nthreads = omp_get_num_threads();
       }
       #pragma omp for private(x) reduction(+:sum) schedule(runtime)
       for (i=0; i < NUM_STEPS; ++i) {
           x = (i+0.5)*step;
           sum = sum + 4.0/(1.0+x*x);
       }
       #pragma omp master
       {
           pi = step * sum;
       }
   }
   /* print results */
   printf("parallel program results with %d threads:\n", nthreads);
   printf("pi = %g  (%17.15f)\n",pi, pi);
   return EXIT_SUCCESS;
}
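The for loop above uses schedule(runtime), which means the loop schedule is not fixed at compile time but is read from the OMP_SCHEDULE environment variable when the program starts. A minimal usage sketch (the schedule and chunk size below are arbitrary examples):

```shell
# Choose the loop schedule at run time; "dynamic,1000000" is just an example.
export OMP_SCHEDULE="dynamic,1000000"
export OMP_NUM_THREADS=4
# then run the program, e.g.: ./parallel_pi.exe
```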

The main issue in OpenMP programming is data dependency: a loop whose iterations depend on one another cannot be safely parallelized as-is. For details, please see the references listed below.

Submitting an OpenMP job with the sqsub command

Several C and Fortran OpenMP programs (source files) have been presented in the preceding sections. We now illustrate how these source files can be compiled and the resulting executables submitted with the sqsub command.

To compile the C OpenMP program hello_world_openmpi.c, we use the environment variable $CC. You can find out which compilers (C and Fortran) are loaded in your environment by issuing the commands:

printenv CC
printenv FC

Then, make sure to specify the proper flag (i.e. -openmp or -fopenmp):

Hello World programs (Parallel region)

(A) for icc (intel)

    $CC -o hello.exe -openmp hello_world_openmpi.c

(B) for gcc (GNU)

    $CC -o hello.exe -fopenmp hello_world_openmpi.c

To submit the OpenMP job use:

    sqsub -t -r 1h --mpp=1.0G -f threaded -n 4 -o hello.log ./hello.exe


For the Fortran OpenMP program, compile with the command:

(A) for ifort (intel)

    $FC -o my_OpenMP_exec -openmp   hello_world_openmpi.f90

(B) for gfortran (GNU)

    $FC -o my_OpenMP_exec -fopenmp   hello_world_openmpi.f90


and submit the job with the command:

    sqsub -t -r 1h --mpp=2.0G  -o Fortran_hello.log  -f threaded -N 1 -n 8 ./my_OpenMP_exec


Do/for loop programs (Worksharing)

(A) for icc (intel)

    $CC -o parallel_pi.exe -openmp parallel_pi.c

(B) for gcc (GNU)

    $CC -o parallel_pi.exe -fopenmp parallel_pi.c

To submit the OpenMP job use:

    sqsub -t -r 1h --mpp=1.0G  -o parallel_pi.log -f threaded -n 4 ./parallel_pi.exe


For the Fortran OpenMP program, compile with the command:

(A) for ifort (intel)

    $FC -o parallel_do_loop_exec -openmp   parallel_do_loop.f90

(B) for gfortran (GNU)

    $FC -o parallel_do_loop_exec -fopenmp   parallel_do_loop.f90


and submit the job with the command:

    sqsub -t -r 1h --mpp=2.0G -o parallel_do_loop.log -f threaded -N 1 -n 8 \
          ./parallel_do_loop_exec

SHARCNET SMP systems

SMP machines

SHARCNET has a few SMP machines: silky, bramble, school and prism. prism is suitable for shared memory visualization applications, while silky and bramble are suitable for large shared memory applications. The basic system information is listed below:

System   CPUs  Memory (GB)
silky    128   256
bramble  64    128
school   8     16
prism    4     8
iqaluk   32    1024

In addition, the SHARCNET visualization workstations are all multi-core machines (mostly 4 cores with 16 GB of memory).

SMP nodes on a cluster

While the core SHARCNET systems are clusters, they all have multi-core shared memory nodes which allow for SMP-based parallel programming such as OpenMP applications. However, the maximum size of an OpenMP job on a cluster is limited by the maximum number of CPUs per node. For reference, we list the node information for some systems below:

System                                  Node type      CPUs/node  Memory/node (GB)  OMP_NUM_THREADS (max)
saw                                     Xeon           8          16                8
hound                                   Xeon           16, 32     128               16, 32
narwhal, bruce, bala, megaladon, zebra  Opteron        4          8                 4
bull nodes on kraken                    Opteron        4          32                4
whale                                   Opteron        4          4                 4
requin                                  Opteron        2          8                 2
angel                                   Xeon           8          16                8
goblin                                  Opteron, Xeon  4, 8, 16   8, 12, 48         4, 8, 16
redfin                                  Opteron        24         96, 192           24
wobbie                                  Opteron        2, 4, 32   4, 8, 32          2, 4, 32
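When testing an OpenMP program interactively on one of these nodes (outside the scheduler), the thread count can be capped with the OMP_NUM_THREADS environment variable; the executable name below is just a placeholder:

```shell
# Match the thread count to the CPUs per node, e.g. 8 on a saw node.
export OMP_NUM_THREADS=8
# then run the program, e.g.: ./my_OpenMP_exec
```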

OpenMP flag

The compiler flag for OpenMP code varies from compiler to compiler; the basic OpenMP compiler flags are:

Compiler   C/C++          Fortran          Flag
Intel      icc            ifort            -openmp
PGI        pgcc/pgCC      pgf77/pgf90      -mp
Pathscale  pathcc/pathCC  pathf77/pathf90  -openmp
GNU        gcc/g++        gfortran         -fopenmp

Tutorials and References

Many OpenMP tutorials and examples can be found online or on the OpenMP forum website.