From Documentation
Jump to: navigation, search
This page is will be deleted pending it's creation on the CC wiki.
Note: Some of the information on this page is for our legacy systems only. The page is scheduled for an update to make it applicable to Graham.


MKL
Description: Intel Math Kernel Library
SHARCNET Package information: see MKL software page in web portal
Full list of SHARCNET supported software


Introduction

MKL (Intel's Math Kernel Library) is a computing math library of highly optimized, extensively threaded routines for applications that require maximum performance. Intel MKL provides comprehensive functionality support in these major areas of computation: BLAS (level 1, 2, and 3), LAPACK linear algebra routines, ScaLAPACK, BLACS, PBLAS, FFT.

Version Selection

Presently the version of MKL 10.3.9 is the default and its module (mkl/10.3.9) is loaded automatically when you login. Version 11.1.4 is also available (mkl/11.1.4)

Later versions of MKL are bundled with the Intel compiler. You can still load them separately (for use with different compiler) by loading modules: intel/mkl/15.0.3, intel/mkl/15.0.6, intel/mkl/16.0.3.

More details about module usage can be found here: https://www.sharcnet.ca/help/index.php/Configuring_your_software_environment_with_Modules

Job Submission

Jobs requiring MKL should have a flag in the compiling command indicating the required libraries. See the next section which illustrates this procedure.

Examples of Job Compilation

We are assuming that the following modules are currently loaded: mkl/10.3.9 intel/12.1.3

Fortran DGEMM example

Use following command to compile file test_dgemm.f90:

 $FC test_dgemm.f90 -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm -openmp

where

 ! file name = test_dgemm.f90
 
       program mainp1
       implicit none
       integer, parameter :: HEIGHT=4
       integer, parameter :: WIDTH=3
       integer, parameter :: K=1
       integer            :: i, j
       double precision   :: ColumnVector(HEIGHT,K)
       double precision   :: RowVector(K,WIDTH)
       double precision   :: Result(HEIGHT,WIDTH)
       double precision   :: ALPHA, BETA
       character*1        :: NoTrans
 
       ALPHA = 1.0e0
       BETA  = 0.0e0
 
       do i=1,HEIGHT
         ColumnVector(i,K) = i
       enddo
 
       do j=1,WIDTH
         RowVector(K,j) = j
       enddo
 
       call PrintMatrix(ColumnVector, HEIGHT,K)
       call PrintMatrix(RowVector, K, WIDTH)
 
  !    To do the calculation, we will use the BLAS function dgemm. 
  !    This function calculates:  C = ALPHA*A*B + BETA*C 
 
       NoTrans  =  'N'
 
       call dgemm(NoTrans,NoTrans,HEIGHT,WIDTH,1,ALPHA,        &
      &     ColumnVector,HEIGHT,RowVector,1,BETA,Result,HEIGHT)
 
       call PrintMatrix(Result, HEIGHT, WIDTH)
       stop
       end
 
        subroutine PrintMatrix(pMatrix,nRows,nCols)
        implicit none
        integer            :: i, j, nRows, nCols
        double precision   :: pMatrix(nRows,nCols)
 
        do i=1,nRows
          do j=1,nCols
            print *,i,j,pMatrix(i,j)
          enddo
        enddo
 
        print *," "
        return
        end

Fortran ZDOTC example

Use following command to compile:

 $FC fortran_zdotc.f90 -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread

where

 ! file name = fortran_zdotc.f90
 
       program fortran_zdotc
       implicit none
 
       integer, parameter :: NN=5
       integer :: n, inca, incb
       integer :: i
 
       DOUBLE COMPLEX :: ZX(0:NN-1),ZY(0:NN-1),ZDPXY,ZDPYX,ZDOTC
       REAL*8 :: Di, Dn
 
       inca = 1
       incb = 1
 
       n  = NN
       Dn = DBLE(n)
 
       print *,""
 
       DO i=0,n-1
         Di = i
         ZX(i) = CMPLX(Di,2.0D0*Di)
         ZY(i) = CMPLX(Dn-Di,2.0D0*Di)
         write(6,1001) ZX(i),ZY(i)
  1001   format("(",f6.2,",",f6.2,")    (",f6.2,",",f6.2,")")
       END DO
 
       ZDPXY = ZDOTC(n,ZX,inca,ZY,incb)
       ZDPYX = ZDOTC(n,ZY,incb,ZX,inca)
 
  1002   format("(",f6.2,",",f6.2,")")
       print *,""
       print *,"<ZX,ZY>"
       write(6,1002) ZDPXY
 
       print *,""
       print *,"<ZY,ZX>"
       write(6,1002) ZDPYX
       print *,""
 
       print *,"Job completed successfully"
       print *,""
 
       end program fortran_zdotc


C ZDOTC example

Use following command to compile:

  $CC main_zdotc.c -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread

where

 /* file name = main_zdotc.c */
 
  /* The following example illustrates a call from a C program to the 
   * complex BLAS Level 1 function zdotc(). This function computes 
   * the dot product of two double-precision complex vectors.        
 
     DOT_PRODUCT = <ZX,ZY> = SUM[i=0,i=n-1] { DCONJG(ZX(I)) * ZY(I) }
                                    ------------- * ----
     Note that <ZX,ZY> = DCONJG(<ZY,ZX>)
 
     See: http://www2.math.umd.edu/~hking/Hermitian.pdf
 
   * In this example, the complex dot product is returned in the structure c. 
   */
 
  #include "mkl.h"
  #define N 5 
 
  void zdotc();
 
  int main() {
    int n, inca = 1, incb = 1, i;
 
    int DEBUG=1;
 
  /*  typedef struct {...} MKL_Complex16;   defined in "mkl.h"    */
 
    MKL_Complex16 a[N], b[N], c, d;
    n = N;
 
    printf("\n");
 
    for ( i = 0; i < n; i++ ){
      a[i].real = (double)i;
      a[i].imag = (double)i * 2.0;
 
      b[i].real = (double)(n - i);
      b[i].imag = (double)i * 3.0;
 
      printf(" ( %6.2f, %6.2f) ( %6.2f, %6.2f) \n",a[i].real,a[i].imag,b[i].real,b[i].imag);
    }
 
    zdotc( &c, &n, a, &inca, b, &incb );
    zdotc( &d, &n, b, &incb, a, &inca ); 
 
    printf("\n");
    printf("The complex dot product a|b is: ( %6.2f, %6.2f) \n", c.real, c.imag );
    printf("The complex dot product b|a is: ( %6.2f, %6.2f) \n", d.real, d.imag );
    printf("\n");
    printf("Job completed successfully\n");
    printf("\n");
 
  }


General Notes

COMPILING WITH MKL - USING ONLINE LINK ADVISOR

More generally (on any SHARCNET cluster) the Intel Math Kernel Library Link Line Advisor http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ can be used to generate linker options for more complex linking situations with MKL. For instance, to determine the link arguments for the MKL version 10.3, Linux Operating System, Intel Compiler, Intel (R) 64 architecture, Static Linking, 32 bit Integers, sequential version of MKL, ScaLAPACK library, and OpenMPI, the MKL Link Line Advisor would return the following linking recomendation (the syntax below would go into your Makefile):

$(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_sequential.a -Wl,--end-group $(MKLROOT)/lib/intel64/libmkl_blacs_openmpi_lp64.a -lpthread -lm

and the following recommendation for locating the include files

 -I$(MKLROOT)/include

The $(MKLROOT) will already be set for you by the MKL module which is currently loaded. The above case is the one applicable on most SHARCNET clusters.

Using The ILP64 Vs LP64 Variants Of MKL

You should use Intel MKL ilp64 in following cases.
1. If you are using huge data arrays (indexing exceeds 2^32-1)
2. If you enable FORTRAN code with the /4I8 compiler option

The ilp64 version of the MKL libraries defines integers as 64 bit. This implies codes should be compiled with -i8 OR internally be modfied to use the integer*8 type. Otherwise the standard lp64 version of MKL should be used which assumes integers are standard 32 bit.

Support for Third-Party Interfaces

o GMP Functions
The Intel MKL implementation of GMP arithmetic functions includes arbitrary precision arithmetic operations on integer numbers. The interfaces of such functions fully match the GNU Multiple Precision (GMP) Arithmetic Library. For specifications of these functions, please see this <a href='http://www.intel.com/software/products/mkl/docs/gnump/WebHelp/'>link</a>. If you currently use the GMP library, you need to modify INCLUDE statements in your programs to mkl_gmp.h.

FFTW Interface Support

Intel MKL provides interface wrappers for the 2.x and 3.x FFTW (www.fftw.org) superstructure are located in the same directory on all clusters. Using hound and version 11.0.083 of the intel compiler as an example, the wrappers and corresponding fftw wrapper header files are located in the following locations:

[roberpj@hnd50:/opt/sharcnet/intel/11.0.083/ifc/mkl/interfaces] ls
blas95  fftw2xc  fftw2x_cdft  fftw2xf  fftw3xc  fftw3xf  lapack95

[roberpj@hnd50:/opt/sharcnet/intel/11.0.083/ifc/mkl/include/fftw] ls
fftw3.f  fftw_f77.i  fftw_mpi.h      rfftw.h      rfftw_threads.h
fftw3.h  fftw.h      fftw_threads.h  rfftw_mpi.h

The wrappers can be used for calling the Intel ~equivilent~ MKL Fourier transform functions instead of FFTW for programs that currently use FFTW without changing the program source code. Referring to the online document <a href='http://www.intel.com/software/products/mkl/docs/fftw_mkl_user_notes_2.htm'>FFTW to Intel® Math Kernel Library Wrappers Technical User Notes</a> its mentions that "FFTW2MKL wrappers are delivered as the source code that must be compiled by the user to build the wrapper library." By popular demand these wrapper have been precompiled for immediate use and located in two directories for each intel module (at present 11.0.083 and 11.1.069) as follows:

[roberpj@hnd50:/opt/sharcnet/intel/11.0.083/mkl/lib/em64t/interfaces] tree
.
|-- ilp64
|   |-- libfftw2xc_intel.a
|   |-- libfftw2xf_intel.a
|   |-- libfftw3xc_intel.a
|   |-- libfftw3xf_intel.a
|   |-- libmkl_blas95.a
|   |-- libmkl_lapack95.a
|   |-- mkl77_lapack.mod
|   |-- mkl77_lapack1.mod
|   |-- mkl95_blas.mod
|   |-- mkl95_lapack.mod
|   `-- mkl95_precision.mod
`-- lp64
    |-- libfftw2xc_intel.a
    |-- libfftw2xf_intel.a
    |-- libfftw3xc_intel.a
    |-- libfftw3xf_intel.a
    |-- libmkl_blas95.a
    |-- libmkl_lapack95.a
    |-- mkl77_lapack.mod
    |-- mkl77_lapack1.mod
    |-- mkl95_blas.mod
    |-- mkl95_lapack.mod
    `-- mkl95_precision.mod

Introduction to Using the MKL FFT

Intel markets two implementations of the FFT. The first being from <a href='http://software.intel.com/en-us/intel-mkl/'>MKL</a> and the other from <a href='http://software.intel.com/en-us/intel-ipp/'>IPP</a> whose differences are described <a href='http://software.intel.com/en-us/articles/mkl-ipp-choosing-an-fft/'>here</a>. Only the MKL version is installed on SHARCNET.

The main FFT Computation Functions provided with MKL are DftiComputeForward and DftiComputeForward which compute the forward and backward FFT respectively. These functions along with Descriptor Manipulation Functions, Descriptor Configuration Functions and Status Checking Functions are provided in the <a href='http://www.intel.com/software/products/mkl/docs/webhelp/fft/fft_DFTF.html'>Table “FFT Functions in Intel MKL”</a>. Intel describes howto use these functions in their <a href='http://www.intel.com/software/products/mkl/docs/webhelp/appendices/mkl_appC_FFT.html'>Fourier Transform Functions Code Examples</a> document which also covers multi-threading aspects.

The simplest way to explain howto MKL FFT is by compiling and running a example problem of which there are several located under /opt/sharcnet/intel/11.0.083/ifc/mkl/examples where the fortran samples are contained in the dftf sub-directory while the c program samples are contained in the dftc sub-directory. The problem demonstrated here is from the source complex_2d_double_ex1.f90 which provides a MKL DFTI interface example program (Fortran-interface) to demonstrate Forward-Backward 2D complex transform for double precision data inplace. Steps to run this program are as follows:

1) Copy the example directory to a test directory in your account with:

cp -r /opt/sharcnet/intel/11.0.083/ifc/mkl/examples/dftf  /scratch/myusername/dftfdemo
cd /scratch/myusername/dftfdemo

2) Next compile the example program. In this case the machine used is Silky ie) ia64 based.

make lib64 function=complex_2d_double_ex1 compiler=intel interface=ia64 [threading=parallel 2>&1 | tee myMake.out

3) The built output appears as follows, where you will note the first step is to compile mkl_dfti.f90 into a module which is then used in the program on line 42 where the statement Use MKL_DFTI can be seen vizzz:

make lib64 function=complex_2d_double_ex1 compiler=intel interface=ia64 [threading=parallel 2>&1 | tee myMake.out
rm -fr *.o *.mod
make mkl_dfti.o  dfti_example_support.o  dfti_example_status_print.o  complex_2d_double_ex1.res  _IA=64 EXT=a RES_EXT=lib
make[1]: Entering directory `/home/roberpj/samples/fft-intel/fft/dftf'
mkdir -p ./_results/intel_ia64_parallel_64_lib
ifort   -w -c /opt/sharcnet/intel/11.0.083/ifc/mkl/include/mkl_dfti.f90 -o mkl_dfti.o
mkdir -p ./_results/intel_ia64_parallel_64_lib
ifort   -w -c source/dfti_example_support.f90 -o dfti_example_support.o
mkdir -p ./_results/intel_ia64_parallel_64_lib
ifort   -w -c source/dfti_example_status_print.f90 -o dfti_example_status_print.o
mkdir -p ./_results/intel_ia64_parallel_64_lib
ifort   -w mkl_dfti.o  dfti_example_support.o  dfti_example_status_print.o  source/complex_2d_double_ex1.f90 -L"/opt/sharcnet/intel/11.0.083/ifc/mkl/lib/64"
"/opt/sharcnet/intel/11.0.083/ifc/mkl/lib/64"/libmkl_intel_lp64.a -Wl,--start-group "/opt/sharcnet/intel/11.0.083/ifc/mkl/lib/64"/libmkl_intel_thread.a
"/opt/sharcnet/intel/11.0.083/ifc/mkl/lib/64"/libmkl_core.a -Wl,--end-group -L"/opt/sharcnet/intel/11.0.083/ifc/mkl/lib/64" -liomp5 -lpthread -o
_results/intel_ia64_parallel_64_lib/complex_2d_double_ex1.out
export
LD_LIBRARY_PATH="/opt/sharcnet/intel/11.0.083/ifc/mkl/lib/64":/opt/sharcnet/lsf/6.2/linux2.6-glibc2.4-ia64/lib:/opt/sharcnet/lsf/6.2/linux2.6-glibc2.4-ia64/lib:/opt/sharcnet/intel/11$
_results/intel_ia64_parallel_64_lib/complex_2d_double_ex1.out <data/complex_2d_double_ex1.d >_results/intel_ia64_parallel_64_lib/complex_2d_double_ex1.res
make[1]: Leaving directory `/home/roberpj/samples/fft-intel/fft/dftf'

4) Since the program gets run automatically by the makefile, the output data can be examined by running more (or less) on the results file called complex_2d_double_ex1.res which gets created.

cat _results/intel_ia64_parallel_64_lib/complex_2d_double_ex1.res
 COMPLEX_2D_DOUBLE_EX1
 Forward-Backward 2D complex transform for double precision data
 
 Configuration parameters:
 
 DFTI_FORWARD_DOMAIN       = DFTI_COMPLEX
 DFTI_PRECISION            = DFTI_DOUBLE 
 DFTI_DIMENSION            =   2
 DFTI_LENGTHS              = {   5,   3}
 DFTI_PLACEMENT            = DFTI_INPLACE
 DFTI_INPUT_STRIDES        = {   0,   1,  15}
 DFTI_FORWARD_SCALE        = 1.0 
 DFTI_BACKWARD_SCALE       = 1.0/real(m*n)
 
 
 INPUT vector X (2D columns)
   (   0.729,   0.486)   (  -0.865,  -0.577)   (  -0.278,  -0.186)
   (   0.787,   0.525)   (   0.839,   0.559)   (  -0.586,  -0.391)
   (   0.122,   0.081)   (  -0.741,  -0.494)   (  -0.794,  -0.529)
   (  -0.655,  -0.437)   (   0.580,   0.387)   (  -0.866,  -0.577)
   (  -0.830,  -0.554)   (  -0.371,  -0.247)   (  -0.791,  -0.527)
 
 Compute DftiComputeForward
 
 Forward OUTPUT vector X (2D columns)
   (  -3.720,  -2.480)   (   3.681,  -0.995)   (   0.497,   3.780)
   (   2.932,  -1.810)   (   1.422,   0.044)   (   3.078,  -1.928)
   (   1.115,  -2.479)   (  -2.040,   1.814)   (   3.144,   1.228)
   (  -1.859,   1.982)   (   2.343,   2.430)   (   0.890,  -2.581)
   (  -0.543,   3.403)   (  -0.596,   3.583)   (   0.588,   1.295)
 
 Compute DftiComputeBackward
 
 Backward OUTPUT vector X (2D columns)
   (   0.729,   0.486)   (  -0.865,  -0.577)   (  -0.278,  -0.186)
   (   0.787,   0.525)   (   0.839,   0.559)   (  -0.586,  -0.391)
   (   0.122,   0.081)   (  -0.741,  -0.494)   (  -0.794,  -0.529)
   (  -0.655,  -0.437)   (   0.580,   0.387)   (  -0.866,  -0.577)
   (  -0.830,  -0.554)   (  -0.371,  -0.247)   (  -0.791,  -0.527)
 
 ACCURACY =    0.248253E-15
 TEST PASSED

Intel MKL Examples

The Intel compiler came with many mkl examples which can be copied to your work directory to experiment with by doing the following:

cp -r /opt/sharcnet/intel/current/ifc/mkl/examples /work/$USER

Then each example can be compiled by going into any example directory and executing:

make soem64t

References

o Intel Math Kernel Library Documentation (current release)
https://www.sharcnet.ca/Software/Intel/IntelIFC/mkl/mkl_documentation.htm

o Intel Math Kernel Library Documentation Home (latest release)
http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/

o Version of Intel IPP, Intel MKL and Intel TBB Installed With The Intel® Compiler
http://software.intel.com/en-us/articles/which-version-of-ipp--mkl--tbb-is-installed-with-intel-compiler-professional-edition/

o Known Limitiation In Mkl 10.1 For Linux
http://software.intel.com/en-us/articles/known-limitations-in-mkl-101-for-linux/

o MKL - BLAS, CBLAS and LAPACK Compiling/Linking Functions &Fortran and C/C++ Calls
http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-blas-cblas-and-lapack-compilinglinking-functions-fortran-and-cc-calls/

o Using the ILP64 Interface vs. LP64 Interface
https://software.intel.com/en-us/node/528682

o Use of Intel MKL data types in C/C++ applications
http://software.intel.com/en-us/articles/use-of-intel-mkl-data-types-in-cc-applications

o Working with the Intel® Math Kernel Library Cluster Software
https://software.intel.com/en-us/node/528420