From Documentation
Jump to: navigation, search
Note: Some of the information on this page is for our legacy systems only. The page is scheduled for an update to make it applicable to Graham.

Using BLAS (LAPACK) routines with QD precision

Nick Chepurniy, Ph.D.
e-mail: nickc@sharcnet.ca
HPC Consultant - SHARCNET http://www.sharcnet.ca
Compute Canada http://www.computecanada.org

Overview

This document illustrates how to convert a Fortran program with standard precision (i.e. single, double or quadruple precision) data types to a 62-digits precision by using the QD package. The standard data types single, double and quadruple have 6, 15 and 33 digits precision, respectively, while the QD data type has 62-digits precision.


Files

All the files discussed in this document can be obtained by executing

       cp -r /work/nickc/QD_PROGRAMS/QD_BLAS .

which will result in a directory called QD_BLAS with three subdirectories containing source code, a build directory, a README file and the WIKI_DOC file (which is this document but in ordinary text format).

Original Programs

Following modules are required by this code and should be loaded before proceeding:

       module load intel/11.0.083
       module load acml/ifort/4.4.0

The procedure assumes that we have a Fortran program using standard data types. Let xtmm.f90 be the program which multiplies two matrices and prints the matrix product. The original xtmm.f90 file along with another three similar problems can be found in the subdirectory DGEMM_EXAMPLE.

From here onwards we will discuss only the xtmm.f90 program and note that the others follow exactly the same procedure and there is no need to discuss them. We start with subdirectory DGEMM_EXAMPLE where the original Fortran examples are located,

Following commands in the script file, cmp_tmm_acml, are used to compile the program:

# Filename = cmp_tmm_acml
 #!/bin/bash
 
 scriptname=$0
 
 function usage () {
 cat <<EOF
 
 This script must specify the precision to be  used for compiling the code.
 Therefore, please specify one and only one of  the following macros:
 
       USE_QUAD   USE_DOUBLE   USE_SINGLE
 
 Usage: $scriptname <precision-macro>
 
 For example:  $scriptname USE_DOUBLE
 
 EOF
    exit 0
 }
 
 # Script requires one argument:
 
 if (( $# != 1 )) ; then
   usage
   exit 1
 fi
 
 # Verifying for valid argument:
 
 KOUNT=0
 
 if [ $1 = "USE_QUAD" ] ; then
   KOUNT=$KOUNT+1
 fi
 
 if [ $1 = "USE_DOUBLE" ] ; then
   KOUNT=$KOUNT+1
 fi
 
 if [ $1 = "USE_SINGLE" ] ; then
   KOUNT=$KOUNT+1
 fi
 
 if (( $KOUNT != 1 )) ; then
   echo " $KOUNT "
   echo " "
   echo "       '$1' is an invalid argument !"
   usage
   exit 1
 fi
 
 export    INTELLIB=/opt/sharcnet/intel/11.0.083/ifc/lib/intel64
 export LIBS=-L${INTELLIB}
 export  ACML_LIB=/opt/sharcnet/acml/4.4.0/ifort-64bit/ifort64/lib
 
 $FC -fpp -D$1  xtmm.f90 util.f90 -L$ACML_LIB - lacml  $LIBS -lifcore -limf -lifport

Note that in addition to the xtmm.f90 file there is also file util.f90 which contains subroutine ARE_MACROS_MUTALLY_EXCLUSIVE called by the main program in the xtmm.f90 file.

The script expects one argument (macro) to compile the files with the selected data type.

Subroutine ARE_MACROS_MUTALLY_EXCLUSIVE verifies if one and only one data type was specified in the compile command (last line of the script) cmp_tmm_acml, listed above.

Here are the two fortran files:

 ! File xtmm.f90
       module set_precision
 
 !     Set the DEBUG_<subroutine_name> here:
 
       LOGICAL, parameter :: DEBUG_TEST_MATRIX_MULTIPLY = .true.
 
 !     Set precicion
 
 #ifdef USE_SINGLE
       integer, parameter  :: wp = kind(0.0)
 #else
 #ifdef USE_DOUBLE
       integer, parameter  :: wp = kind(0.0d0)
 #else
 #ifdef USE_QUAD
       integer, parameter  :: wp = kind(0.0q0)
 #endif
 #endif
 #endif
 
 !     Define some constants
 
       real(wp), parameter :: zero  = 0.0_wp
       real(wp), parameter :: one   = 1.0_wp
       real(wp), parameter :: three = 3.0_wp
 
       end module set_precision
 
 ! Following  blas routines: daxpy, dcopy, dgemm, dsyr,  dsyrk, dtpsv, dtrsm
 ! have been modified and placed in file: my_mini_blas.f90
 !
 !   DAXPY  ==>  DY(I) = DY(I) + DA*DX(I)
 !   DCOPY  ==>  copies a vector, x, to a vector, y
 !   DGEMM  ==>  C := alpha*op( A )*op( B ) + beta*C
 !   DSYR   ==>  A := alpha*x*x**T + A,  (x**T == transpose)
 !   DSYRK  ==>  C := alpha*A*A**T + beta*C
 !   DTPSV  ==>  solves A*x = b or A**T*x = b
 !   DTRSM  ==>  solves  op( A )*X = alpha*B,   or   X*op(  A ) = alpha*B,
 
 ! -----------------------------------------------------------------------------
 
       PROGRAM TEST_MATRIX_MULTIPLY
       use set_precision
       IMPLICIT NONE
 
 
 ! for display use N = 18 or 6
 
       integer, parameter  :: N = 6
       real(wp) :: alpha
 
       real(wp) :: A(N,N), B(N,N), C(N,N), IOTA(N), JOTA(N), DA
       INTEGER I, J, INCX, INCY
       LOGICAL :: DEBUG
 
 ! end of declarations
 
       DEBUG = DEBUG_TEST_MATRIX_MULTIPLY
       alpha = one / three
 
 ! should put this in the module before zero, one ... are defines !
 
       call ARE_MACROS_MUTALLY_EXCLUSIVE
 
       if (DEBUG) then
         write(6,1001) precision(alpha), alpha
  1001   format("alpha has ",i3," digits precision: "/                  &
      &    "      ---------1---------2---------3---------4---------5-"/  &
      &    "1/3=",1Pe50.43)
       endif
 
       write(6,1002)
  1002 format(" ")
 
 !
 ! Using "My Sparce Matrix" from:
 !
 !  https://www.sharcnet.ca/help/index.php/Solving_Systems_of_Sparse_Linear_Equations#Sparse_Matrices
 !
 ! Columns of B will be set to col_no*x where x=[1 2 3 .... N]'
 !
 
 ! Build A:  (N must be even number)
 
       if (mod(N,2) .ne. 0) then
         write(6,2001) N
  2001   format("N must be even number, here N = ",i9)
         stop
       endif
 
       A = zero
       C = zero
 
       A(1,1) = three          ! first diagonal element
       A(N,N) = three          ! last  diagonal element
 
       do I=2,N
         A(I  ,I  ) = three    ! diagonal elements
         A(I-1,I  ) = -one     ! upper sub-diagonal elements
         A(I  ,I-1) = -one     ! lower sub-diagonal elements
       enddo
 
       do I=N,1,-1
         A(I,1+N-I) = A(I,1+N-I) + one    ! anti-diagonal elements
       enddo
 
       DO I=1,N
         IOTA(I) = I
       END DO
 
 
       DO J=1,N
 !!      DO I=1,N
 !!        B(I,J) = I*J
 !!      END DO
         DA   = J
         INCX = 1
         INCY = 1
         JOTA=0
         call DAXPY(N,DA,IOTA,INCX,JOTA,INCY)
         call DCOPY(N,JOTA,INCX,B(1,J),INCY)
       END DO
 
       print *,"A:"
       DO I=1,N
         write(6,1003) (A(I,J),J=1,N)
       END DO
 
       print *," "
 
       print *,"B:"
       DO I=1,N
         write(6,1003) (B(I,J),J=1,N)
       END DO
 
       print *," "
 
  1003 format(18f5.0)
  1004 format(6f16.12)
  1005 format(f56.52)
 
       CALL DGEMM ( 'N', 'N', N, N, N, alpha, A, N, B, N, zero, C, N )
 
       print *,"C:"
       DO I=1,N
         write(6,1003) (C(I,J),J=1,N)
       END DO
 
       print *," "
       print *,"First Row of C:"
       DO J=1,N
         write(6,1005) C(1,J)
       END DO
 
       END PROGRAM TEST_MATRIX_MULTIPLY
 
 ! File util.f90
       subroutine ARE_MACROS_MUTALLY_EXCLUSIVE
       IMPLICIT NONE
       integer :: count_macros
       character*10 :: type_of_precision_used
 
       count_macros = 0
 
 #ifdef USE_SINGLE
       count_macros = count_macros + 1
       type_of_precision_used="SINGLE"
 #endif
 
 #ifdef USE_DOUBLE
       count_macros = count_macros + 1
       type_of_precision_used="DOUBLE"
 #endif
 
 #ifdef USE_QUAD
       count_macros = count_macros + 1
       type_of_precision_used="QUAD"
 #endif
 
 
       if (count_macros .gt. 1) then
         write(6,1001) count_macros
  1001   format("There are ",i1," fpp macro definitions for  precision:"/ &
      &  "Cannot have more than 1 definition. The compiling  script has:")
 #ifdef USE_SINGLE
         write(6,1003)
  1003   format("-DUSE_SINGLE ",$)
 #endif
 #ifdef USE_DOUBLE
         write(6,1004)
  1004   format("-DUSE_DOUBLE ",$)
 #endif
 #ifdef USE_QUAD
         write(6,1005)
  1005   format("-DUSE_QUAD ",$)
 #endif
 
         write(6,1007)
  1007   format(" ")
         stop
        endif
 
 ! NOTE: if count_macros is 0 then compiler will generate errors !
 
       if (count_macros .eq. 1) then
         write(6,1002) type_of_precision_used
  1002   format(/"Precision used for this code: ",a6/)
       endif
 
       return
       end

To compile above files using DOUBLE precision, we simply invoke the script cmp_tmm_acml as follows:

./cmp_tmm_acml USE_DOUBLE

which produces the executable a.out, then to run the job we type:

./a.out

and get following output:

[nickc@bul130:/work/nickc/GEP_TASK2/DGEMM_EXAMPLE] ./a.out

Precision used for this code: DOUBLE

alpha has  15 digits precision:
      ---------1---------2---------3---------4---------5-
1/3= 3.3333333333333331482961625624739099290000000E-01

 A:
   3.  -1.   0.   0.   0.   1.
  -1.   3.  -1.   0.   1.   0.
   0.  -1.   3.   0.   0.   0.
   0.   0.   0.   3.  -1.   0.
   0.   1.   0.  -1.   3.  -1.
   1.   0.   0.   0.  -1.   3. 

 B:
   1.   2.   3.   4.   5.   6.
   2.   4.   6.   8.  10.  12.
   3.   6.   9.  12.  15.  18.
   4.   8.  12.  16.  20.  24.
   5.  10.  15.  20.  25.  30.
   6.  12.  18.  24.  30.  36.

 C:
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   5.   9.  14.  19.  23.  28.

 First Row of C:
 2.3333333333333334813630699500208720560000000000000000
 4.6666666666666669627261399000417441130000000000000000
 7.0000000000000000000000000000000000000000000000000000
 9.3333333333333339254522798000834882260000000000000000
11.6666666666666678509045596001669764500000000000000000
14.0000000000000000000000000000000000000000000000000000

This worked fine for the data type DOUBLE because in the main program xtmm.f90 we called

     call DAXPY(N,DA,IOTA,INCX,JOTA,INCY)
     call DCOPY(N,JOTA,INCX,B(1,J),INCY)
     CALL DGEMM ( 'N', 'N', N, N, N, alpha, A, N, B, N, zero, C, N )

and the script invoked the ACML library. To run with data type SINGLE above calls must be changed to single precision:

     call SAXPY(N,DA,IOTA,INCX,JOTA,INCY)
     call SCOPY(N,JOTA,INCX,B(1,J),INCY)
     CALL SGEMM ( 'N', 'N', N, N, N, alpha, A, N, B, N, zero, C, N )

The ACML library has BLAS routines for SINGLE and DOUBLE precision only, i.e. there is no QUAD precision subroutines.

To overcome this problem, we need to obtain the source routines for DAXPY, DCOPY and DGEMM (including XERBLA and LSAME) from netlib on the internet and make some minor changes. A Google search for "netlib dgemm" would list the sites from where we can download dgemm.f, etc ...

We should use a new script for compiling our program with the modified source files for DGEMM, XERBLA, LSAME, DAXPY and DCOPY. We copied script cmp_tmm_acml into cmp_tmm and modify the last line in cmp_tmm as follows:

$FC -fpp -D$1  xtmm.f90 util.f90 dgemm.f90  $LIBS -lifcore -limf -lifport

and delete the previous line (export ACML_LIB ...) since we no longer want to use this library.

The modified dgemm.f90 file contains routines for DGEMM, XERBLA, LSAME, DAXPY and DCOPY. All of these routines have the statement:

     use set_precision

which invokes the "module set_precision" defined in xtmm.f90, where the integer parameter wp is defined for the standard data types (SINGLE, DOUBLE and QUAD). Through this integer parameter wp we should be able to compile the program xtmm.f90 for SINGLE, DOUBLE and QUAD precisions.

Here is a listing of the modified dgemm.f90 file containing the routines for DGEMM, XERBLA, LSAME, DAXPY and DCOPY:

 ! File dgemm.f90
 
       SUBROUTINE DGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)
       use set_precision
 
       IMPLICIT NONE
 
 !     .. Scalar Arguments ..
       real(wp)       :: ALPHA,BETA
 
       INTEGER K,LDA,LDB,LDC,M,N
       CHARACTER TRANSA,TRANSB
 !     ..
 !     .. Array Arguments ..
       real(wp) :: A(LDA,*),B(LDB,*),C(LDC,*)
 !     ..
 !
 !  Purpose
 !  =======
 !
 !  DGEMM  performs one of the matrix-matrix operations
 !
 !     C := alpha*op( A )*op( B ) + beta*C,
 !
 !  where  op( X ) is one of
 !
 !     op( X ) = X   or   op( X ) = X**T,
 !
 !  alpha and beta are scalars, and A, B and C are matrices,  with op( A )
 !  an m by k matrix,  op( B )  a  k by n matrix and  C an m  by n matrix.
 !
 !  Arguments
 !  ==========
 !
 !  TRANSA - CHARACTER*1.
 !           On entry, TRANSA specifies the form of op( A )  to be used in
 !           the matrix multiplication as follows:
 !
 !              TRANSA = 'N' or 'n',  op( A ) = A.
 !
 !              TRANSA = 'T' or 't',  op( A ) = A**T.
 !
 !              TRANSA = 'C' or 'c',  op( A ) = A**T.
 !
 !           Unchanged on exit.
 !
 !
 !  TRANSB - CHARACTER*1.
 !           On entry, TRANSB specifies the form of op( B ) to be used in
 !           the matrix multiplication as follows:
 !
 !              TRANSB = 'N' or 'n',  op( B ) = B.
 !
 !              TRANSB = 'T' or 't',  op( B ) = B**T.
 !
 !              TRANSB = 'C' or 'c',  op( B ) = B**T.
 !
 !           Unchanged on exit.
 !
 !  M      - INTEGER.
 !           On entry,  M  specifies  the number  of rows   of the  matrix
 !           op( A )  and of the  matrix  C.  M  must  be at least  zero.
 !           Unchanged on exit.
 !
 !  N      - INTEGER.
 !           On entry,  N  specifies the number  of columns of the matrix
 !           op( B ) and the number of columns of the matrix C. N must be
 !           at least zero.
 !           Unchanged on exit.
 !
 !  K      - INTEGER.
 !           On entry,  K  specifies  the number of columns of the matrix
 !           op( A ) and the number of rows of the matrix op( B ). K must
 !           be at least  zero.
 !           Unchanged on exit.
 !
 !  ALPHA  - DOUBLE PRECISION.
 !           On entry, ALPHA specifies the scalar alpha.
 !           Unchanged on exit.
 !
 !  A      - DOUBLE PRECISION array of DIMENSION ( LDA, ka ), where ka is
 !           k  when  TRANSA = 'N' or 'n',  and is  m  otherwise.
 !           Before entry with  TRANSA = 'N' or 'n',  the leading  m by k
 !           part of the array  A  must contain the matrix  A,  otherwise
 !           the leading  k by m  part of the array  A  must contain  the
 !           matrix A.
 !           Unchanged on exit.
 !
 !  LDA    - INTEGER.
 !           On entry, LDA specifies the first dimension of A as declared
 !           in the calling (sub) program. When  TRANSA = 'N' or 'n' then
 !           LDA must be at least  max( 1, m ), otherwise  LDA must be at
 !           least  max( 1, k ).
 !           Unchanged on exit.
 !
 !  B      - DOUBLE PRECISION array of DIMENSION ( LDB, kb ), where kb is
 !           n  when  TRANSB = 'N' or 'n',  and is  k  otherwise.
 !           Before entry with  TRANSB = 'N' or 'n',  the leading  k by n
 !           part of the array  B  must contain the matrix  B,  otherwise
 !           the leading  n by k  part of the array  B  must contain  the
 !           matrix B.
 !           Unchanged on exit.
 !
 !  LDB    - INTEGER.
 !           On entry, LDB specifies the first dimension of B as declared
 !           in the calling (sub) program. When  TRANSB = 'N' or 'n' then
 !           LDB must be at least  max( 1, k ), otherwise  LDB must be at
 !           least  max( 1, n ).
 !           Unchanged on exit.
 !
 !  BETA   - DOUBLE PRECISION.
 !           On entry,  BETA  specifies the scalar  beta.   When  BETA  is
 !           supplied as zero then C need not be set on input.
 !           Unchanged on exit.
 !
 !  C      - DOUBLE PRECISION array of DIMENSION ( LDC, n ).
 !           Before entry, the leading  m by n  part of the array  C must
 !           contain the matrix  C,  except when  beta  is zero, in which
 !           case C need not be set on entry.
 !           On exit, the array  C  is overwritten by the  m by n  matrix
 !           ( alpha*op( A )*op( B ) + beta*C ).
 !
 !  LDC    - INTEGER.
 !           On entry, LDC specifies the first dimension of C as declared
 !           in  the  calling  (sub)  program.   LDC  must  be  at  least
 !           max( 1, m ).
 !           Unchanged on exit.
 !
 !  Further Details
 !  ===============
 !
 !  Level 3 Blas routine.
 !
 !  -- Written on 8-February-1989.
 !     Jack Dongarra, Argonne National Laboratory.
 !     Iain Duff, AERE Harwell.
 !     Jeremy Du Croz, Numerical Algorithms Group Ltd.
 !
 !   =====================================================================
 !
 !     .. External Functions ..
       LOGICAL LSAME
       EXTERNAL LSAME
 !     ..
 !     .. External Subroutines ..
       EXTERNAL XERBLA
 !     ..
 !     .. Intrinsic Functions ..
       INTRINSIC MAX
 !     ..
 !     .. Local Scalars ..
       real(wp)      :: TEMP
       INTEGER I,INFO,J,L,NCOLA,NROWA,NROWB
       LOGICAL NOTA,NOTB
 !     ..
 !     .. Parameters .. defined in module set_precision
 !!    real(wp) :: ONE,ZERO
 !!    PARAMETER (ONE=1.0D+0,ZERO=0.0D+0)
 !     ..
 !
 !     Set  NOTA  and  NOTB  as  true if  A  and  B  respectively are not
 !     transposed and set  NROWA, NCOLA and  NROWB  as the number of rows
 !     and  columns of  A  and the  number of  rows  of  B  respectively.
 !
       NOTA = LSAME(TRANSA,'N')
       NOTB = LSAME(TRANSB,'N')
       IF (NOTA) THEN
           NROWA = M
           NCOLA = K
       ELSE
           NROWA = K
           NCOLA = M
       END IF
       IF (NOTB) THEN
           NROWB = K
       ELSE
           NROWB = N
       END IF
 !
 !     Test the input parameters.
 !
       INFO = 0
       IF ((.NOT.NOTA) .AND. (.NOT.LSAME(TRANSA,'C')) .AND.               &
      &    (.NOT.LSAME(TRANSA,'T'))) THEN
           INFO = 1
       ELSE IF ((.NOT.NOTB) .AND. (.NOT.LSAME(TRANSB,'C')) .AND.         &
      &         (.NOT.LSAME(TRANSB,'T'))) THEN
           INFO = 2
       ELSE IF (M.LT.0) THEN
           INFO = 3
       ELSE IF (N.LT.0) THEN
           INFO = 4
       ELSE IF (K.LT.0) THEN
           INFO = 5
       ELSE IF (LDA.LT.MAX(1,NROWA)) THEN
            INFO = 8
       ELSE IF (LDB.LT.MAX(1,NROWB)) THEN
           INFO = 10
       ELSE IF (LDC.LT.MAX(1,M)) THEN
           INFO = 13
       END IF
       IF (INFO.NE.0) THEN
           CALL XERBLA('DGEMM ',INFO)
           RETURN
       END IF
 !
 !     Quick return if possible.
 !
       IF ((M.EQ.0) .OR. (N.EQ.0) .OR.                                   &
      &    (((ALPHA.EQ.ZERO).OR. (K.EQ.0)).AND. (BETA.EQ.ONE))) RETURN
 !
 !     And if  alpha.eq.zero.
 !
       IF (ALPHA.EQ.ZERO) THEN
           IF (BETA.EQ.ZERO) THEN
               DO 20 J = 1,N
                   DO 10 I = 1,M
                       C(I,J) = ZERO
    10             CONTINUE
    20         CONTINUE
           ELSE
               DO 40 J = 1,N
                   DO 30 I = 1,M
                       C(I,J) = BETA*C(I,J)
    30             CONTINUE
    40         CONTINUE
           END IF
           RETURN
       END IF
 !
 !     Start the operations.
 !
       IF (NOTB) THEN
           IF (NOTA) THEN
 !
 !           Form  C := alpha*A*B + beta*C.
 !
               DO 90 J = 1,N
                   IF (BETA.EQ.ZERO) THEN
                       DO 50 I = 1,M
                           C(I,J) = ZERO
    50                 CONTINUE
                   ELSE IF (BETA.NE.ONE) THEN
                       DO 60 I = 1,M
                           C(I,J) = BETA*C(I,J)
    60                 CONTINUE
                   END IF
                   DO 80 L = 1,K
                       IF (B(L,J).NE.ZERO) THEN
                           TEMP = ALPHA*B(L,J)
                           DO 70 I = 1,M
                               C(I,J) = C(I,J) + TEMP*A(I,L)
    70                     CONTINUE
                       END IF
    80             CONTINUE
    90         CONTINUE
           ELSE
 !
 !           Form  C := alpha*A**T*B + beta*C
 !
               DO 120 J = 1,N
                   DO 110 I = 1,M
                       TEMP = ZERO
                       DO 100 L = 1,K
                           TEMP = TEMP + A(L,I)*B(L,J)
   100                 CONTINUE
                       IF (BETA.EQ.ZERO) THEN
                           C(I,J) = ALPHA*TEMP
                       ELSE
                           C(I,J) = ALPHA*TEMP + BETA*C(I,J)
                       END IF
   110             CONTINUE
   120         CONTINUE
           END IF
       ELSE
           IF (NOTA) THEN
 !
 !           Form  C := alpha*A*B**T + beta*C
 !
               DO 170 J = 1,N
                   IF (BETA.EQ.ZERO) THEN
                       DO 130 I = 1,M
                           C(I,J) = ZERO
   130                 CONTINUE
                   ELSE IF (BETA.NE.ONE) THEN
                       DO 140 I = 1,M
                           C(I,J) = BETA*C(I,J)
   140                 CONTINUE
                   END IF
                   DO 160 L = 1,K
                       IF (B(J,L).NE.ZERO) THEN
                           TEMP = ALPHA*B(J,L)
                           DO 150 I = 1,M
                               C(I,J) = C(I,J) + TEMP*A(I,L)
   150                     CONTINUE
                       END IF
   160             CONTINUE
   170         CONTINUE
           ELSE
 !
 !           Form  C := alpha*A**T*B**T + beta*C
 !
               DO 200 J = 1,N
                   DO 190 I = 1,M
                       TEMP = ZERO
                       DO 180 L = 1,K
                           TEMP = TEMP + A(L,I)*B(J,L)
   180                 CONTINUE
                       IF (BETA.EQ.ZERO) THEN
                           C(I,J) = ALPHA*TEMP
                       ELSE
                           C(I,J) = ALPHA*TEMP + BETA*C(I,J)
                       END IF
   190             CONTINUE
   200         CONTINUE
           END IF
       END IF
 !
       RETURN
 !
 !     End of DGEMM .
 !
       END
 
       LOGICAL          FUNCTION LSAME( CA, CB )
       use set_precision
 !
 !  -- LAPACK auxiliary routine (version 3.2) --
 !     Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd..
 !     November 2006
 !
 !     .. Scalar Arguments ..
       CHARACTER          CA, CB
 !     ..
 !
 !  Purpose
 !  =======
 !
 !  LSAME returns .TRUE. if CA is the same letter as CB  regardless of
 !  case.
 !
 !  Arguments
 !  =========
 !
 !  CA      (input) CHARACTER*1
 !  CB      (input) CHARACTER*1
 !          CA and CB specify the single characters to be compared.
 !
 ! =====================================================================
 !
 !     .. Intrinsic Functions ..
       INTRINSIC          ICHAR
 !     ..
 !     .. Local Scalars ..
       INTEGER            INTA, INTB, ZCODE
 !     ..
 !     .. Executable Statements ..
 !
 !     Test if the characters are equal
 !
       LSAME = CA.EQ.CB
       IF( LSAME )                                                        &
      &   RETURN
 !
 !     Now test for equivalence if both characters are alphabetic.
 !
       ZCODE = ICHAR( 'Z' )
 !
 !     Use 'Z' rather than 'A' so that ASCII can be detected on Prime
 !     machines, on which ICHAR returns a value with bit 8  set.
 !     ICHAR('A') on Prime machines returns 193 which is the  same as
 !     ICHAR('A') on an EBCDIC machine.
 !
       INTA = ICHAR( CA )
       INTB = ICHAR( CB )
 !
       IF( ZCODE.EQ.90 .OR. ZCODE.EQ.122 ) THEN
 !
 !        ASCII is assumed - ZCODE is the ASCII code of either lower or
 !        upper case 'Z'.
 !
          IF( INTA.GE.97 .AND. INTA.LE.122 ) INTA = INTA - 32
          IF( INTB.GE.97 .AND. INTB.LE.122 ) INTB = INTB - 32
 !
       ELSE IF( ZCODE.EQ.233 .OR. ZCODE.EQ.169 ) THEN
 !
 !        EBCDIC is assumed - ZCODE is the EBCDIC code of either lower or
 !        upper case 'Z'.
 !
          IF( INTA.GE.129 .AND. INTA.LE.137 .OR.                         &
      &       INTA.GE.145 .AND. INTA.LE.153 .OR.                         &
      &       INTA.GE.162 .AND. INTA.LE.169 ) INTA = INTA + 64
          IF( INTB.GE.129 .AND. INTB.LE.137 .OR.                         &
      &       INTB.GE.145 .AND. INTB.LE.153 .OR.                         &
      &       INTB.GE.162 .AND. INTB.LE.169 ) INTB = INTB + 64
 !
       ELSE IF( ZCODE.EQ.218 .OR. ZCODE.EQ.250 ) THEN
 !
 !        ASCII is assumed, on Prime machines - ZCODE is the ASCII code
 !        plus 128 of either lower or upper case 'Z'.
 !
          IF( INTA.GE.225 .AND. INTA.LE.250 ) INTA = INTA - 32
          IF( INTB.GE.225 .AND. INTB.LE.250 ) INTB = INTB - 32
       END IF
       LSAME = INTA.EQ.INTB
 !
 !     RETURN
 !
 !     End of LSAME
 !
       END
 
       SUBROUTINE XERBLA( SRNAME, INFO )
       use set_precision
       IMPLICIT NONE
 
 !
 !  -- LAPACK auxiliary routine (preliminary version) --
 !     Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd..
 !     November 2006
 !
 !     .. Scalar Arguments ..
       CHARACTER*(*)      SRNAME
       INTEGER            INFO
 !     ..
 !
 !  Purpose
 !  =======
 !
 !  XERBLA  is an error handler for the LAPACK routines.
 !  It is called by an LAPACK routine if an input parameter has an
 !  invalid value.  A message is printed and execution stops.
 !
 !  Installers may consider modifying the STOP statement in order to
 !  call system-specific exception-handling facilities.
 !
 !  Arguments
 !  =========
 !
 !  SRNAME  (input) CHARACTER*(*)
 !          The name of the routine which called XERBLA.
 !
 !  INFO    (input) INTEGER
 !          The position of the invalid parameter in the parameter list
 !          of the calling routine.
 !
 ! =====================================================================
 !
 !     .. Intrinsic Functions ..
       INTRINSIC          LEN_TRIM
 !     ..
 !     .. Executable Statements ..
 !
       WRITE( *, FMT = 9999 )SRNAME( 1:LEN_TRIM( SRNAME ) ), INFO
 !
       STOP
 !
  9999 FORMAT( ' ** On entry to ', A, ' parameter number ', I2, ' had ', &
      &      'an illegal value' )
 !
 !     End of XERBLA
 !
       END
 
       SUBROUTINE DCOPY(N,DX,INCX,DY,INCY)
       use set_precision
 
       IMPLICIT NONE
 
 !     .. Scalar Arguments ..
       INTEGER INCX,INCY,N
 !     ..
 !     .. Array Arguments ..
       real(wp),      DIMENSION(N) :: DX, DY
 !     ..
 !
 !  Purpose
 !  =======
 !
 !     DCOPY copies a vector, x, to a vector, y.
 !     uses unrolled loops for increments equal to one.
 !
 !  Further Details
 !  ===============
 !
 !     jack dongarra, linpack, 3/11/78.
 !     modified 12/3/93, array(1) declarations changed to array(*)
 !
 !  =====================================================================
 !
 !     .. Local Scalars ..
       INTEGER I,IX,IY,M,MP1
 !     ..
 !     .. Intrinsic Functions ..
       INTRINSIC MOD
 !     ..
       IF (N.LE.0) RETURN
       IF (INCX.EQ.1 .AND. INCY.EQ.1) THEN
 !
 !        code for both increments equal to 1
 !
 !        clean-up loop
 !
          M = MOD(N,7)
          IF (M.NE.0) THEN
             DO I = 1,M
                DY(I) = DX(I)
             END DO
             IF (N.LT.7) RETURN
          END IF
          MP1 = M + 1
          DO I = MP1,N,7
             DY(I) = DX(I)
             DY(I+1) = DX(I+1)
             DY(I+2) = DX(I+2)
             DY(I+3) = DX(I+3)
             DY(I+4) = DX(I+4)
             DY(I+5) = DX(I+5)
             DY(I+6) = DX(I+6)
          END DO
       ELSE
 !
 !        code for unequal increments or equal increments
 !          not equal to 1
 !
          IX = 1
          IY = 1
          IF (INCX.LT.0) IX = (-N+1)*INCX + 1
          IF (INCY.LT.0) IY = (-N+1)*INCY + 1
          DO I = 1,N
             DY(IY) = DX(IX)
             IX = IX + INCX
             IY = IY + INCY
          END DO
       END IF
       RETURN
       END
 
       SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY)
       use set_precision
 
       IMPLICIT NONE
 
 !     .. Scalar Arguments ..
       real(wp)      :: DA 
 
       INTEGER INCX,INCY,N
 !     ..
 
 !     .. Array Arguments ..
       real(wp),      DIMENSION(N) :: DX, DY
 !     ..
 !
 !  Purpose
 !  =======
 !
 !     DAXPY constant times a vector plus a vector.
 !     uses unrolled loops for increments equal to one.
 !
 !  Further Details
 !  ===============
 !
 !     jack dongarra, linpack, 3/11/78.
 !     modified 12/3/93, array(1) declarations changed to array(*)
 !
 !  =====================================================================
 !
 !     .. Local Scalars ..
       INTEGER I,IX,IY,M,MP1
 !     ..
 !     .. Intrinsic Functions ..
       INTRINSIC MOD
 !     ..
       IF (N.LE.0) RETURN
       IF (DA .EQ. zero ) RETURN
       IF (INCX.EQ.1 .AND. INCY.EQ.1) THEN
 !
 !        code for both increments equal to 1
 !
 !
 !        clean-up loop
 !
          M = MOD(N,4)
          IF (M.NE.0) THEN
             DO I = 1,M
               DY(I) = DY(I) + DA*DX(I)
             END DO
          END IF
          IF (N.LT.4) RETURN
          MP1 = M + 1
          DO I = MP1,N,4
             DY(I) = DY(I) + DA*DX(I)
             DY(I+1) = DY(I+1) + DA*DX(I+1)
             DY(I+2) = DY(I+2) + DA*DX(I+2)
             DY(I+3) = DY(I+3) + DA*DX(I+3)
          END DO
       ELSE
 !
 !        code for unequal increments or equal increments
 !          not equal to 1
 !
          IX = 1
          IY = 1
          IF (INCX.LT.0) IX = (-N+1)*INCX + 1
          IF (INCY.LT.0) IY = (-N+1)*INCY + 1
          DO I = 1,N
           DY(IY) = DY(IY) + DA*DX(IX)
           IX = IX + INCX
           IY = IY + INCY
          END DO
       END IF
       RETURN
       END

Using the new script cmp_tmm we should be able to compile the code for QUAD precision using:

  ./cmp_tmm USE_QUAD

which upon execution produces following output:

! OUTPUT from ./a.out

Precision used for this code: QUAD

alpha has  33 digits precision:
      ---------1---------2---------3---------4---------5-
1/3= 3.3333333333333333333333333333333331730000000E-01

 A:
   3.  -1.   0.   0.   0.   1.
  -1.   3.  -1.   0.   1.   0.
   0.  -1.   3.   0.   0.   0.
   0.   0.   0.   3.  -1.   0.
   0.   1.   0.  -1.   3.  -1.
   1.   0.   0.   0.  -1.   3.

 B:
   1.   2.   3.   4.   5.   6.
   2.   4.   6.   8.  10.  12.
   3.   6.   9.  12.  15.  18.
   4.   8.  12.  16.  20.  24.
   5.  10.  15.  20.  25.  30.
   6.  12.  18.  24.  30.  36.

 C:
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   2.   5.   7.   9.  12.  14.
   5.   9.  14.  19.  23.  28.

 First Row of C:
  2.3333333333333333333333333333333334620000000000000000
  4.6666666666666666666666666666666669230000000000000000
  7.0000000000000000000000000000000000000000000000000000
  9.3333333333333333333333333333333338470000000000000000
 11.6666666666666666666666666666666676900000000000000000
 14.0000000000000000000000000000000000000000000000000000

So far we have our program running for the three standard data types and the next step is to run the code for QD precision.

Modified QD Programs

Now we move to subdirectory DGEMM_QD and make further changes to our files.

QD requires the QD libraries and it must be launched from a C++ main program. File cmain1.C contains the C++ main program. We must change our fortran main program in xtmm.f90 to a subroutine and have the C++ main program call this subroutine. Here is an example of the cmain1.C program:

 # File cmain1.C
 
 #include <iostream>
 #include <stdio.h>
 #include <stdlib.h>
 #include <math.h>
 #include <iostream>
 #include <fstream>
 
 using namespace std;
 extern "C" { void xtmm_(int &iargK, int &iarg0, int &iarg1,  int &iarg2,int &iarg3, char *iarg4); }
 int main(int argc, char* argv[]) {
   int iargK,iarg0,iarg1,iarg2,iarg3;
   long N,Larg1,Larg2;
   char* iarg4;
   int DEBUG=1;
 
   iargK=atoi(argv[1]);
   iarg0=atoi(argv[2]);
   iarg1=atoi(argv[3]);
   iarg2=atoi(argv[4]);
   iarg3=atoi(argv[5]);
   iarg4=    argv[6] ;
 
   if (DEBUG == 1 ) {
     ofstream myfile;
     myfile.open ("Log_file.txt");
     myfile << "argc = " << argc << endl;
     for (int i = 0; i < argc; i++)
       myfile << "argv[" << i << "] = " << argv[i] << endl;
 
 // print value pointed by pointer iarg4
    myfile << "*iarg4=" << *iarg4 << endl;
 
    Larg1=iarg1;
    Larg2=iarg2;
    N = Larg1*pow(10,Larg2);
    myfile << "N = " << N << endl;
    myfile.close();
 
    }
 
 //  iarg4 = 'a' for append to previous output file
 //          'r' remove previous output file
 
 // pass pointer iarg4
 
   xtmm_(iargK,iarg0,iarg1,iarg2,iarg3,iarg4);
 
 }

In above file the subroutine xtmm is declared with several arguments. These arguments are placed after the executable (of the C++ main program) and will be transmitted to the fortran subroutine in file xtmm.f90.

Instead of using the cmp_tmm script (from DGEMM_EXAMPLE) a more generalized script called dg will be used in this subdirectory (DGEMM_QD).

The dg script reads the arguments and invokes a make utility three times: First time to remove old object files and the executable, second time to compile the fortran routines and main C++ program, and third time (using the same parameters) to submit a job. The arguments determine the precision requested for the code, number of processors requested to run the job (because in the future you might want to use MPI or OpenMP), size of matrix and time estimate for the job. Here is the new script dg:

# Script file dg

 #!/bin/bash
 
 # file name = dg  (script)
 
 # This script requires at least five arguments - sixth argument is optional:
 # sixth argument = r ==> remove previous output file (default)
 #                = a ==> append output from this run to previous output file
 #   Usage:
 #
 #      ./dg   <KASE>  <NPROCS>  <lead_digit>  <npow>  <cpu_time_in_hours> [ <append_rm_switch> ]
 #
 #   GEP: Precision(S/D/Quad/QD)
 #
 #        PREC  SNGE DBLE  Quad   QD
 #     OMP\
 #      no      00   01     02    03
 #
 #  <NPROCS> = number of OMP processors
 #
 #  N=<lead_digit>*10^<npow>= problem number in DGEMM subdirectory
 
 function usage () {
    cat <<EOF
 
 Usage: $scriptname <KASE>  <NPROCS> <lead_digit>  <npow>   <cpu_time_in_hours> [ <append_rm_switch> ]
        The default for <append_rm_switch> is r
 
        <KASE> = USE_QUAD  |  USE_DOUBLE  |  USE_SINGLE  |   USE_QD
 
 EOF
    exit 0
 }
 
 if (( $# != 5  && $# != 6 )) ; then
   usage
   exit 1
 fi
 
 if (( $# == 5 )) ; then
   MY_IARG6=r
 else
   MY_IARG6=$6
 fi
 
 # Verifying for valid argument:
 
 KOUNT=0
 
 if [ $1 = "USE_QD" ] ||  [ $1 -eq 3 ] ; then
   KOUNT=$KOUNT+1
   KASE=3
 fi
 
 if [ $1 = "USE_QUAD" ] ||  [ $1 -eq 2 ] ; then
   KOUNT=$KOUNT+1
   KASE=2
 fi
 
 if [ $1 = "USE_DOUBLE" ] ||  [ $1 -eq 1 ]  ; then
   KOUNT=$KOUNT+1
   KASE=1
 fi
 
 if [ $1 = "USE_SINGLE" ]  || [ $1 -eq 0 ] ; then
   KOUNT=$KOUNT+1
   KASE=0
 fi
 
 if (( $KOUNT != 1 )) ; then
   echo " $KOUNT "
   echo " "
   echo "       '$1' is an invalid argument !"
   usage
   exit 1
 fi
 
 echo
 echo "<KASE>: $1"
 echo "<NPROCS>: $2"
 echo "<lead_digit>: $3"
 echo "<npow>: $4"
 echo "<cpu_time_in_hours>: $5"
 echo "<append_rm_switch>: $MY_IARG6"
 echo
 
 MY_NPROCS=$2
 
 MY_ARG4=0
 
 if (( $MY_NPROCS < 1 || $MY_NPROCS > 1 )); then
     MY_NPROCS=1
 fi
 
  (( KK=0 ))
 
 
 # SER / SNGL
 if (( $KASE == 0 )) ; then
   OMP_VAL=USE_SERIAL
   QD_VAL=USE_SINGLE
   if (( $MY_NPROCS > 1 )); then
     MY_NPROCS=1
   fi
   (( KK = KK + 1 ))
 fi
 
 # SER / DBLE
 if (( $KASE == 1 )) ; then
   OMP_VAL=USE_SERIAL
   QD_VAL=USE_DOUBLE
   if (( $MY_NPROCS > 1 )); then
     MY_NPROCS=1
   fi
   (( KK = KK + 1 ))
 fi
 
 
 # SER / Quad
 if (( $KASE == 2 )) ; then
   OMP_VAL=USE_SERIAL
   QD_VAL=USE_QUAD
   if (( $MY_NPROCS > 1 )); then
     MY_NPROCS=1
   fi
   (( KK = KK + 1 ))
 fi
 
 # SER / QD
 if (( $KASE == 3 )) ; then
   OMP_VAL=USE_SERIAL
   QD_VAL=USE_QD
   if (( $MY_NPROCS > 1 )); then
     MY_NPROCS=1
   fi
   (( KK = KK + 1 ))
 fi
 
 if (( KK == 0 )) ; then
   echo "INVALID 'KASE' value !!!"
   exit 1
 else
   echo "KASE=$1"
 fi
 
 
 echo "MY_NPROCS=$MY_NPROCS"
 echo "OMP_VAL=$OMP_VAL"
 echo "QD_VAL=$QD_VAL"
 echo " "
 
 # Remove old objects and executable
 
 make D_PROCTYPE=$OMP_VAL D_DATATYPE=$QD_VAL clean
 
 # Compile with given macros, sleep and submit job with  $IARGK $IARG0 $IARG1 $IARG2 $IARG3 $IARG4
 
 make D_PROCTYPE=$OMP_VAL D_DATATYPE=$QD_VAL
 sleep 2
 
 echo "submitting job from ../build"
 
 cd ../build
 IARGK=$1 IARG0=$MY_NPROCS  IARG1=$3  IARG2=$MY_ARG4   IARG3=$5  IARG4=$MY_IARG6  make -f ../DGEMM_QD/makefile \
      D_PROCTYPE=$OMP_VAL D_DATATYPE=$QD_VAL  brun
 cd ../DGEMM_QD

The makefile used by the make utility command looks like this:

 # File makefile
 
 #!/bin/bash
 SHELL := /bin/bash
 
 ERR_MSG_1:= - You must use the icc compiler for this make  command - See HIST file for details !
 
 # Find where we are
 
 TOPDIR := $(shell pwd)
 
 # Check environment - we want icc (INTEL) compilers
 ifneq ($(CC), icc)
       $(error error $(ERR_MSG_1))
 endif
 
 # Define some constants
 PREFIX:=USE_
 SPACE:=
 
 # Set default D_MACROS:
 
 D_PROCTYPE  := USE_SERIAL
 D_DATATYPE  := USE_DOUBLE
 
 OBJDIR  := OBJECTS
 
 # Compilers
 USEF90  := $(FC)
 USECXX  := $(CXX)
 
 # Libraries
 QDLIB=/opt/sharcnet/qd/2.3.13/intel/fortran
 INTELLIB=/opt/sharcnet/intel/12.1.3/ifc/lib/intel64
 ACML_LIB=/opt/sharcnet/acml/4.4.0/ifort-64bit/ifort64/lib - lacml
 ACML_INC=/opt/sharcnet/acml/4.4.0/ifort- 64bit/ifort64/include
 
 LIBS=-L${INTELLIB} -lifcore -limf -lifport -L$(ACML_LIB) -L/opt/sharcnet/qd/2.3.13/intel/lib
 
 # Flags
 IFLAGS   := -I$(QDLIB)
 FFLAGS   := -O2 -g  -fpp
 
 ifeq ($(D_DATATYPE),USE_QD)
   FFLAGS  += -DUSE_QD $(IFLAGS) -I$(ACML_INC)
   LIBS    += -L$(QDLIB) -lqdmod -lqd
 else
   FFLAGS  += -D$(D_DATATYPE) $(IFLAGS) -I$(ACML_INC)
 endif
 
  ifeq ($(D_PROCTYPE),USE_OMP)
   FFLAGS    += -openmp -DUSE_OMP $(IFLAGS) -I$(ACML_INC)
   LIBS      += -openmp
   ifeq ($(D_DATATYPE),USE_SINGLE)
     GEP_EXEC   = OMPexeSNGL
     OUT_FILE   = OMP_SNGL
   else
     ifeq ($(D_DATATYPE),USE_DOUBLE)
       GEP_EXEC   = OMPexeDBLE
       OUT_FILE   = OMP_DBLE
     else
       ifeq ($(D_DATATYPE),USE_QUAD)
         GEP_EXEC   = OMPexeQUAD
         OUT_FILE   = OMP_QUAD
       else
         ifeq ($(D_DATATYPE),USE_QD)
           GEP_EXEC   = OMPexeQD
           OUT_FILE   = OMP_QD
         else
           GEP_EXEC   = OMP_NOGO
         endif
       endif
     endif
   endif
 else
   LIBS      += -openmp -lacml
   ifeq ($(D_DATATYPE),USE_SINGLE)
     GEP_EXEC   = SERexeSNGL
     OUT_FILE   = SER_SNGL
   else
     ifeq ($(D_DATATYPE),USE_DOUBLE)
       GEP_EXEC   = SERexeDBLE
       OUT_FILE   = SER_DBLE
     else
       ifeq ($(D_DATATYPE),USE_QUAD)
         GEP_EXEC   = SERexeQUAD
         OUT_FILE   = SER_QUAD
       else
         ifeq ($(D_DATATYPE),USE_QD)
           GEP_EXEC   = SERexeQD
           OUT_FILE   = SER_QD
         else
           GEP_EXEC   = SER_NOGO
         endif
       endif
     endif
   endif
 endif
 
 SRCS:= xtmm.f90 util.f90 dgemm.f90
 
 OBJS:= $(addprefix $(OBJDIR)/,$(SRCS:.f90=.o))
 
 DATATYPE := $(subst $(PREFIX),$(SPACE),$(D_DATATYPE))
 
 TARGET := $(GEP_EXEC)
 
 all: $(TARGET)
         @echo " "
         @echo "This command executed from subdirectory: "
         @echo "     $(TOPDIR) "
         @echo " "
         @echo "Running on cluster: ${CLUSTER}"
         @echo "Compiling with USEF90 = $(USEF90) and FFLAGS = ${FFLAGS}"
         @echo " "
 
 $(TARGET): $(OBJS) makefile
         @echo " "
         @echo "Linking ... "
         @echo " "
         $(USECXX) -o $@ cmain1.C $(OBJS) $(LIBS)
 
 $(OBJDIR)/%.o : %.f90 makefile
         @echo " "
         @echo "Compiling ... "
         @echo " "
         $(USEF90) ${FFLAGS} -c -o $@ $<
 
 .PHONY: clean
 
 clean:
         rm -rf *.mod  *.o $(OBJDIR) $(TARGET)
 
 run: $(TARGET) brun
         @echo " "
         @echo "Job was submitted to the queue ... "
         @echo " "
 
 brun:
 ifeq ($(D_PROCTYPE),USE_OMP)
  ifeq ($(D_DATATYPE),USE_QD)
         @echo "KASE: OpenMP $(DATATYPE)"
         sqsub -r $(IARG3)h --mpp=2G -f threaded -N 1 -n  $(IARG0) -o ${CLU}_$(OUT_FILE)_$(IARG0)_%J   ../DGEMM_QD/$(TARGET) $(IARGK)  $(IARG0)  $(IARG1)  $(IARG2)   $(IARG3) $(IARG4)
  else
         @echo "KASE: OpenMP $(DATATYPE)"
         sqsub -r $(IARG3)h --mpp=4G -f threaded -N 1 -n  $(IARG0) -o ${CLU}_$(OUT_FILE)_$(IARG0)_%J   ../DGEMM_QD/$(TARGET) $(IARGK)  $(IARG0)  $(IARG1)  $(IARG2)   $(IARG3) $(IARG4)
  endif
 
 else
  ifeq ($(D_DATATYPE),USE_QD)
         @echo "KASE: SERIAL $(DATATYPE)"
         sqsub -r $(IARG3)h --mpp=2G -n $(IARG0) -o  ${CLU}_$(OUT_FILE)_%J  ../DGEMM_QD/$(TARGET) $(IARGK)   $(IARG0)  $(IARG1)  $(IARG2)  $(IARG3) $(IARG4)
  else
         @echo "KASE: SERIAL $(DATATYPE)"
         sqsub -r $(IARG3)h --mpp=1G -n $(IARG0) -o  ${CLU}_$(OUT_FILE)_%J  ../DGEMM_QD/$(TARGET) $(IARGK)   $(IARG0)  $(IARG1)  $(IARG2)  $(IARG3) $(IARG4)
  endif
 
 endif
 
 
 list:
         @echo " "
         @echo "List makefile macros: "
         @echo "This command executed from subdirectory: "
         @echo "     $(TOPDIR) "
         @echo "Running on cluster: ${CLUSTER}"
         @echo "Compiling with USEF90 = $(USEF90) and FFLAGS = ${FFLAGS}"
         @echo "Directories:"
         @echo "  FFLAGS  = $(FFLAGS) "
         @echo "  OBJS    = $(OBJS) "
         @echo "  OBJ    = $(OBJ) "
         @echo "  LIBS   = $(LIBS) "
 
 $(OBJS): | $(OBJDIR)
 
 $(OBJDIR):
         mkdir $(OBJDIR)

Note how the makefile instructs the fortran subroutines to be compiled and then the C++ compiler compiles the main C++ program in file cmain1.C and links the fortran objects to it.

To run the code for QD precision we require to load the module:

       module load acml/ifort/4.4.0

and we need the script dg and the makefile (listed above) and following source files: xtmm.f90, util.f90, dgemm.f90 and cmain1.C (which will be listed shortly).

Need to modify the makefile to find the latest version of qd (check in /opt/sharcnet/qd) , for example:

QDLIB=/opt/sharcnet/qd/2.3.13/intel/fortran
...
IFLAGS   := -I$(QDLIB)

Also modify the library locations:

INTELLIB=/opt/sharcnet/intel/12.1.3/ifc/lib/intel64
LIBS=-L${INTELLIB} -lifcore -limf -lifport -L$(ACML_LIB) -L/opt/sharcnet/qd/2.3.13/intel/lib


All these files are in subdirectory DGEMM_QD

To compile and submit our program with QD precision all you need to do is type:

       ./dg USE_QD  1  6  0  1

or

       ./dg    3    1  6  0  1

If you look carefully at the dg script you will notice that for the first argument of dg you can use an integer as follows:

       0 same as USE_SINGLE
       1 same as USE_DOUBLE
       2 same as USE_QUAD
       3 same as USE_QD

Here are the listings for xtmm.f90, util.f90, dgemm.f90 with additional fpp (Fortran PreProcessor) directives for the four types of data types: SINGLE, DOUBLE, QUAD and QD:

 ! File xtmm.f90
       module set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 
 !     Set the DEBUG_<subroutine_name> here:
 
       LOGICAL, parameter :: DEBUG_xtmm = .true.
 
 !     Set precision
 
       integer, parameter  :: sp = kind(0.0)
       integer, parameter  :: dp = kind(0.0d0)
       integer, parameter  :: qp = kind(0.0q0)
 
 #ifdef USE_SINGLE
       integer, parameter  :: wp = kind(0.0)
 #else
 #ifdef USE_DOUBLE
       integer, parameter  :: wp = kind(0.0d0)
 #else
 #ifdef USE_QUAD
       integer, parameter  :: wp = kind(0.0q0)
 #endif
 #endif
 #endif
 
 !     Define some constants
 
 #ifdef USE_QD
       type(qd_real),parameter :: ZERO  = QD_ZERO
       type(qd_real),parameter :: ONE   = QD_ONE
 !     type(qd_real),parameter :: TWO   = add_qd(ONE,ONE)
 !     type(qd_real),parameter :: HALF  = div_qd(ONE,TWO)
 !     type(qd_real),parameter :: THREE = add_qd(TWO,ONE)
       type(qd_real) :: TWO, HALF, THREE
 
 !! TWO, HALF, THREE must be define before they are used !
 
 #else
       real(wp), parameter :: zero  = 0.0_wp
       real(wp), parameter :: one   = 1.0_wp
       real(wp), parameter :: two   = 2.0_wp
       real(wp), parameter :: half  = 0.5_wp
       real(wp), parameter :: three = 3.0_wp
 #endif
 
       end module set_precision
 
 ! Following  blas routines: daxpy, dcopy, dgemm, dsyr,  dsyrk, dtpsv, dtrsm
 ! have been modified and placed in file: my_mini_blas.f90
 !
 !
 !   DAXPY  ==>  DY(I) = DY(I) + DA*DX(I)
 !   DCOPY  ==>  copies a vector, x, to a vector, y
 !   DGEMM  ==>  C := alpha*op( A )*op( B ) + beta*C
 !   DSYR   ==>  A := alpha*x*x**T + A,  (x**T == transpose)
 !   DSYRK  ==>  C := alpha*A*A**T + beta*C
 !   DTPSV  ==>  solves A*x = b or A**T*x = b
 !   DTRSM  ==>  solves  op( A )*X = alpha*B,   or   X*op( A  ) = alpha*B,
 
 ! -----------------------------------------------------------------------------
 
       subroutine xtmm(iargK,iarg0,iarg1,iarg2,iarg3,iarg4)
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 
       IMPLICIT NONE
 
 ! for display use N = 18 or 6
 
       INTEGER        :: iargK, iarg0, NPROCS, status
       INTEGER        :: K_OMP, K_QD, KASE, N
       INTEGER        :: iarg1,iarg2,iarg3
       CHARACTER*1    :: iarg4
       INTEGER        :: I, J, INCX, INCY
       LOGICAL        :: DEBUG
 
 #ifdef USE_QD
       type(qd_real)                                :: alpha, DA
       type(qd_real), DIMENSION(:,:),  ALLOCATABLE  :: A, B, C
       type(qd_real), DIMENSION(:),    ALLOCATABLE  :: IOTA, JOTA
 !
       INTEGER :: old_cw
       call f_fpu_fix_start(old_cw)
       call assign_qd_str(TWO,"2.0")
       call assign_qd_str(HALF,"0.5")
       call assign_qd_str(THREE,"3.0")
 #else
       real(wp)                                :: alpha, DA
       real(wp), DIMENSION(:,:),  ALLOCATABLE  :: A, B, C
       real(wp), DIMENSION(:),    ALLOCATABLE  :: IOTA, JOTA
 #endif
 
 ! end of declarations
 
       N = iarg1
       DEBUG = DEBUG_xtmm
       alpha = one / three
 
       call ARE_MACROS_MUTALLY_EXCLUSIVE
 
       if (DEBUG) then
 #ifdef USE_QD
         write(6,1001) qd_precision(alpha)
  1001   format("alpha has ",i3," digits precision: "/                   &
      &"        ---------1---------2---------3---------4---------5-------&
      &--6---"/"1/3=",$)
         call qdwrite(6,alpha)
 #else
         write(6,1001) precision(alpha), alpha
  1001   format("alpha has ",i3," digits precision: "/                   &
      &    "      ---------1---------2---------3---------4---------5-"/  &
      &    "1/3=",1Pe50.43)
 #endif
       endif
 
       write(6,1002)
  1002 format(" ")
 
 !
 ! Using "My Sparce Matrix" from:
 !
 !  https://www.sharcnet.ca/help/index.php/Solving_Systems_of_Sp arse_Linear_Equations#Sparse_Matrices
 !
 ! Columns of B will be set to col_no*x where x=[1 2 3 .... N]'
 !
 
 ! Build A:  (N must be even number)
 
       if (mod(N,2) .ne. 0) then
         write(6,2001) N
  2001   format("N must be even number, here N = ",i9)
         stop
       endif
 
       ALLOCATE(A(N,N),stat=STATUS)
       if (STATUS .ne. 0) STOP "allocation error for A(N,N)"
 
       ALLOCATE(B(N,N),stat=STATUS)
       if (STATUS .ne. 0) STOP "allocation error for B(N,N)"
 
       ALLOCATE(C(N,N),stat=STATUS)
       if (STATUS .ne. 0) STOP "allocation error for C(N,N)"
 
       ALLOCATE(IOTA(N),stat=STATUS)
       if (STATUS .ne. 0) STOP "allocation error for IOTA(N)"
 
       ALLOCATE(JOTA(N),stat=STATUS)
       if (STATUS .ne. 0) STOP "allocation error for JOTA(N)"
 
       A = zero
       C = zero
 
       A(1,1) = three          ! first diagonal element
       A(N,N) = three          ! last  diagonal element
 
       do I=2,N
         A(I  ,I  ) = three    ! diagonal elements
         A(I-1,I  ) = -one     ! upper sub-diagonal elements
         A(I  ,I-1) = -one     ! lower sub-diagonal elements
       enddo
 
       do I=N,1,-1
         A(I,1+N-I) = A(I,1+N-I) + one    ! anti-diagonal elements
       enddo
 
       DO I=1,N
         IOTA(I) = I
       END DO
 
       DO J=1,N
 !!      DO I=1,N
 !!        B(I,J) = I*J
 !!      END DO
         DA   = J
         INCX = 1
         INCY = 1
         JOTA=0
         call DAXPY(N,DA,IOTA,INCX,JOTA,INCY)
         call DCOPY(N,JOTA,INCX,B(1,J),INCY)
       END DO
 !
       print *,"A:"
 #ifdef USE_QD
       DO I=1,N
         DO J=1,N
           write(6,1006) I,J
  1006     format("A(",i1,",",i1,") =",$)
           call qdwrite(6,A(I,J))
         ENDDO
       ENDDO
 #else
       DO I=1,N
         write(6,1003) (A(I,J),J=1,N)
       END DO
 #endif
 
       print *," "
 
       print *,"B:"
 #ifdef USE_QD
       DO I=1,N
         DO J=1,N
           write(6,1007) I,J
  1007     format("B(",i1,",",i1,") =",$)
           call qdwrite(6,B(I,J))
         ENDDO
       ENDDO
 #else
       DO I=1,N
         write(6,1003) (B(I,J),J=1,N)
       END DO
 #endif
       print *," "
 
  1003 format(18f5.0)
  1004 format(6f16.12)
  1005 format(f56.52)
 
       CALL DGEMM ( 'N', 'N', N, N, N, alpha, A, N, B, N, zero, C, N )
 
       print *,"C:"
 #ifdef USE_QD
       DO I=1,N
         DO J=1,N
           write(6,1009) I,J
  1009     format("C(",i1,",",i1,") =",$)
           call qdwrite(6,C(I,J))
         ENDDO
       ENDDO
 #else
       DO I=1,N
         write(6,1004) (C(I,J),J=1,N)
       END DO
 #endif
 
       print *," "
       print *,"First Row of C:"
       DO J=1,N
 #ifdef USE_QD
         write(6,1008) J
  1008   format("C(1,",",",i1,") =",$)
         call qdwrite(6,C(1,J))
 #else
         write(6,1005) C(1,J)
 #endif
       END DO
 
       END subroutine xtmm
 
 ! File util.f90
 
       subroutine ARE_MACROS_MUTALLY_EXCLUSIVE
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 
       IMPLICIT NONE
       integer :: count_macros
       character*10 :: type_of_precision_used
 
       count_macros = 0
 
 #ifdef USE_SINGLE
       count_macros = count_macros + 1
       type_of_precision_used="SINGLE"
 #endif
 
 #ifdef USE_DOUBLE
       count_macros = count_macros + 1
       type_of_precision_used="DOUBLE"
 #endif
 
 #ifdef USE_QUAD
       count_macros = count_macros + 1
       type_of_precision_used="QUAD"
 #endif
 
 #ifdef USE_QD
       count_macros = count_macros + 1
       type_of_precision_used="QD"
 #endif
 
       if (count_macros .gt. 1) then
         write(6,1001) count_macros
  1001   format("There are ",i1," fpp macro definitions for precision:"/ &
      &  "Cannot have more than 1 definition. The compiling script has:")
 #ifdef USE_SINGLE
         write(6,1003)
  1003   format("-DUSE_SINGLE ",$)
 #endif
 #ifdef USE_DOUBLE
         write(6,1004)
  1004   format("-DUSE_DOUBLE ",$)
 #endif
 #ifdef USE_QUAD
         write(6,1005)
  1005   format("-DUSE_QUAD ",$)
 #endif
 #ifdef USE_QD
         write(6,1006)
  1006   format("-DUSE_QD ",$)
 #endif
         write(6,1007)
  1007   format(" ")
         stop
        endif
 
 ! NOTE: if count_macros is 0 then compiler will generate  errors !
 
       if (count_macros .eq. 1) then
         write(6,1002) type_of_precision_used
  1002   format(/"Precision used for this code: ",a6/)
       endif
 
       return
       end
 
       SUBROUTINE DCOPY(N,DX,INCX,DY,INCY)
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 
       IMPLICIT NONE
 
 !     .. Scalar Arguments ..
       INTEGER INCX,INCY,N
 !     ..
 !     .. Array Arguments ..
 #ifdef USE_QD
       type(qd_real), DIMENSION(N) :: DX, DY
 #else
       real(wp),      DIMENSION(N) :: DX, DY
 #endif
 !     ..
 !
 !  Purpose
 !  =======
 !
 !     DCOPY copies a vector, x, to a vector, y.
 !     uses unrolled loops for increments equal to one.
 !
 !  Further Details
 !  ===============
 !
 !     jack dongarra, linpack, 3/11/78.
 !     modified 12/3/93, array(1) declarations changed to array(*)
 !
 !  =====================================================================
 !
 !     .. Local Scalars ..
       INTEGER I,IX,IY,M,MP1
 !     ..
 !     .. Intrinsic Functions ..
       INTRINSIC MOD
 !     ..
       IF (N.LE.0) RETURN
       IF (INCX.EQ.1 .AND. INCY.EQ.1) THEN
 !
 !        code for both increments equal to 1
 !
 !
 !        clean-up loop
 !
          M = MOD(N,7)
          IF (M.NE.0) THEN
             DO I = 1,M
                DY(I) = DX(I)
             END DO
             IF (N.LT.7) RETURN
          END IF
          MP1 = M + 1
          DO I = MP1,N,7
             DY(I) = DX(I)
             DY(I+1) = DX(I+1)
             DY(I+2) = DX(I+2)
             DY(I+3) = DX(I+3)
             DY(I+4) = DX(I+4)
             DY(I+5) = DX(I+5)
             DY(I+6) = DX(I+6)
          END DO
       ELSE
 !
 !        code for unequal increments or equal increments
 !          not equal to 1
 !
          IX = 1
          IY = 1
          IF (INCX.LT.0) IX = (-N+1)*INCX + 1
          IF (INCY.LT.0) IY = (-N+1)*INCY + 1
          DO I = 1,N
             DY(IY) = DX(IX)
             IX = IX + INCX
             IY = IY + INCY
          END DO
       END IF
       RETURN
       END
 
       SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY)
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 
       IMPLICIT NONE
 
 !     .. Scalar Arguments ..
 #ifdef USE_QD
       type(qd_real) :: DA
 #else
       real(wp)      :: DA
 #endif
 
       INTEGER INCX,INCY,N
 !     ..
 
 !     .. Array Arguments ..
 #ifdef USE_QD
       type(qd_real), DIMENSION(N) :: DX, DY
 #else
       real(wp),      DIMENSION(N) :: DX, DY
 #endif
 !     ..
 !
 !  Purpose
 !  =======
 !
 !     DAXPY constant times a vector plus a vector.
 !     uses unrolled loops for increments equal to one.
 !
 !  Further Details
 !  ===============
 !
 !     jack dongarra, linpack, 3/11/78.
 !     modified 12/3/93, array(1) declarations changed to array(*)
 !
 !  =====================================================================
 !
 !     .. Local Scalars ..
       INTEGER I,IX,IY,M,MP1
 !     ..
 !     .. Intrinsic Functions ..
       INTRINSIC MOD
 !     ..
       IF (N.LE.0) RETURN
       IF (DA .EQ. zero ) RETURN
       IF (INCX.EQ.1 .AND. INCY.EQ.1) THEN
 !
 !        code for both increments equal to 1
 !
 !
 !        clean-up loop
 !
          M = MOD(N,4)
          IF (M.NE.0) THEN
             DO I = 1,M
                DY(I) = DY(I) + DA*DX(I)
             END DO
          END IF
          IF (N.LT.4) RETURN
          MP1 = M + 1
          DO I = MP1,N,4
             DY(I) = DY(I) + DA*DX(I)
             DY(I+1) = DY(I+1) + DA*DX(I+1)
             DY(I+2) = DY(I+2) + DA*DX(I+2)
             DY(I+3) = DY(I+3) + DA*DX(I+3)
          END DO
       ELSE
 !
 !        code for unequal increments or equal increments
 !          not equal to 1
 !
          IX = 1
          IY = 1
          IF (INCX.LT.0) IX = (-N+1)*INCX + 1
          IF (INCY.LT.0) IY = (-N+1)*INCY + 1
          DO I = 1,N
           DY(IY) = DY(IY) + DA*DX(IX)
           IX = IX + INCX
           IY = IY + INCY
          END DO
       END IF
       RETURN
       END
 
 ! File dgemm.f90
 
       SUBROUTINE DGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 
       IMPLICIT NONE
 
 !     .. Scalar Arguments ..
 #ifdef USE_QD
       type(qd_real)  :: ALPHA,BETA
 #else
       real(wp)       :: ALPHA,BETA
 #endif
 
       INTEGER K,LDA,LDB,LDC,M,N
       CHARACTER TRANSA,TRANSB
 !     ..
 !     .. Array Arguments ..
 #ifdef USE_QD
       type(qd_real) :: A(LDA,*),B(LDB,*),C(LDC,*)
 #else
       real(wp) :: A(LDA,*),B(LDB,*),C(LDC,*)
 #endif
 !     ..
 !
 !  Purpose
 !  =======
 !
 !  DGEMM  performs one of the matrix-matrix operations
 !
 !     C := alpha*op( A )*op( B ) + beta*C,
 !
 !  where  op( X ) is one of
 !
 !     op( X ) = X   or   op( X ) = X**T,
 !
 !  alpha and beta are scalars, and A, B and C are matrices, with op( A )
 !  an m by k matrix,  op( B )  a  k by n matrix and  C an m by n matrix.
 !
 !  Arguments
 !  ==========
 !
 !  TRANSA - CHARACTER*1.
 !           On entry, TRANSA specifies the form of op( A ) to be used in
 !           the matrix multiplication as follows:
 !
 !              TRANSA = 'N' or 'n',  op( A ) = A.
 !
 !              TRANSA = 'T' or 't',  op( A ) = A**T.
 !
 !              TRANSA = 'C' or 'c',  op( A ) = A**T.
 !
 !           Unchanged on exit.
 !
 !  TRANSB - CHARACTER*1.
 
 
 !           On entry, TRANSB specifies the form of op( B ) to be used in
 !           the matrix multiplication as follows:
 !
 !              TRANSB = 'N' or 'n',  op( B ) = B.
 !
 !              TRANSB = 'T' or 't',  op( B ) = B**T.
 !
 !              TRANSB = 'C' or 'c',  op( B ) = B**T.
 !
 !           Unchanged on exit.
 !
 !  M      - INTEGER.
 !           On entry,  M  specifies  the number  of rows   of the  matrix
 !           op( A )  and of the  matrix  C.  M  must  be at least  zero.
 !           Unchanged on exit.
 !
 !  N      - INTEGER.
 !           On entry,  N  specifies the number  of columns of the matrix
 !           op( B ) and the number of columns of the matrix C. N must be
 !           at least zero.
 !           Unchanged on exit.
 !
 !  K      - INTEGER.
 !           On entry,  K  specifies  the number of columns of the matrix
 !           op( A ) and the number of rows of the matrix op( B ). K must
 !           be at least  zero.
 !           Unchanged on exit.
 !
 !  ALPHA  - DOUBLE PRECISION.
 !           On entry, ALPHA specifies the scalar alpha.
 !           Unchanged on exit.
 !
 !  A      - DOUBLE PRECISION array of DIMENSION ( LDA, ka ), where ka is
 !           k  when  TRANSA = 'N' or 'n',  and is  m  otherwise.
 !           Before entry with  TRANSA = 'N' or 'n',  the leading  m by k
 !           part of the array  A  must contain the matrix  A,  otherwise
 !           the leading  k by m  part of the array  A  must contain  the
 !           matrix A.
 !           Unchanged on exit.
 !
 !  LDA    - INTEGER.
 !           On entry, LDA specifies the first dimension of A as declared
 !           in the calling (sub) program. When  TRANSA = 'N' or 'n' then
 !           LDA must be at least  max( 1, m ), otherwise  LDA must be at
 !           least  max( 1, k ).
 !           Unchanged on exit.
 !
 !  B      - DOUBLE PRECISION array of DIMENSION ( LDB, kb ), where kb is
 !           n  when  TRANSB = 'N' or 'n',  and is  k  otherwise.
 !           Before entry with  TRANSB = 'N' or 'n',  the leading  k by n
 !           part of the array  B  must contain the matrix  B,  otherwise
 !           the leading  n by k  part of the array  B  must contain  the
 !           matrix B.
 !           Unchanged on exit.
 !
 !  LDB    - INTEGER.
 !           On entry, LDB specifies the first dimension of B as declared
 !           in the calling (sub) program. When  TRANSB = 'N' or 'n' then
 !           LDB must be at least  max( 1, k ), otherwise  LDB must be at
 !           least  max( 1, n ).
 !           Unchanged on exit.
 !
 !  BETA   - DOUBLE PRECISION.
 !           On entry,  BETA  specifies the scalar  beta.  When  BETA  is
 !           supplied as zero then C need not be set on input.
 !           Unchanged on exit.
 !
 !  C      - DOUBLE PRECISION array of DIMENSION ( LDC, n ).
 !           Before entry, the leading  m by n  part of the array  C must
 !           contain the matrix  C,  except when  beta  is zero, in which
 !           case C need not be set on entry.
 !           On exit, the array  C  is overwritten by the  m by n  matrix
 !           ( alpha*op( A )*op( B ) + beta*C ).
 !
 !  LDC    - INTEGER.
 !           On entry, LDC specifies the first dimension of C as declared
 !           in  the  calling  (sub)  program.   LDC  must  be  at  least
 !           max( 1, m ).
 !           Unchanged on exit.
 !
 !  Further Details
 !  ===============
 !
 !  Level 3 Blas routine.
 !
 !  -- Written on 8-February-1989.
 !     Jack Dongarra, Argonne National Laboratory.
 !     Iain Duff, AERE Harwell.
 !     Jeremy Du Croz, Numerical Algorithms Group Ltd.
 !     Sven Hammarling, Numerical Algorithms Group Ltd.
 !
 !  =====================================================================
 !
 !     .. External Functions ..
       LOGICAL LSAME
       EXTERNAL LSAME
 !     ..
 !     .. External Subroutines ..
       EXTERNAL XERBLA
 !     ..
 !     .. Intrinsic Functions ..
       INTRINSIC MAX
 !     ..
 !     .. Local Scalars ..
 #ifdef USE_QD
       type(qd_real) :: TEMP
 #else
       real(wp)      :: TEMP
 #endif
       INTEGER I,INFO,J,L,NCOLA,NROWA,NROWB
       LOGICAL NOTA,NOTB
 !     ..
 !     .. Parameters .. defined in module set_precision
 !!    real(wp) :: ONE,ZERO
 !!    PARAMETER (ONE=1.0D+0,ZERO=0.0D+0)
 !     ..
 !
 !     Set  NOTA  and  NOTB  as  true if  A  and  B   respectively are not 
 !     transposed and set  NROWA, NCOLA and  NROWB  as the number of rows
 !     and  columns of  A  and the  number of  rows  of  B  respectively.
 !
       NOTA = LSAME(TRANSA,'N')
       NOTB = LSAME(TRANSB,'N')
       IF (NOTA) THEN
           NROWA = M
           NCOLA = K
       ELSE
           NROWA = K
           NCOLA = M
       END IF
       IF (NOTB) THEN
           NROWB = K
       ELSE
           NROWB = N
       END IF
 !
 !     Test the input parameters.
 !
       INFO = 0
       IF ((.NOT.NOTA) .AND. (.NOT.LSAME(TRANSA,'C')) .AND.              &
      &    (.NOT.LSAME(TRANSA,'T'))) THEN
           INFO = 1
       ELSE IF ((.NOT.NOTB) .AND. (.NOT.LSAME(TRANSB,'C')) .AND.         &
      &         (.NOT.LSAME(TRANSB,'T'))) THEN
           INFO = 2
       ELSE IF (M.LT.0) THEN
           INFO = 3
       ELSE IF (N.LT.0) THEN
           INFO = 4
       ELSE IF (K.LT.0) THEN
           INFO = 5
       ELSE IF (LDA.LT.MAX(1,NROWA)) THEN
           INFO = 8
       ELSE IF (LDB.LT.MAX(1,NROWB)) THEN
           INFO = 10
       ELSE IF (LDC.LT.MAX(1,M)) THEN
           INFO = 13
       END IF
       IF (INFO.NE.0) THEN
           CALL XERBLA('DGEMM ',INFO)
           RETURN
       END IF
 !
 !     Quick return if possible.
 !
       IF ((M.EQ.0) .OR. (N.EQ.0) .OR.                                   &
      &    (((ALPHA.EQ.ZERO).OR. (K.EQ.0)).AND. (BETA.EQ.ONE))) RETURN
 !
 !     And if  alpha.eq.zero.
 !
 !
       IF (ALPHA.EQ.ZERO) THEN
           IF (BETA.EQ.ZERO) THEN
               DO 20 J = 1,N
                   DO 10 I = 1,M
                       C(I,J) = ZERO
    10             CONTINUE
    20         CONTINUE
           ELSE
               DO 40 J = 1,N
                   DO 30 I = 1,M
                       C(I,J) = BETA*C(I,J)
    30             CONTINUE
    40         CONTINUE
           END IF
           RETURN
       END IF
 !
 !     Start the operations.
 !
       IF (NOTB) THEN
           IF (NOTA) THEN
 !
 !           Form  C := alpha*A*B + beta*C.
 !
               DO 90 J = 1,N
                   IF (BETA.EQ.ZERO) THEN
                       DO 50 I = 1,M
                           C(I,J) = ZERO
    50                 CONTINUE
                   ELSE IF (BETA.NE.ONE) THEN
                       DO 60 I = 1,M
                           C(I,J) = BETA*C(I,J)
    60                 CONTINUE
                   END IF
                   DO 80 L = 1,K
                       IF (B(L,J).NE.ZERO) THEN
                           TEMP = ALPHA*B(L,J)
                           DO 70 I = 1,M
                               C(I,J) = C(I,J) + TEMP*A(I,L)
    70                     CONTINUE
                       END IF
    80             CONTINUE
    90         CONTINUE
           ELSE
 !
 !           Form  C := alpha*A**T*B + beta*C
 !
               DO 120 J = 1,N
                   DO 110 I = 1,M
                       TEMP = ZERO
                       DO 100 L = 1,K
                           TEMP = TEMP + A(L,I)*B(L,J)
   100                 CONTINUE
                       IF (BETA.EQ.ZERO) THEN
                           C(I,J) = ALPHA*TEMP
                       ELSE
                           C(I,J) = ALPHA*TEMP + BETA*C(I,J)
                       END IF
   110             CONTINUE
   120         CONTINUE
           END IF
       ELSE
           IF (NOTA) THEN
 !
 !           Form  C := alpha*A*B**T + beta*C
 !
               DO 170 J = 1,N
                   IF (BETA.EQ.ZERO) THEN
                       DO 130 I = 1,M
                           C(I,J) = ZERO
   130                 CONTINUE
                   ELSE IF (BETA.NE.ONE) THEN
                       DO 140 I = 1,M
                           C(I,J) = BETA*C(I,J)
   140                 CONTINUE
                   END IF
                   DO 160 L = 1,K
                       IF (B(J,L).NE.ZERO) THEN
                           TEMP = ALPHA*B(J,L)
                           DO 150 I = 1,M
                               C(I,J) = C(I,J) + TEMP*A(I,L)
   150                     CONTINUE
                       END IF
   160             CONTINUE
   170         CONTINUE
           ELSE
 !
 !           Form  C := alpha*A**T*B**T + beta*C
 !
               DO 200 J = 1,N
                   DO 190 I = 1,M
                       TEMP = ZERO
                       DO 180 L = 1,K
                           TEMP = TEMP + A(L,I)*B(J,L)
   180                 CONTINUE
                       IF (BETA.EQ.ZERO) THEN
                           C(I,J) = ALPHA*TEMP
                       ELSE
                           C(I,J) = ALPHA*TEMP + BETA*C(I,J)
                       END IF
   190             CONTINUE
   200         CONTINUE
           END IF
       END IF
 !
       RETURN
 !
 !     End of DGEMM .
 !
       END
 
       LOGICAL          FUNCTION LSAME( CA, CB )
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
 !
 !  -- LAPACK auxiliary routine (version 3.2) --
 !     Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd..
 !     November 2006
 !
 !     .. Scalar Arguments ..
       CHARACTER          CA, CB
 !     ..
 !
 !  Purpose
 !  =======
 !
 !  LSAME returns .TRUE. if CA is the same letter as CB regardless of
 !  case.
 !
 !  Arguments
 !  =========
 !
 !  CA      (input) CHARACTER*1
 !  CB      (input) CHARACTER*1
 !          CA and CB specify the single characters to be compared.
 !
 ! =====================================================================
 !
 !     .. Intrinsic Functions ..
       INTRINSIC          ICHAR
 !     ..
 !     .. Local Scalars ..
       INTEGER            INTA, INTB, ZCODE
 !     ..
 !     .. Executable Statements ..
 !
 !     Test if the characters are equal
 !
       LSAME = CA.EQ.CB
       IF( LSAME )                                                        &
      &   RETURN
 !
 !     Now test for equivalence if both characters are alphabetic.
 !
       ZCODE = ICHAR( 'Z' )
 !
 !     Use 'Z' rather than 'A' so that ASCII can be detected on Prime
 !     machines, on which ICHAR returns a value with bit 8 set.
 !     ICHAR('A') on Prime machines returns 193 which is the same as
 !     ICHAR('A') on an EBCDIC machine.
 !
       INTA = ICHAR( CA )
       INTB = ICHAR( CB )
 !
       IF( ZCODE.EQ.90 .OR. ZCODE.EQ.122 ) THEN
 !
 !        ASCII is assumed - ZCODE is the ASCII code of either lower or
 !        upper case 'Z'.
 !
          IF( INTA.GE.97 .AND. INTA.LE.122 ) INTA = INTA - 32
          IF( INTB.GE.97 .AND. INTB.LE.122 ) INTB = INTB - 32
 !
       ELSE IF( ZCODE.EQ.233 .OR. ZCODE.EQ.169 ) THEN
 !
 !        EBCDIC is assumed - ZCODE is the EBCDIC code of either lower or
 !        upper case 'Z'.
 !
          IF( INTA.GE.129 .AND. INTA.LE.137 .OR.                         &
      &       INTA.GE.145 .AND. INTA.LE.153 .OR.                         &
      &       INTA.GE.162 .AND. INTA.LE.169 ) INTA = INTA + 64
          IF( INTB.GE.129 .AND. INTB.LE.137 .OR.                         &
      &       INTB.GE.145 .AND. INTB.LE.153 .OR.                         &
      &       INTB.GE.162 .AND. INTB.LE.169 ) INTB = INTB + 64
 !
       ELSE IF( ZCODE.EQ.218 .OR. ZCODE.EQ.250 ) THEN
 !
 !        ASCII is assumed, on Prime machines - ZCODE is the ASCII code
 !        plus 128 of either lower or upper case 'Z'.
 !
          IF( INTA.GE.225 .AND. INTA.LE.250 ) INTA = INTA - 32
          IF( INTB.GE.225 .AND. INTB.LE.250 ) INTB = INTB - 32
       END IF
       LSAME = INTA.EQ.INTB
 !
 !     RETURN
 !
 !     End of LSAME
 !
       END
 
       SUBROUTINE XERBLA( SRNAME, INFO )
       use set_precision
 #ifdef USE_QD
       use qdmodule
 #endif
       IMPLICIT NONE
 
 !
 !  -- LAPACK auxiliary routine (preliminary version) --
 !     Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd..
 !     November 2006
 !
 !     .. Scalar Arguments ..
       CHARACTER*(*)      SRNAME
       INTEGER            INFO
 !     ..
 !
 !  Purpose
 !  =======
 !
 !  XERBLA  is an error handler for the LAPACK routines.
 !  It is called by an LAPACK routine if an input parameter has an
 !  invalid value.  A message is printed and execution stops.
 !
 !  Installers may consider modifying the STOP statement in order to
 !  call system-specific exception-handling facilities.
 !
 !  Arguments
 !  =========
 !
 !  SRNAME  (input) CHARACTER*(*)
 !          The name of the routine which called XERBLA.
 !
 !  INFO    (input) INTEGER
 !          The position of the invalid parameter in the parameter list
 !          of the calling routine.
 !
 ! =====================================================================
 !
 !     .. Intrinsic Functions ..
       INTRINSIC          LEN_TRIM
 !     ..
 !     .. Executable Statements ..
 !
       WRITE( *, FMT = 9999 )SRNAME( 1:LEN_TRIM( SRNAME ) ),  INFO 
 !
       STOP
 !
  9999 FORMAT( ' ** On entry to ', A, ' parameter number ',  I2, ' had ', &
      &      'an illegal value' )
 !
 !     End of XERBLA
 !
       END

You issue the ./dg command from subdirectory DGEMM_QD, and the compilation will be done in that subdirectory, but the dg script submits the job from subdirectory "build" so the output file will go to subdirectory "build". This is done so that subdirectory DGEMM_QD is not populated with the output files.