
INTERPRETERS vs COMPILERS

Nick Chepurniy, Ph.D.
e-mail: nickc@sharcnet.ca
HPC Consultant - SHARCNET http://www.sharcnet.ca
Compute Canada http://www.computecanada.org


Overview

We start by defining interpreters and compilers and giving examples of each. Two simple problems are then worked in OCTAVE (an interpreter), the second of them also in two compiled languages (C and Fortran), to illustrate how these tools are used. A more realistic problem (Matrix Inversion) is then run for different matrix sizes in OCTAVE and compared to LAPACK. Conclusions about the use of interpreters and compilers are drawn from this problem.

Introduction

In this online tutorial we start by briefly describing how compilers and interpreters work. The most important question is which one you should use, and that depends on the kind of code you are developing.

Fortran, C/C++, Pascal and ALGOL are examples of languages that are normally compiled.

A compiler analyses all the statements in a program, translates them into machine code, and links that code (using libraries) to produce an executable. To run the program you execute (or submit) the executable, which produces some output. To make changes in the program you must edit the source and repeat these steps.

OCTAVE, MATLAB, MAPLE, R and APL are examples of interpreted environments.

An interpreter translates one high-level language command into low-level machine instructions and executes that command at once. The interpreter takes one command at a time; however, the user can place several commands into a script file and have the interpreter execute the commands in that file one after another.

Simple Comparison

First, let's illustrate how OCTAVE (an interpreter) works. The second example is done in OCTAVE (an interpreter) and in both C and Fortran (compiled languages) to illustrate the two approaches. Finally, the third example (in the next segment), a more realistic one, helps answer the question: which should you use, a compiler or an interpreter?

OCTAVE and MATLAB are very similar. Many of their functions are the same, but some differ.

Here is an example in OCTAVE:

[nickc@nar316:~] octave
GNU Octave, version 3.2.2
octave:1> x=3.0;
octave:2> xsq=x*x
xsq =  9
octave:3> quit

[nickc@nar316:~]

The first line invokes the interpreter (OCTAVE), which responds on the next line with its name and version. The interpreter then prints a prompt containing the line number:

octave:1>

The user types a command (assigning the value 3.0 to the variable x) followed by a semicolon (;) and then presses CR (Carriage Return). The semicolon at the end of the command instructs the interpreter not to print the result (i.e. the value of x is not printed on the screen in this case).

The next command computes the variable xsq as the product of x and x. Since the command does not end with a semicolon, the result is printed on the next line.

To terminate the interactive session the user types the command: quit

Here is a second example using OCTAVE:

[nickc@nar316:~] octave
GNU Octave, version 3.2.2
octave:1> sum=0;
octave:2> N=1000
N =  1000
octave:3> for i=1:N
> sum=sum+i;
> endfor
octave:4> sum
sum =  500500
octave:5> my_sum=N*(N+1)/2
my_sum =  500500
octave:6> quit

[nickc@nar316:~]

In the above example, the user sets sum=0 and N=1000 in the first two commands. The third command is a for-loop starting with the clause:

for i=1:N

The interpreter will read instructions until an endfor clause is detected.

In the body of the for-loop we add the index variable i to sum and suppress printing of the intermediate results. On line 4 we print the answer (the sum of the integers from 1 to N).

Line 5 verifies the answer with the analytical formula: my_sum=N*(N+1)/2
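For reference, the closed form follows from adding the sum to itself with the terms in reverse order, which gives N pairs that each total N+1. A brief derivation (added here for illustration), evaluated for the N used in the session:

```latex
\begin{align*}
S  &= 1 + 2 + \cdots + N \\
2S &= (1 + N) + \bigl(2 + (N-1)\bigr) + \cdots + (N + 1) = N(N+1) \\
S  &= \frac{N(N+1)}{2},
      \qquad \text{so for } N = 1000:\quad S = \frac{1000 \cdot 1001}{2} = 500500 .
\end{align*}
```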

When the algorithms become more complex, rather than typing each command at the OCTAVE prompt you can place all the commands into a script file. An OCTAVE script file has the *.m extension (e.g. my_commands.m). To execute the commands in the script file, type the name of the file without the extension (e.g. my_commands) at the OCTAVE prompt.

Let's place the commands of the second example into a file called ex2.m as shown next:

# OCTAVE script file ex2.m

# Commands for example 2

sum=0;
N=1000
for i=1:N
sum=sum+i;
endfor
sum
my_sum=N*(N+1)/2

To execute the commands in the OCTAVE script file ex2.m do as follows:

[nickc@nar316:] octave
GNU Octave, version 3.2.2
octave:1> ex2
N =  1000
sum =  500500
my_sum =  500500
octave:2> quit

or you can run it in batch mode by submitting it as a job using:

#!/bin/bash

# script file: sub_job

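# Require exactly one argument: the OCTAVE script to run
if [ $# -ne 1 ]; then
  echo
  echo "Must have 1 argument specifying the OCTAVE file"
  echo
  echo "For example:  ./sub_job ex2.m "
  echo
  exit 1
fi

OCTAVE_FILE=$1

# Output goes to OUTPUT_<script name without extension>_<job id>
sqsub -t -r 15m -o "OUTPUT_${OCTAVE_FILE%%.*}_%J" octave "${OCTAVE_FILE}"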

For completeness, here is the file sumN.c, which solves the problem of the second example (here with N = 10000 rather than 1000) and is compiled with the C compiler pathcc:

#include <stdio.h>

/*
   This program prints the sum of integers from 1 to N
*/

#define N  10000

int
main(int argc, char **argv)
{
  int i, sum, sum_formula; 

  sum = 0;

  for (i=1 ; i <= N; i++)
    sum = sum + i ;

  printf("For N = %d the sum = %d\n",N,sum);

  sum_formula = N*(N+1)/2;

  printf("and using formula sum = %d\n",sum_formula);

  return 0;
}


and the equivalent Fortran file sumN.f90:

! file sumN.f90

      program sumN

      integer, parameter :: N = 10000

      integer :: i, sum, sum_formula

      sum = 0

      do i=1,N
        sum = sum + i
      enddo

      write(6,1001) N,sum
 1001 format("For N = ",i8," the sum = ",i10)

      sum_formula = N*(N+1)/2

      write(6,1002) sum_formula
 1002 format("and using formula sum =    ",i10)

      end

To compile and execute the C program you would use these commands:

[nickc@nar316:/work/nickc/INTERPRETERS_vs_COMPILERS] pathcc sumN.c
[nickc@nar316:/work/nickc/INTERPRETERS_vs_COMPILERS] ./a.out
For N = 10000 the sum = 50005000
and using formula sum = 50005000

and for the Fortran program you would use:

[nickc@nar316:/work/nickc/INTERPRETERS_vs_COMPILERS] pathf90  sumN.f90
[nickc@nar316:/work/nickc/INTERPRETERS_vs_COMPILERS] ./a.out
For N =    10000 the sum =   50005000
and using formula sum =      50005000

Matrix Inversion (Example 3)

The third example in this online tutorial defines a STRIDWAD matrix, stored in the array A, and a vector X, both described in:

https://www.sharcnet.ca/help/index.php/LAPACK_and_ScaLAPACK_Examples#Defining_the_STRIDWAD_matrix

Once the matrix A and vector X are defined, the matrix-vector product, A*X, is computed and stored in the vector RHS. Then, using A and RHS the linear system of equations is solved and the solution is saved in the vector XX which should be identical to X.

In preparation for our third example, let's introduce OCTAVE functions. You can define an OCTAVE function by typing it into a *.m file whose first statement begins with the word "function". Here are the definitions of two functions:

# file: INIT_MY_MATRIX.m

  function A = INIT_MY_MATRIX (N)

      if (nargin != 1)
        usage ("A = INIT_MY_MATRIX (N)");
      endif

      NH  = N/2;
      NH1 = NH + 1;

#     Diagonal elements

      MD = 3*eye(N);

#     Upper diagonal

      ii=-1*ones(N-1,1);
      UD=diag(ii,1);

#     Lower diagonal

      LD=diag(ii,-1);

#     ANTI-DIAGONAL

      AD=fliplr(eye(N));

      A = UD + MD + LD + AD ;

      return

  endfunction

# file: INIT_MY_VECTOR.m

  function X = INIT_MY_VECTOR(N)

      if (nargin != 1)
        usage ("X = INIT_MY_VECTOR (N)");
      endif

      X = find(ones(N,1));

      return

  endfunction


Note that the name of the file (e.g. INIT_MY_VECTOR.m) without the *.m extension is identical to the function name inside the file.

Once these functions are defined (by writing them into the *.m files), you can use them either interactively or by submitting a batch job from the same subdirectory where the function files are located.

Here is a script file for a short version of example 3:


# file: short_ex3.m

# OCTAVE commands for example 3

t0 = clock ();

N=10
A = INIT_MY_MATRIX(N)
X = INIT_MY_VECTOR(N)
RHS = A*X
# Now use A and RHS to find new XX
XX = A \ RHS
elapsed_time = etime (clock (), t0)

[total, user, system] = cputime ();
total
user
system
N

To run it interactively you type:

octave
GNU Octave, version 3.2.2
octave:1> short_ex3
N =  10
A =
   3  -1   0   0   0   0   0   0   0   1
  -1   3  -1   0   0   0   0   0   1   0
   0  -1   3  -1   0   0   0   1   0   0
   0   0  -1   3  -1   0   1   0   0   0
   0   0   0  -1   3   0   0   0   0   0
   0   0   0   0   0   3  -1   0   0   0
   0   0   0   1   0  -1   3  -1   0   0
   0   0   1   0   0   0  -1   3  -1   0
   0   1   0   0   0   0   0  -1   3  -1
   1   0   0   0   0   0   0   0  -1   3
 
X =

    1
    2
    3
    4
    5
    6
    7
    8
    9
   10 

RHS =

   11
   11
   11
   11
   11
   11
   11
   11
   11
   22

XX =

    1.0000
    2.0000
    3.0000
    4.0000
    5.0000
    6.0000
    7.0000
    8.0000
    9.0000
   10.0000

elapsed_time =  0.0048676
total =  0.24096
user =  0.15298
system =  0.087986
N =  10

octave:2> quit


For a larger system of equations (N=12000) we use the script file long_ex3.m:

# file: long_ex3.m

# Commands for example 3

t0 = clock ();

N=12000
A = INIT_MY_MATRIX(N);
X = INIT_MY_VECTOR(N);
RHS = A*X;
# Now use A and RHS to find new XX
XX = A \ RHS
elapsed_time = etime (clock (), t0) 

[total, user, system] = cputime ();
total
user
system
N

and submit the job in batch mode using the script sub_job (described earlier) with argument long_ex3.m as follows:

[nickc@nar316:] ./sub_job long_ex3.m
submitted as jobid 1524992

The output file, OUTPUT_long_ex3_1524992, for the job was:

GNU Octave, version 3.2.2
N =  12000
XX =

   1.0000e-00
   2.0000e+00
   3.0000e+00
   ...
   1.1998e+04
   1.1999e+04
   1.2000e+04 

elapsed_time =  263.65
total =  263.69
user =  253.32
system =  10.373
N =  12000

------------------------------------------------------------
Subject: Job 1524992: <srun octave long_ex3.m> Done

Job <srun octave long_ex3.m> was submitted from host <nar316> by user  <nickc>.

------------------------------------------------------------
# LSBATCH: User input
srun octave long_ex3.m
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :    264.45 sec.
    Max Memory :      2528 MB
    Max Swap   :      3478 MB

Comparison of execution times

The following table compares OCTAVE and LAPACK execution times for the matrix inversion problem for different orders of the matrix (N):

   N          OCTAVE         LAPACK
order of        CPU            CPU
 matrix        time           time
              [ sec ]        [ sec ] 

  10000       169.12          344.93
  13000       314.98          594.70
  15000       448.33          853.51
  16000       541.83         1000.37
  17000         N/A          1183.27
  20000         N/A          1813.09
  30000         N/A          3599.94
  32000         N/A             N/A

Note:  N/A means the problem cannot be run because there is not enough memory.

Conclusions

The following conclusions can be drawn from the OCTAVE program in Example 3 of this online tutorial and from the results presented in the

https://www.sharcnet.ca/help/index.php/LAPACK_and_ScaLAPACK_Examples

online tutorial:

(1) Developing a model with an interpreter is much easier because interpreters have many more 
    commands and the development is carried out interactively. This results in a much faster 
    development cycle.

(2) OCTAVE and other interpreters can run as fast as compiled code for smaller models.
    As seen from the table in the previous segment, OCTAVE, in every case that it could
    run, took about half the LAPACK execution time.

(3) Compiled code, however, can handle much larger models.

(4) Since MPI can be used with compiled codes, the size of the models can be extended to
    even higher dimensions. In this case LAPACK programs can be extended to ScaLAPACK. See:

    https://www.sharcnet.ca/help/index.php/LAPACK_and_ScaLAPACK_Examples#LAPACK_RESULTS

Answer to the question of which you should use:

For smaller models an interpreter would be more appropriate. However, if the model exceeds the memory available to the interpreter then you have no choice but to switch to a compiler.

For the development of new algorithms it is easier to do all the preliminary testing using an interpreter. If the memory requirements are met then there is no need to switch to a compiler.

But for much larger models a compiler with MPI is more adequate.

Using OCTAVE we were able to invert matrices up to N=16,000. For higher dimensions LAPACK was able to go up to N=30,000, and for even larger models ScaLAPACK went up to N=250,000 (using 256 processors).

Last update: 12 Sept 2013