
Objective

The purpose of this guide is to detail the steps required to submit and run parallel computing MATLAB jobs on SHARCNET clusters using the desktop client installed on your local machine.

Please note that users must already possess a license for MATLAB and the Parallel Computing Toolbox in order to run parallel MATLAB jobs on SHARCNET. Users must also request to be added to the 'matlab' group. Other important notes:

  • MATLAB PCT is not intended to run regular sequential MATLAB code. Users have to implement their code with PCT constructs in order to run it in parallel.
  • Make sure that your code runs efficiently in parallel with PCT. You can benchmark this on your local machine (PC or otherwise), since PCT can easily run via 'matlabpool' on a local multicore machine: compare the runtimes (walltime) for 1 cpu/worker (the sequential code) and for multiple cpus/workers, such as 4 cpus (numprocs=4), and provide us with the performance results (see the sketch after this list).
  • We only grant access to those who can provide reasonable performance results.
  • Due to the limited number of license seats, we advise users to run 4-cpu (or up to 8-cpu, if performance allows) PCT jobs, and only a few jobs at a time, on SHARCNET clusters. Otherwise, users may encounter license checkout errors if one or two users consume most or all of the license seats.
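
For example, one way to benchmark on your local machine is to time the same workload once with an ordinary for loop (1 worker) and once with parfor inside a matlabpool (4 workers). The sketch below is only an illustration and is not part of the SHARCNET scripts; the function name and the toy SVD workload are placeholders, so substitute your own computation before reporting results.

% localBenchmark.m -- a minimal sketch for timing the same toy workload with
% 1 worker (plain for loop) and 4 workers (parfor). The SVD workload is only
% a placeholder; replace it with your own computation.
function localBenchmark()
N = 200;                          % number of independent iterations (placeholder)
s = zeros(1, N);
tic
for i = 1:N                       % sequential baseline
    s(i) = max(svd(rand(300)));
end
tSerial = toc;

matlabpool open 4                 % R2012a/R2012b syntax; use parpool(4) on newer releases
p = zeros(1, N);
tic
parfor i = 1:N                    % same workload spread over 4 local workers
    p(i) = max(svd(rand(300)));
end
tParallel = toc;
matlabpool close

fprintf('1 worker: %.1f s, 4 workers: %.1f s, speedup: %.2fx\n', ...
        tSerial, tParallel, tSerial / tParallel);
end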

Because the SHARCNET license for MATLAB is restricted to compute nodes only, jobs are submitted directly from your own installation of MATLAB running on your local machine. Before jobs can be submitted, your MATLAB desktop client must be configured to communicate with the SHARCNET cluster. This involves manually modifying some of the configuration files. It also requires installing the SSH client program 'putty' if running Windows, or using the 'ssh' program on POSIX-based systems (Linux, Mac OS, etc.).

We will go through the Installation and Configuration steps below, and then run the sample programs.

Install MATLAB on Client Host

You can purchase MATLAB and the Parallel Computing Toolbox (PCT) package and license from MathWorks and install them on your local PC running Windows, Linux, or Mac OS. Follow the GUI instructions until the installation is finished. Some campus site licenses might include the Parallel Computing Toolbox; please contact your local MATLAB support person about this.

If MATLAB has been installed to an alternative directory, the instructions below must be modified accordingly.

Instruction for R2014a version

Download and install files

After MATLAB R2014a is installed on your local machine, you can download 'matlab2014a-config.tar.gz' from

MATLAB_PCT_Scripts 

and add the files into $MATLABROOT/toolbox/local/, where $MATLABROOT stands for the MATLAB installation location on your local machine. This is a one-time installation; you do not need to touch any files in the installation location afterward.

Follow the same Steps #2-4 for R2012a and R2012b posted below, changing the version to R2014a where it appears.

Instruction for R2012a and R2012b versions

Step #1. MATLAB R2012a Client Configuration (the same works for R2012b)

Download and install files

After MATLAB R2012a or R2012b is installed on your local machine, you can download 'matlab2012ab-config.tar.gz' from

MATLAB_PCT_Scripts 

and add the files into $MATLABROOT/toolbox/local/, where $MATLABROOT stands for the MATLAB installation location on your local machine. This is a one-time installation; you do not need to touch any files in the installation location afterward.
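
If you prefer, the archive can also be unpacked from within MATLAB itself; a minimal sketch is shown below. This is just one convenient option, not part of the official instructions; any archive tool works equally well, and you may need administrator/root privileges to write into $MATLABROOT.

% Run this from the folder where matlab2012ab-config.tar.gz was downloaded.
% matlabroot returns the installation location ($MATLABROOT).
untar('matlab2012ab-config.tar.gz', fullfile(matlabroot, 'toolbox', 'local'))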

Step #2. Prepare scripts (.m files) on your client machine

The following 3 run scripts are included in 'runscripts.tar.gz'. You can download 'runscripts.tar.gz' from

MATLAB_PCT_Scripts 

and put the files in your MATLAB PCT application code folder/directory on your local machine (for example, I keep all my .m files in C:\Documents and Settings\JemmyHu\My Documents\MATLAB).

clusterInfo.m

clusterInfo.m sets up the cluster information and runtime parameters. You can modify the parameters to fit your job if necessary.

function [ cluster ] = clusterInfo()
% You can change the following 3 lines for cpus/workers, runtime (in seconds) and memory usage (in GB) to fit your job's needs.
% The maximum limits are 7 days, 8 cpus (recommended), Nprocs x 2 GB.
% Please always use settings appropriate to your job's needs, not the maximum limits, if possible.
Nprocs = 4;
Wtime=172800;
Memory= 1;
additionalSubmitArgs = sprintf('-l ncpus=%d -l walltime=%d,cput=%d -l pvmem=%dgb', Nprocs,Wtime,Wtime*Nprocs,Memory);
%
% If you really need to use Nprocs>8 for a job (such as 16, not recommended though), specify the Nprocs=16 above,                                
% comment out the above "additionalSubmitArgs" line and use the following 3 lines instead 
% (in this example, cpus will be across 4 nodes, use 4 cpus per node, you can change 'cpuPerNode' as needed)
%cpuPerNode = 4;
%numOfNodes = ceil(Nprocs/cpuPerNode);
%additionalSubmitArgs = sprintf('-l nodes=%d:ppn=%d -l walltime=%d,cput=%d -l pvmem=%dgb',numOfNodes,cpuPerNode,Wtime,Wtime*numOfNodes*cpuPerNode,Memory);
%
% Specify a cluster environment and use a local folder as the JobStorageLocation 
cluster = parallel.cluster.Generic( 'JobStorageLocation', 'C:\Temp' );
%
% Define the additional inputs to the submit functions, change 'jemmyhu' to your Sharcnet username
clusterHost = 'orca.sharcnet.ca';
remoteJobStorageLocation = '/scratch/jemmyhu/matlab';
%
% Specify file system and MATLAB Root
set(cluster, 'HasSharedFilesystem', false);
set(cluster, 'ClusterMatlabRoot', '/opt/sharcnet/matlab/R2012a');
set(cluster, 'OperatingSystem', 'unix');
%
% The IndependentSubmitFcn must be a MATLAB cell array that includes the three additional inputs for serial tasks
%set(cluster, 'IndependentSubmitFcn', {@independentSubmitFcn, clusterHost, remoteJobStorageLocation, additionalSubmitArgs});
% If you want to run communicating jobs (including matlabpool), you must specify a CommunicatingSubmitFcn
set(cluster, 'CommunicatingSubmitFcn', {@communicatingSubmitFcn, clusterHost, remoteJobStorageLocation,additionalSubmitArgs});
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);
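
To illustrate how the resource request is assembled, running the sprintf line above with the default values (Nprocs = 4, Wtime = 172800, Memory = 1) in a MATLAB session produces:

>> sprintf('-l ncpus=%d -l walltime=%d,cput=%d -l pvmem=%dgb', 4, 172800, 172800*4, 1)

ans =

-l ncpus=4 -l walltime=172800,cput=691200 -l pvmem=1gb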

pTest_12a.m

This is the job run script; make sure 'Nprocs' matches the value specified in the 'clusterInfo.m' script. It runs the function in 'paralleltestfunction.m'.

function pTest_12a()
%
cluster = clusterInfo
% Create a Job Scheduler object
% Nprocs here should be the same number specified in clusterInfo.m
Nprocs = 4;
pjob = createCommunicatingJob(cluster, 'Type','SPMD')
set(pjob,'NumWorkersRange',Nprocs)
%
% We need to include this file on the cluster
set(pjob, 'AttachedFiles', {'paralleltestfunction.m'})
%
% Create and submit the task, wait for results
t = createTask(pjob, @paralleltestfunction, 1, {})
%
% Submit and Wait for results
submit(pjob)
wait(pjob)
results = fetchOutputs(pjob)
y = results{1}

paralleltestfunction.m

function total_sum = paralleltestfunction
if labindex == 1
    % Send magic square to other labs
    A = labBroadcast(1,magic(numlabs))
else
    % Receive broadcast on other labs
    A = labBroadcast(1)
end
% Calculate sum of column identified by labindex for this lab
column_sum = sum(A(:,labindex))
% Calculate total sum by combining column sum from all labs
total_sum = gplus(column_sum)
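
For reference, with Nprocs = 4 every worker broadcasts or receives magic(4), sums one of its columns, and gplus adds the four column sums together, so each worker returns the total of all the entries of magic(4):

>> sum(sum(magic(4)))   % each worker's total_sum should equal this value

ans =

   136

Accordingly, the value of y printed by pTest_12a should be 136.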

Step #3. Create 'matlab' directory under your /scratch on the cluster

Log in to the cluster (orca, hound) where MATLAB PCT is installed. In the above example, the host cluster is 'orca':

clusterHost = 'orca.sharcnet.ca'

Go to your /scratch directory, and create a 'matlab' subdirectory

cd /scratch/yourID
mkdir matlab

Change the line remoteJobStorageLocation = '/scratch/jemmyhu/matlab' in the example to

remoteJobStorageLocation = '/scratch/yourID/matlab'

The remote data location is ready to host your files now.

Step #4. Run MATLAB R2012a

Open a MATLAB R2012a client session on your local machine and run pTest_12a.m
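
For example, from the MATLAB command window (assuming the three .m files from Step #2 are in the folder shown there; adjust the path to your own setup):

>> cd 'C:\Documents and Settings\JemmyHu\My Documents\MATLAB'
>> pTest_12a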

A window will pop up for you to enter your SHARCNET username; select 'No' in the identity pop-up window, then enter your SHARCNET password in the password window, and your job will be sent to the cluster. Log in to the cluster (orca) and type 'sqjobs' to see the job status.

Once your job has finished successfully, the results will be copied back to your PC; they will also remain in the remote location on the cluster.

Public Key Authentication (not recommended)

The MATLAB Parallel Computing Toolbox uses the SSH (secure shell) network protocol to log in to the remote cluster, and the SCP (secure copy) protocol to transfer files back and forth between your local machine and the remote cluster. As explained above, you have to enter your username and password in the pop-up windows each time you run a job, which should be fine for most PCT users. However, if you need to run many PCT jobs constantly, this becomes cumbersome. You can instead use public key authentication to log in to the remote cluster automatically without a username/password, in which case MATLAB PCT runs without the pop-up windows.

Key generation

For Linux and Mac systems, you can generate keys with 'ssh-keygen' (making sure that you do not use a passphrase). Save/keep the private key (default name 'id_rsa') in /home/yourID/.ssh/ on your local machine, and append the public key (id_rsa.pub) to the /home/username/.ssh/authorized_keys file on the remote cluster (orca).

For a Windows PC, you can use PuTTY (download and install the whole PuTTY package, including putty, puttygen, plink, pscp, psftp, pageant, putty.cnt). Use puttygen to generate keys (SSH-2 RSA or SSH-2 DSA) and save the keys (public and private) in the PuTTY installation location (e.g., C:\Program Files\Putty). Then use puttygen ('Conversions' menu) to convert the private key: import the private key, select 'Export OpenSSH key', and save the OpenSSH key (e.g., name it id_rsa) in the same directory. Copy the public key to the remote cluster and add it to the /home/username/.ssh/authorized_keys file on the remote cluster (orca).

Configure PuTTY for automatic login; snapshots of the PuTTY key generation and configuration steps are available in 'PuTTY_Configuration.doc' from

MATLAB_PCT_Scripts

clusterInfo.m

This is the same clusterInfo.m as posted above, except for the additional 3 lines for public key authentication.

 function [ cluster ] = clusterInfo()
% You can change the following 3 lines for cpus/workers, runtime (in seconds) and memory usage (in GB) to fit your job's needs.
% The maximum limits are 7 days, 8 cpus (recommended), Nprocs x 2 GB.
% Please always use settings appropriate to your job's needs, not the maximum limits, if possible.
Nprocs = 4;
Wtime=172800;
Memory= 1;
additionalSubmitArgs = sprintf('-l ncpus=%d -l walltime=%d,cput=%d -l mem=%dgb', Nprocs,Wtime,Wtime*Nprocs,Memory);
%
% If you really need to use Nprocs>8 for a job (such as 16, not recommended though), specify the Nprocs=16 above,                                
% comment out the above "additionalSubmitArgs" line and use the following 3 lines instead 
% (in this example, cpus will be across 4 nodes, 4 cpus per node, you can change 'cpuPerNode' as needed)
%cpuPerNode = 4;
%numOfNodes = ceil(Nprocs/cpuPerNode);
%additionalSubmitArgs = sprintf('-l nodes=%d:ppn=%d -l walltime=%d,cput=%d -l mem=%dgb',numOfNodes,cpuPerNode,Wtime,Wtime*numOfNodes*cpuPerNode,Memory);
%
% for public key authentication without pop-up windows for username/password entries,
% modify the first 2 lines to match your local settings (yourID, private key location on your local machine)  
% if you do not have the automatic access setting, then comment out these lines 
 setenv('MDCS_CLIENT_USERNAME', 'yourID'); 
% for Windows PC using Putty, the OpenSSH key id_rsa can be in 'C:\Program Files\Putty\'
 setenv('MDCS_CLIENT_IDENTITY_FILENAME', 'C:\Program Files\Putty\id_rsa'); 
% for Linux client, the private key is normally under /home/yourID/.ssh/)
%setenv('MDCS_CLIENT_IDENTITY_FILENAME', '/home/yourID/.ssh/id_rsa');
 setenv('MDCS_CLIENT_IDENTITY_PASSPHRASE', 'yes'); 
%
% Specify a cluster environment and use a local folder as the JobStorageLocation
cluster = parallel.cluster.Generic( 'JobStorageLocation', 'C:\Temp' );
%
% Define the additional inputs to the submit functions, change 'jemmyhu' to your Sharcnet username
clusterHost = 'orca.sharcnet.ca';
remoteJobStorageLocation = '/scratch/jemmyhu/matlab';
%
% Specify file system and MATLAB Root
set(cluster, 'HasSharedFilesystem', false);
set(cluster, 'ClusterMatlabRoot', '/opt/sharcnet/matlab/R2012a');
set(cluster, 'OperatingSystem', 'unix');
%
% The IndependentSubmitFcn must be a MATLAB cell array that includes the three additional inputs for serial tasks
%set(cluster, 'IndependentSubmitFcn', {@independentSubmitFcn, clusterHost, remoteJobStorageLocation, additionalSubmitArgs});
% If you want to run communicating jobs (including matlabpool), you must specify a CommunicatingSubmitFcn
set(cluster, 'CommunicatingSubmitFcn', {@communicatingSubmitFcn, clusterHost, remoteJobStorageLocation,additionalSubmitArgs});
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);

There are no changes to pTest_12a.m and paralleltestfunction.m. You can run the same pTest_12a.m in a MATLAB session; it should not ask for a username or password.

Notes

  • The total number of workers on the server is licensed at 96. Do not use a large number of workers per job; 4 or 8 is usually a suitable number.
  • For best results, try to balance the workload between workers; benchmark tests are highly recommended.
  • To check that your job is still running, use the "sqjobs" command on the server.
  • For a long job, you do not have to wait for it to finish. Once the "submit" function has been executed (check 'sqjobs' on the cluster side to see whether the job is in the 'matlab' queue), it is safe to stop the MATLAB session on your computer. The job will run on the cluster without needing any input from the client. However, the output then cannot be copied back to your PC automatically; you can copy it back manually after the job completes (see the sketch below).
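
If you later need to check on a submitted job from a new MATLAB session, one possible approach is sketched below; this is an assumption on our part and relies on the job metadata still being present in the local JobStorageLocation (C:\Temp in the examples above). The output files themselves still have to be copied back manually, e.g. with scp or pscp.

cluster = clusterInfo();     % recreate the same generic cluster object
jobs = findJob(cluster)      % list the jobs recorded in the local JobStorageLocation
pjob = jobs(end);            % pick the most recent job (adjust the index as needed)
pjob.State                   % query the job state (uses GetJobStateFcn for a generic cluster)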