Revision as of 14:08, 9 November 2015

Description: Open Computing Language - framework for heterogeneous computing (GPGPU)
SHARCNET Package information: see OpenCL software page in web portal
Full list of SHARCNET supported software

OpenCL is the first open standard for writing programs that can execute across heterogeneous platforms, most importantly both CPUs and GPUs. OpenCL includes a language (based on C) for writing kernels (functions which can be executed on OpenCL devices), plus APIs to access and control the devices.
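To give a flavour of the kernel language, here is a minimal illustrative kernel (a sketch written for this page, not taken from any SDK) that adds two vectors elementwise; each work-item processes one index:

```c
/* Illustrative OpenCL kernel (assumed example, not part of any SDK).
   Each work-item adds one pair of elements of the input vectors. */
__kernel void vecadd(__global const float *a,
                     __global const float *b,
                     __global float *c)
{
    int i = get_global_id(0);   /* global index of this work-item */
    c[i] = a[i] + b[i];
}
```

In a host program, kernel source like this is typically passed as a string to clCreateProgramWithSource and compiled with clBuildProgram before the kernel is launched on a device.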

The best place to obtain authoritative information about OpenCL is the website of the Khronos Group consortium which maintains the OpenCL standard. AMD has also started a regular webinar series which may be of interest.

Attention visualization stations users: Not all of these machines have GPU cards that are OpenCL compatible. Please use only the systems listed on this page, which have been verified to run OpenCL correctly.

Note: Machines with NVIDIA cards (angel and most viz stations) use NVIDIA's OpenCL implementation. At this point it supports computation on GPUs only, and will not detect the CPUs on the node as OpenCL devices. Machines with ATI cards support both the GPU and the CPU as devices.

Compiling OpenCL programs

Your OpenCL programs can be compiled on SHARCNET systems where OpenCL libraries are available. Please see the software page for the up-to-date listing of systems on which OpenCL is available. To compile your code, follow these instructions for our various systems.

Compile as you would any other C code, but link to the OpenCL library. On some clusters you also need to specify the path to the include files.

To compile on monk, using the default CUDA 5.0:

gcc -o test.x code.c -lOpenCL -I/opt/sharcnet/cuda/5.0.35/toolkit/include

To compile on angel:

gcc -o test.x code.c -lOpenCL -I/opt/sharcnet/cuda/5.0.35/toolkit/include

You need to determine which GPU card a viz machine has (ATI or NVIDIA) before you compile. You can do this by consulting our list of systems.

To compile on viz machines with NVIDIA cards, use:

gcc -o test.x code.c -L/usr/lib64/nvidia -lOpenCL

To compile on viz machines with ATI cards, use:

gcc -o test.x code.c -I/opt/AMDAPP/include -L/opt/AMDAPP/lib/x86_64 -lOpenCL

NOTE: OpenCL will not detect the ATI GPU on some viz machines when you log in remotely (e.g. viz7-uwo). If you run into this problem, please find another viz machine with an ATI card that works.

To execute code that uses the GPU, submit the job to the gpu queue on angel (the angel and monk login nodes have no GPUs, so don't run there, even for testing):

sqsub -q gpu -r 1h -o code.out ./code.x

On the viz machines you can execute your code directly as there is no queue.

Compiling OpenCL examples from CUDA SDK on monk

Starting with CUDA 5.0, NVIDIA no longer ships OpenCL samples bundled with CUDA. They can still be downloaded from NVIDIA developer OpenCL page.

To run the OpenCL software development kit examples that come with CUDA 4.1, you need to first switch to that CUDA version with

module switch cuda/4.1

then you should copy its files to some location in your user space (here it will be /work/$USER/my_sdk_work but you can choose something else), following the steps below. Here $USER is an environment variable that is set to your username. You can replace $USER with your actual username in the commands below if you prefer.

cd /work/$USER/
mkdir my_sdk_work
cd my_sdk_work
cp -rp /opt/sharcnet/cuda/4.1/sdk/* .
cd OpenCL

This will create the binary executable files in


You can then experiment with changes in the source code, located in:


The examples you compiled will not run on the monk login node, which does not have a GPU. They will run if you submit them as jobs to the gpu queue. You can also log into the mon54 node of monk, which has been set aside as a development node. To do that, run "ssh mon54" once you have logged into monk. It has two GPUs accessible to users in interactive mode, i.e. you don't have to use sqsub and can run executables that use the GPU directly from the command line.

Using OpenCL

OpenCL is available on SHARCNET on the clusters angel and monk. Please consult the OpenCL page on our main portal for the latest version information. It is also installed on the visualization workstations; in particular, machines viz7-uwo, viz9-uwo, viz10-uwo and viz11-uwo have the latest OpenCL 1.1 installed.

Mac OS X 10.6 Snow Leopard has built-in support for OpenCL (for both GPUs and CPUs), so it is a good development platform for OpenCL programs (consult the OpenCL Programming Guide for Mac OS X).

Determining which OpenCL devices are available

Since OpenCL is designed to run on many platforms, it is particularly important for an OpenCL program to determine the characteristics of the hardware it is running on. The OpenCL standard provides a rich set of routines which can provide detailed information about the capabilities of the system and the OpenCL devices available.

Below is an example program which lists some system information and devices available. Feel free to use it to determine the capabilities of the system you are on.

To provide useful information it should be run on a compute node which has OpenCL devices (GPUs) connected.

#include <stdio.h>
#include <CL/cl.h>

int main(int argc, char** argv) {
   char dname[500];
   cl_device_id devices[10];
   cl_uint num_devices, entries;
   cl_ulong long_entries;
   int d;
   cl_int err;
   cl_platform_id platform_id = NULL;
   size_t p_size;

/* obtain list of platforms available */
   err = clGetPlatformIDs(1, &platform_id, NULL);
   if (err != CL_SUCCESS) {
       printf("Error: Failure in clGetPlatformIDs, error code=%d\n", err);
       return 0;
   }
/* obtain information about platform */
   clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, 500, dname, NULL);
   printf("CL_PLATFORM_NAME = %s\n", dname);
   clGetPlatformInfo(platform_id, CL_PLATFORM_VERSION, 500, dname, NULL);
   printf("CL_PLATFORM_VERSION = %s\n", dname);
/* obtain list of devices available on platform */
   clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 10, devices, &num_devices);
   printf("%u devices found\n", num_devices);
/* query devices for information */
   for (d = 0; d < (int)num_devices; ++d) {
       clGetDeviceInfo(devices[d], CL_DEVICE_NAME, 500, dname, NULL);
       printf("Device #%d name = %s\n", d, dname);
       clGetDeviceInfo(devices[d], CL_DRIVER_VERSION, 500, dname, NULL);
       printf("\tDriver version = %s\n", dname);
       clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tGlobal Memory (MB):\t%llu\n", (unsigned long long)(long_entries/1024/1024));
       clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tGlobal Memory Cache (MB):\t%llu\n", (unsigned long long)(long_entries/1024/1024));
       clGetDeviceInfo(devices[d], CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tLocal Memory (KB):\t%llu\n", (unsigned long long)(long_entries/1024));
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(cl_uint), &entries, NULL);
       printf("\tMax clock (MHz) :\t%u\n", entries);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &p_size, NULL);
       printf("\tMax Work Group Size:\t%zu\n", p_size);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &entries, NULL);
       printf("\tNumber of parallel compute cores:\t%u\n", entries);
   }
   return 0;
}


Output of this program is shown below. Note the useful information, including the version of OpenCL installed, and the capabilities of hardware visible to OpenCL. On machines with NVIDIA cards using NVIDIA's OpenCL only the GPUs are visible. On ATI cards both the GPUs and CPUs will be visible.

The output of this program on a machine with an NVIDIA card is:

1 devices found
Device #0 name = GeForce GTX 480
       Driver version = 270.41.19
       Global Memory (MB):     1535
       Global Memory Cache (MB):       0
       Local Memory (KB):      48
       Max clock (MHz) :       1401
       Max Work Group Size:    1024
       Number of parallel compute cores:       15

The output of this program on a machine with an ATI card is:

CL_PLATFORM_NAME = AMD Accelerated Parallel Processing
CL_PLATFORM_VERSION = OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
2 devices found
Device #0 name = Cypress
       Driver version = CAL 1.4.900
       Global Memory (MB):     2048
       Global Memory Cache (MB):       0
       Local Memory (KB):      32
       Max clock (MHz) :       0
       Max Work Group Size:    256
       Number of parallel compute cores:       20
Device #1 name = Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
       Driver version = 2.0
       Global Memory (MB):     48444
       Global Memory Cache (MB):       0
       Local Memory (KB):      32
       Max clock (MHz) :       1596
       Max Work Group Size:    1024
       Number of parallel compute cores:       8

Using OpenCL in Python with PyOpenCL

Detailed instructions are provided on the separate PyOpenCL page.


o Website of the Khronos Group consortium which manages the OpenCL standard
o NVIDIA OpenCL website