Revision as of 14:18, 14 May 2019 by Ppomorsk

OpenCL is the first open standard for writing programs that can execute across heterogeneous platforms, most importantly both CPUs and GPUs. OpenCL includes a language (based on C) for writing kernels (functions which can be executed on OpenCL devices), plus APIs to access and control the devices.
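As an illustration, a minimal OpenCL kernel might look like the following sketch (the function and argument names are hypothetical; kernels are written in OpenCL C and are typically compiled at run time for the target device):

```
/* hypothetical example: each work-item adds one pair of elements */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c)
{
    int i = get_global_id(0);   /* index of this work-item */
    c[i] = a[i] + b[i];
}
```

The host program uses the OpenCL API to build this kernel, copy the input arrays to the device, and launch one work-item per array element.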

The best place to obtain authoritative information about OpenCL is the website of the Khronos Group consortium which maintains the OpenCL standard.

Compiling OpenCL programs

Compile as you would any other C code, but link against the OpenCL library. On some clusters you will also need to specify the path to the include files.

To compile on the Graham cluster with the Intel compiler, run:

module load cuda
icc -o test.x code.c -lOpenCL

To compile on the Graham cluster with the GCC compiler, run:

module load cuda
gcc -o test.x code.c -lOpenCL

Compiling OpenCL examples from CUDA SDK on monk

Starting with CUDA 5.0, NVIDIA no longer ships OpenCL samples bundled with CUDA. They can still be downloaded from the NVIDIA developer OpenCL page.

To run the OpenCL software development kit examples that come with CUDA 4.1, you need to first switch to that CUDA version with

module switch cuda/4.1

then you should copy its files to some location in your user space (here it will be /work/$USER/my_sdk_work but you can choose something else), following the steps below. Here $USER is an environment variable that is set to your username. You can replace $USER with your actual username in the commands below if you prefer.

cd /work/$USER/
mkdir my_sdk_work
cd my_sdk_work
cp -rp /opt/sharcnet/cuda/4.1/sdk/* .
cd OpenCL

This will create the binary executable files in


You can then experiment with changes in the source code, located in:


The examples you compiled will not run on the monk login node, which does not have a GPU. They will run if you submit them as jobs to the gpu queue. You can also log into the mon54 node of monk, which has been set aside as a development node: run "ssh mon54" once you have logged into monk. It has two GPUs accessible to users in interactive mode, i.e. you do not have to use sqsub and can run executables that use the GPU directly from the command line.

Determining which OpenCL devices are available

Since OpenCL is designed to run on many platforms, it is particularly important for an OpenCL program to determine the characteristics of the hardware it is running on. The OpenCL standard provides a rich set of routines which can report detailed information about the capabilities of the system and the OpenCL devices available.

Below is an example program which lists some system information and devices available. Feel free to use it to determine the capabilities of the system you are on.

To provide useful information it should be run on a compute node which has OpenCL devices (GPUs) connected.

#include <stdio.h>
#include <CL/cl.h>

int main(int argc, char** argv) {
   char dname[500];
   cl_device_id devices[10];
   cl_uint num_devices, entries;
   cl_ulong long_entries;
   cl_uint d;
   cl_int err;
   cl_platform_id platform_id = NULL;
   size_t p_size;

   /* obtain list of platforms available */
   err = clGetPlatformIDs(1, &platform_id, NULL);
   if (err != CL_SUCCESS) {
       printf("Error: Failure in clGetPlatformIDs, error code=%d\n", err);
       return 1;
   }

   /* obtain information about platform */
   clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, 500, dname, NULL);
   printf("CL_PLATFORM_NAME = %s\n", dname);
   clGetPlatformInfo(platform_id, CL_PLATFORM_VERSION, 500, dname, NULL);
   printf("CL_PLATFORM_VERSION = %s\n", dname);

   /* obtain list of devices available on platform */
   clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 10, devices, &num_devices);
   printf("%d devices found\n", num_devices);

   /* query devices for information */
   for (d = 0; d < num_devices; ++d) {
       clGetDeviceInfo(devices[d], CL_DEVICE_NAME, 500, dname, NULL);
       printf("Device #%d name = %s\n", d, dname);
       clGetDeviceInfo(devices[d], CL_DRIVER_VERSION, 500, dname, NULL);
       printf("\tDriver version = %s\n", dname);
       clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tGlobal Memory (MB):\t%llu\n", long_entries/1024/1024);
       clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tGlobal Memory Cache (MB):\t%llu\n", long_entries/1024/1024);
       clGetDeviceInfo(devices[d], CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tLocal Memory (KB):\t%llu\n", long_entries/1024);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(cl_uint), &entries, NULL);
       printf("\tMax clock (MHz) :\t%u\n", entries);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &p_size, NULL);
       printf("\tMax Work Group Size:\t%zu\n", p_size);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &entries, NULL);
       printf("\tNumber of parallel compute cores:\t%u\n", entries);
   }
   return 0;
}


Output of this program is shown below. Note the useful information, including the version of OpenCL installed and the capabilities of the hardware visible to OpenCL. On machines with NVIDIA cards using NVIDIA's OpenCL, only the GPUs are visible. On machines with ATI cards, both the GPUs and the CPUs will be visible.
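If you only want a particular class of device, you can pass CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_CPU to clGetDeviceIDs instead of CL_DEVICE_TYPE_ALL. A sketch (variable names as in the program above; on NVIDIA platforms a CPU query will simply report that no devices were found):

```
/* request only GPU devices from the platform */
err = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 10, devices, &num_devices);
if (err == CL_DEVICE_NOT_FOUND)
    printf("No GPU devices available on this platform\n");
```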

The output of this program on a machine with an NVIDIA GPU is:

1 devices found
Device #0 name = GeForce GTX 480
       Driver version = 270.41.19
       Global Memory (MB):     1535
       Global Memory Cache (MB):       0
       Local Memory (KB):      48
       Max clock (MHz) :       1401
       Max Work Group Size:    1024
       Number of parallel compute cores:       15

The output on a machine with an AMD GPU is:

CL_PLATFORM_NAME = AMD Accelerated Parallel Processing
CL_PLATFORM_VERSION = OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
2 devices found
Device #0 name = Cypress
       Driver version = CAL 1.4.900
       Global Memory (MB):     2048
       Global Memory Cache (MB):       0
       Local Memory (KB):      32
       Max clock (MHz) :       0
       Max Work Group Size:    256
       Number of parallel compute cores:       20
Device #1 name = Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
       Driver version = 2.0
       Global Memory (MB):     48444
       Global Memory Cache (MB):       0
       Local Memory (KB):      32
       Max clock (MHz) :       1596
       Max Work Group Size:    1024
       Number of parallel compute cores:       8

Using OpenCL in Python with PyOpenCL

Detailed instructions are provided on the separate PyOpenCL page.


o Website of the Khronos Group consortium which maintains the OpenCL standard

o NVIDIA OpenCL website