Revision as of 14:42, 14 May 2019 by Ppomorsk (Talk | contribs) (Compiling OpenCL examples from CUDA SDK on monk)

OpenCL is the first open standard for writing programs that can execute across heterogeneous platforms, most importantly both CPUs and GPUs. OpenCL includes a language (based on C) for writing kernels (functions which can be executed on OpenCL devices), plus APIs to access and control the devices.

The best place to obtain authoritative information about OpenCL is the website of the Khronos Group consortium which maintains the OpenCL standard.

Compiling OpenCL programs

Compile as you would any other C code, but link against the OpenCL library. On some clusters you also need to specify the path to the include files.

To compile on the graham cluster with the Intel compiler, run:

module load cuda
icc -o test.x code.c -lOpenCL

To compile on the graham cluster with the GCC compiler, run:

module load cuda
gcc -o test.x code.c -lOpenCL

OpenCL examples

OpenCL examples provided by NVIDIA can be downloaded from the NVIDIA developer OpenCL page.

Determining which OpenCL devices are available

Since OpenCL is designed to run on many platforms, it is particularly important for an OpenCL program to determine the characteristics of the hardware it is running on. The OpenCL standard provides a rich set of routines that return detailed information about the capabilities of the system and the OpenCL devices available.

Below is an example program that lists some system information and the devices available. Feel free to use it to determine the capabilities of the system you are on.

To produce useful output, it should be run on a compute node that has OpenCL devices (GPUs) attached.

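#include <stdio.h>
#include <CL/cl.h>

#define MAX_PLATFORMS 10
#define MAX_DEVICES 10

int main(void) {
   char dname[500];
   cl_device_id devices[MAX_DEVICES];
   cl_uint num_devices,entries;
   cl_ulong long_entries;
   cl_uint d,ip;
   cl_int err;
   cl_uint num_platforms;
   cl_platform_id platform_id[MAX_PLATFORMS];
   size_t p_size;
/* obtain list of platforms available */
   err = clGetPlatformIDs(MAX_PLATFORMS, platform_id, &num_platforms);
   if (err != CL_SUCCESS) {
       printf("Error: Failure in clGetPlatformIDs,error code=%d \n",err);
       return 1;
   }
/* num_platforms reports all platforms present, which may exceed the array size */
   if (num_platforms > MAX_PLATFORMS) num_platforms = MAX_PLATFORMS;
   printf("Found %u platforms \n", num_platforms);
   for (ip=0;ip<num_platforms;ip++){
/* obtain information about platform */
       clGetPlatformInfo(platform_id[ip], CL_PLATFORM_NAME, sizeof(dname), dname, NULL);
       printf("CL_PLATFORM_NAME = %s\n", dname);
       clGetPlatformInfo(platform_id[ip], CL_PLATFORM_VERSION, sizeof(dname), dname, NULL);
       printf("CL_PLATFORM_VERSION = %s\n", dname);
/* obtain list of devices available on platform */
       err = clGetDeviceIDs(platform_id[ip], CL_DEVICE_TYPE_ALL, MAX_DEVICES, devices, &num_devices);
       if (err != CL_SUCCESS) num_devices = 0;   /* e.g. CL_DEVICE_NOT_FOUND */
       if (num_devices > MAX_DEVICES) num_devices = MAX_DEVICES;
       printf("%u devices found\n", num_devices);
/* query devices for information */
       for (d = 0; d < num_devices; ++d) {
           clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
           printf("Device #%u name = %s\n", d, dname);
           clGetDeviceInfo(devices[d], CL_DRIVER_VERSION, sizeof(dname), dname, NULL);
           printf("\tDriver version = %s\n", dname);
           clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
           printf("\tGlobal Memory (MB):\t%llu\n",(unsigned long long)(long_entries/1024/1024));
           clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, sizeof(cl_ulong), &long_entries, NULL);
           printf("\tGlobal Memory Cache (MB):\t%llu\n",(unsigned long long)(long_entries/1024/1024));
           clGetDeviceInfo(devices[d], CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
           printf("\tLocal Memory (KB):\t%llu\n",(unsigned long long)(long_entries/1024));
/* clock frequency and compute units are returned as cl_uint */
           clGetDeviceInfo(devices[d], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(cl_uint), &entries, NULL);
           printf("\tMax clock (MHz) :\t%u\n",entries);
           clGetDeviceInfo(devices[d], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &p_size, NULL);
           printf("\tMax Work Group Size:\t%zu\n",p_size);
           clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &entries, NULL);
           printf("\tNumber of parallel compute cores or multiprocessors:\t%u\n",entries);
       }
   }
   return 0;
}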


The output of this program running on a graham GPU node is shown below. Note the useful information, including the version of OpenCL installed and the capabilities of the hardware visible to OpenCL. Two OpenCL implementations (platforms) are present: one from Intel that uses the CPU and reports it as a device, and one from NVIDIA that uses the GPUs (two GPUs are detected as devices in this case). A program must choose which of the available devices to use.

Found 2 platforms
1 devices found
Device #0 name = Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
	Driver version =
	Global Memory (MB):	128540
	Global Memory Cache (MB):	0
	Local Memory (KB):	32
	Max clock (MHz) :	2100
	Max Work Group Size:	8192
	Number of parallel compute cores or multiprocessors:	32
2 devices found
Device #0 name = Tesla P100-PCIE-12GB
	Driver version = 410.48
	Global Memory (MB):	12198
	Global Memory Cache (MB):	0
	Local Memory (KB):	48
	Max clock (MHz) :	1328
	Max Work Group Size:	1024
	Number of parallel compute cores or multiprocessors:	56
Device #1 name = Tesla P100-PCIE-12GB
	Driver version = 410.48
	Global Memory (MB):	12198
	Global Memory Cache (MB):	0
	Local Memory (KB):	48
	Max clock (MHz) :	1328
	Max Work Group Size:	1024
	Number of parallel compute cores or multiprocessors:	56
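
Since a program has to choose among the available devices, it can help to separate the selection policy from the OpenCL queries. The following is a minimal sketch of one possible policy, not a recommended or standard one. The `device_info` struct, the `dev_kind` enum, and the `pick_device` function are hypothetical names introduced for illustration; in a real program the fields would be filled from `clGetDeviceInfo` results (`CL_DEVICE_TYPE`, `CL_DEVICE_MAX_COMPUTE_UNITS`, `CL_DEVICE_GLOBAL_MEM_SIZE`). The assumed policy is: prefer a GPU over any other device, then more global memory, then more compute units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device summary; not part of the OpenCL API. */
typedef enum { DEV_CPU, DEV_GPU, DEV_OTHER } dev_kind;

typedef struct {
    const char *name;                 /* from CL_DEVICE_NAME */
    dev_kind kind;                    /* derived from CL_DEVICE_TYPE */
    unsigned compute_units;           /* from CL_DEVICE_MAX_COMPUTE_UNITS */
    unsigned long long global_mem_mb; /* CL_DEVICE_GLOBAL_MEM_SIZE in MB */
} device_info;

/* Returns nonzero if device a is strictly preferred over device b
   under the assumed policy: GPU first, then memory, then compute units. */
static int better(const device_info *a, const device_info *b)
{
    int a_gpu = (a->kind == DEV_GPU);
    int b_gpu = (b->kind == DEV_GPU);
    if (a_gpu != b_gpu)
        return a_gpu;
    if (a->global_mem_mb != b->global_mem_mb)
        return a->global_mem_mb > b->global_mem_mb;
    return a->compute_units > b->compute_units;
}

/* Returns the index of the preferred device, or -1 if the list is empty.
   Ties keep the earliest device in the list. */
int pick_device(const device_info *devs, size_t n)
{
    int best = -1;
    size_t i;
    assert(devs != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        if (best < 0 || better(&devs[i], &devs[best]))
            best = (int)i;
    }
    return best;
}
```

If the three devices from the graham output above are listed in the order shown (the Xeon CPU followed by the two Tesla P100 GPUs), this policy picks index 1, the first P100. Other policies, such as taking the device from a command-line option, may suit a shared node better.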

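A program that depends on a particular OpenCL version can check the `CL_PLATFORM_VERSION` string before using a platform. The OpenCL specification defines this string as beginning with `OpenCL <major>.<minor>` followed by platform-specific text. The sketch below parses that prefix; the function name `parse_cl_version` is hypothetical, and the input string is assumed to have already been obtained with `clGetPlatformInfo`.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading "OpenCL <major>.<minor>" part of a
   CL_PLATFORM_VERSION string.
   Returns 1 and stores the two numbers on success, 0 otherwise.
   The output pointers must not be NULL. */
int parse_cl_version(const char *version, int *major, int *minor)
{
    int maj, min;
    assert(major != NULL && minor != NULL);
    if (version == NULL)
        return 0;
    /* Anything after the minor number is platform-specific and ignored. */
    if (sscanf(version, "OpenCL %d.%d", &maj, &min) != 2)
        return 0;
    *major = maj;
    *minor = min;
    return 1;
}
```

For an illustrative input such as "OpenCL 1.2 CUDA 10.0.132" (an assumed example, not taken from the graham output), the function stores major 1 and minor 2. A string that does not start with the expected prefix makes it return 0.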

* Website of the Khronos Group consortium, which manages the OpenCL standard

* NVIDIA OpenCL website