Revision as of 16:02, 13 October 2010 by Merz (Talk | contribs) (Point out AMD webinar schedule)

OpenCL is the first open standard for writing programs that can execute across heterogeneous platforms, most importantly both CPUs and GPUs. OpenCL includes a language (based on C) for writing kernels (functions which can be executed on OpenCL devices), plus APIs to access and control the devices.
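To make the term "kernel" concrete, here is a minimal sketch rather than a complete OpenCL program. It holds the OpenCL C source of an element-wise vector addition kernel in a string, as a host program typically does before passing it to the OpenCL runtime for compilation, and places a plain C loop next to it for comparison. The names vec_add, vec_add_kernel_source and vec_add_host are hypothetical, and the kernel assumes it is launched with exactly one work-item per array element.

```c
#include <stddef.h>

/* OpenCL C source for a kernel, held as a string the way a host program
 * would hold it before giving it to the OpenCL runtime to compile.
 * Each work-item adds one pair of elements. */
const char *const vec_add_kernel_source =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);  /* index of this work-item */\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Plain C equivalent that runs on the host. The explicit loop over i
 * corresponds to the set of work-items launched for the kernel. */
void vec_add_host(const float *a, const float *b, float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

The kernel body contains no loop: the device executes it once per work-item, and get_global_id(0) tells each work-item which element it is responsible for.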

The best place to obtain authoritative information about OpenCL is the website of the Khronos Group consortium which maintains the OpenCL standard. AMD has also started a regular webinar series which may be of interest.

Using OpenCL

OpenCL is available on SHARCNET on the cluster angel. Please consult the OpenCL page on our main portal for the latest version information. It is also installed on some of the visualization workstations (for example

Mac OS X 10.6 Snow Leopard has built-in support for OpenCL (for both GPUs and CPUs), so it is a good development platform for OpenCL programs (consult the OpenCL Programming Guide for Mac OS X).

Determining which OpenCL devices are available

Since OpenCL is designed to run on many platforms, it is particularly important for an OpenCL program to determine the characteristics of the hardware it is running on. The OpenCL standard provides a rich set of query routines that report the capabilities of the system and of each available OpenCL device.

Here is an example program which lists some system information. It can be compiled on a system where OpenCL libraries are available with:

gcc -o test.x get_opencl_information.c -lOpenCL

To provide useful information it should be run on a compute node which has OpenCL devices (GPUs) connected.

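#include <stdio.h>
#include <CL/cl.h>
int main(int argc, char** argv) {
   char dname[500];
   cl_device_id devices[10];
   cl_uint num_devices, entries;
   cl_ulong long_entries;
   cl_uint d;
   cl_int err;
   cl_platform_id platform_id = NULL;
   size_t p_size;
/* obtain list of platforms available (only the first platform is used) */
   err = clGetPlatformIDs(1, &platform_id, NULL);
   if (err != CL_SUCCESS) {
       printf("Error: Failure in clGetPlatformIDs, error code=%d\n", err);
       return 1;
   }
/* obtain information about platform */
   clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, sizeof(dname), dname, NULL);
   printf("CL_PLATFORM_NAME = %s\n", dname);
   clGetPlatformInfo(platform_id, CL_PLATFORM_VERSION, sizeof(dname), dname, NULL);
   printf("CL_PLATFORM_VERSION = %s\n", dname);
/* obtain list of devices available on platform */
   err = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 10, devices, &num_devices);
   if (err != CL_SUCCESS) {
       printf("Error: Failure in clGetDeviceIDs, error code=%d\n", err);
       return 1;
   }
/* num_devices reports all devices present; only 10 fit in the array */
   if (num_devices > 10) num_devices = 10;
   printf("%u devices found\n", num_devices);
/* query devices for information */
   for (d = 0; d < num_devices; ++d) {
       clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
       printf("Device #%u name = %s\n", d, dname);
       clGetDeviceInfo(devices[d], CL_DRIVER_VERSION, sizeof(dname), dname, NULL);
       printf("\tDriver version = %s\n", dname);
       clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tGlobal Memory (MB):\t%llu\n", (unsigned long long)(long_entries/1024/1024));
       clGetDeviceInfo(devices[d], CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tGlobal Memory Cache (MB):\t%llu\n", (unsigned long long)(long_entries/1024/1024));
       clGetDeviceInfo(devices[d], CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, NULL);
       printf("\tLocal Memory (KB):\t%llu\n", (unsigned long long)(long_entries/1024));
/* clock frequency and compute unit count are returned as cl_uint */
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(cl_uint), &entries, NULL);
       printf("\tMax clock (MHz) :\t%u\n", entries);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &p_size, NULL);
       printf("\tMax Work Group Size:\t%lu\n", (unsigned long)p_size);
       clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &entries, NULL);
       printf("\tNumber of parallel compute cores:\t%u\n", entries);
   }
   return 0;
}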

Example output of this program (device listing only) on a machine with a single GPU:

1 devices found
Device #0 name = Quadro FX 4800
       Driver version = 195.36.15
       Global Memory (MB):     1535
       Global Memory Cache (MB):       0
       Local Memory (KB):      16
       Max clock (MHz) :       1204
       Max Work Group Size:    512
       Number of parallel compute cores:       24
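When a query fails, the example program reports only a numeric error code. A small lookup function can make such messages easier to read. The following is a sketch under stated assumptions: cl_error_name is a hypothetical helper, not part of the OpenCL API, and it covers only a handful of codes. The numeric values are copied from the Khronos headers so that the sketch compiles without OpenCL installed; a real program would include CL/cl.h and use the named constants directly.

```c
/* Hypothetical helper: map a few OpenCL status codes to their names.
 * The literal values mirror the definitions in the Khronos headers;
 * with <CL/cl.h> available, use the named constants instead. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:     return "CL_SUCCESS";
    case -1:    return "CL_DEVICE_NOT_FOUND";
    case -2:    return "CL_DEVICE_NOT_AVAILABLE";
    case -5:    return "CL_OUT_OF_RESOURCES";
    case -6:    return "CL_OUT_OF_HOST_MEMORY";
    case -30:   return "CL_INVALID_VALUE";
    case -31:   return "CL_INVALID_DEVICE_TYPE";
    case -32:   return "CL_INVALID_PLATFORM";
    case -33:   return "CL_INVALID_DEVICE";
    /* defined by the cl_khr_icd extension, not the core standard */
    case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";
    default:    return "unrecognized OpenCL error code";
    }
}
```

In the example program it could be used as printf("Error: clGetPlatformIDs failed: %s (%d)\n", cl_error_name(err), err); so that, for instance, a node without any GPU attached is reported by name rather than by number.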