Intermediate aspects of the C99 OpenCL language

Download source code example for this section.

OpenCL C99 has some important differences compared to regular C99. Once again, I would like to emphasize that I assume the reader knows the C# and C programming languages. I am not going to discuss pointers or structures in this tutorial.

The aspects of OpenCL C99 that I consider intermediate are some built-in functions that run much faster than hand-written equivalents. Using these built-in functions is one of the easiest ways to gain performance.

I strongly suggest reading the descriptions of ALL the math, geometric and common functions in the OpenCL C specification. This may look boring at first, but it is the only way to use the math functions efficiently.

In the source code example for this section, using native functions instead of full-precision functions sped up the calculations by a factor of 35 in my test.

1. Enabling extensions

Optional OpenCL C extensions must be enabled with a #pragma directive because, as the name says, a given device may not support them.

Check the OpenCL specification to see which optional extensions you will need. At the time of writing, both AMD and NVIDIA supported all the optional extensions, which is definitely a thumbs up. Support still varies from device to device, so it is worth querying it at runtime.

This is the way to enable a given extension:

#pragma OPENCL EXTENSION extension_name : behavior

Here behavior is either enable or disable, and extension_name is the name of the optional extension. Some important extensions:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

Enables double-precision floating-point arithmetic, including the double versions of the math, geometric and common built-in functions.

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

Allows writes through pointers (or to arrays) of types narrower than 32 bits, such as char, uchar, short and ushort. This is very important for manipulating strings or 8-bit grayscale images efficiently.

Straight from the OpenCL specification:

In the example given in section 6.8.m and copied below, the lines marked "no longer an error" will compile and work correctly if the cl_khr_byte_addressable_store extension is supported.

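// Added here: the quoted example assumes the extension is enabled.
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
__kernel void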
do_proc (__global char *pA, short b, __global short *pB)
{
  char x[100];
  __private char *px = x;
  int id = (int)get_global_id(0);
  short f;
  f = pB[id] + b;
  px[1] = pA[1]; // no longer an error.
  pB[id] = b; // no longer an error.
}

The importance of double precision for scientific computing is obvious. Byte-addressable stores are very important too, because you don't want to store and transfer 4-byte integers when 1-byte chars are enough. Atomics and writes to 3D images are also optional extensions. Once again, I recommend reading the OpenCL specification to learn about all the OpenCL extensions.

2. Using the geometric functions

The geometric functions are really useful for 3D algorithms. You no longer need to write your own functions to compute distances, cross products and dot products.

They operate on floating-point scalars and vectors such as float, float2 and float4, and on the double versions when cl_khr_fp64 is enabled. Integer types are not supported. In practice, you will mostly use them with float4 and double4 vectors. The cross function takes the x, y and z components of two 4-component vectors and returns a vector perpendicular to both, with w set to 0. For example, you can calculate a normal vector like this:

float4 dir1 = (float4)(1, 1, 1, 0);
float4 dir2 = (float4)(1, 2, 3, 0);
float4 normal = cross(dir1, dir2);

3. Optimizing with native functions

The native_ functions are faster but less precise versions of the regular functions. They may map directly to hardware instructions.

Notice that "less precise" doesn't mean the functions are rough approximations. The specification leaves their accuracy implementation-defined, but in practice they are usually accurate enough for anything that ends up on the screen. So if you are developing a game, for example, consider using native functions wherever you can.

More often than not, the native functions have the needed precision, so use them as a rule. Switch to the regular versions only when needed, for example to refine a solution after converging with the native_ versions.

3.1. native_ functions

There are many native_ math functions. The geometric functions have fast_ counterparts instead, such as fast_length, fast_distance and fast_normalize.

Good candidates include native_sin, native_cos, native_powr and fast_length. There are many others, so check the documentation.

The only secret about the native functions is knowing they exist, honestly. Well, now you know.

3.2. Function mad

The multiply-add function mad is so important that it has a dedicated build option for enabling mad optimization (-cl-mad-enable). I personally prefer more control, so I call the function explicitly, because sometimes I may want results that comply with the standard's precision requirements. This is how mad works:

mad(a,b,c) = a*b + c.

So, you may want to optimize

a*b+c*d+e*f

by using

mad(a,b,mad(c,d,e*f))

As you can see, one plain multiplication (e*f) remains. The other two products and both additions are folded into mad calls.

4. Optimizing with the common functions

Image processing and filtering rely on interpolation functions. Some of them are so common that OpenCL provides them as built-in functions: look for smoothstep and mix in the documentation.

The mix function is particularly interesting for interpolation:

mix(a,b,c) = a + (b-a)*c, with c varying from 0 to 1.

This is how the documentation writes the function. Renaming the arguments makes it clearer:

mix(min, max, t) = min + (max - min)*t, t from 0 to 1.

This is much more intuitive. Function mix is the linear interpolation function.

5. Example

This section covers more concepts than new functions, which makes it hard to create a source code example for it.

Keep in mind that some OpenCL devices can be slower when using vector functions. Personally, I still think it is important to write vector versions of the functions, because drivers will eventually be optimized to run vector code well on those devices. The opposite is much harder: it is very difficult for a compiler to decide on its own what can be turned into vector operations.

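Now on to the code. The example below simply executes a series of operations, first with regular functions and then with their native counterparts. It is just a rough performance comparison: the kernels never store their results, so a compiler is free to optimize part of the work away.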

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
kernel void regularFuncs()
{
   for (int i=0; i<5000; i++)
   {
       float a=1, b=2, c=3, d=4;
       float e = a*b+c;
       e = a*b+c*d;
       e = sin(a);
       e = cos(b);
       e = a*b+c*d;
       e = sin(a);
       e = cos(b);
       e = a*b+c*d;
       e = sin(a);
       e = cos(b);
       float4 vec1 =  (float4)(1, 2, 3, 0);
       float4 vec2 =  (float4)(-1, 3, 1, 0);
       float dist = distance(vec1, vec2); // distance() returns a scalar float
       double x=1, y=2, z=3;
       double resp = x*y+z;
   }
}
kernel void nativeFuncs()
{
   for (int i=0; i<5000; i++)
   {
       float a=1, b=2, c=3, d=4;
       float e = mad(a,b,c);
       e = mad(a,b,c*d);
       e = native_sin(a);
       e = native_cos(b);
       e = mad(a,b,c*d);
       e = native_sin(a);
       e = native_cos(b);
       e = mad(a,b,c*d);
       e = native_sin(a);
       e = native_cos(b);
       float4 vec1 =  (float4)(1, 2, 3, 0);
       float4 vec2 =  (float4)(-1, 3, 1, 0);
       float dist = fast_distance(vec1, vec2); // fast_distance() returns a scalar float
       double x=1, y=2, z=3;
       double resp = mad(x,y,z);
   }
}

As mentioned before, the native version ran 35 times faster in my test. So my advice is: use native functions. Use the regular functions only when you absolutely need the extra precision, for example to refine a solution.

