First OpenCL program

Get the source code for this example.

We are now ready to create our first OpenCL program. It will be a very simple one because we haven’t covered much of OpenCL C99 or the OpenCL API yet.

I recommend you go to the Khronos Group website and download the latest OpenCL specification. Even if you are not going to read it entirely, it is a great source of information. Just read the functions as you need them or as I point to them in this tutorial.

1. Setting up

1 – The first thing we have to do is create a new Visual C# Windows application.

2 – Next, we have to add references to Cloo and OpenCLTemplate. Use the menu Project -> Add reference to open the references dialog.

3 – Find the Cloo and OpenCLTemplate files (download them if you haven’t already) and add the references to your project. OpenCLTemplate depends on Cloo, so we should include both.

4 – Add a button to your form, name it btnOpenCL and set its Text property to OpenCL (this should be easy if you have any experience with visual languages).


Now we have the tools we need to start coding with OpenCL.

2. Creating the OpenCL source

The OpenCL source code is a string that follows the OpenCL C99 rules. Once again, refer to the Khronos Group website to download the OpenCL specification. You will probably want to download the C99 specification as well.

We will create a simple program that sums two vectors and stores the result in the first one. Let’s edit the click event of the button to execute our OpenCL code (which, in Visual C#, is done by double-clicking the button):

        private void btnOpenCL_Click(object sender, EventArgs e)
        {
            string vecSum = @"
                __kernel void
                floatVectorSum(__global float * v1,
                               __global float * v2)
                {
                    // Vector element index
                    int i = get_global_id(0);
                    v1[i] = v1[i] + v2[i];
                }";


Let’s cover some basic concepts of OpenCL C99 here. There’s no need to worry, we’ll come back to the C99 topic later.

  1. Notice that the OpenCL source is a string, which will be compiled by the OpenCL compiler at run time;
  2. The “__kernel” qualifier tells OpenCL that we will call this function from the host through the API;
  3. The “__global” qualifier tells OpenCL that v1 and v2 point to data in the device’s global memory;
  4. “get_global_id(0)” is an OpenCL built-in function that returns the index of the current worker (called a work-item in the OpenCL specification). In our code, the task of the i-th worker is to sum the i-th components of v1 and v2. The (0) is the index of the dimension we want, which we will discuss later.

As you can see, this code is quite simple. What makes it simple is the fact that each sum is completely independent of the others.
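To make the role of get_global_id(0) concrete, here is a minimal plain C sketch that emulates the kernel on the CPU. It is not OpenCL code and not part of OpenCLTemplate: the names float_vector_sum_item and emulate_float_vector_sum are hypothetical, and the loop only stands in for the workers that a real device may run in parallel and in no guaranteed order.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel body: the job of worker i.
   In real OpenCL, i would come from get_global_id(0). */
static void float_vector_sum_item(size_t i, float *v1, const float *v2)
{
    v1[i] = v1[i] + v2[i];
}

/* Emulates launching n workers. A CPU loop runs them one by one;
   a device may run them in parallel and in any order, which is safe
   here because no worker touches another worker's element. */
void emulate_float_vector_sum(float *v1, const float *v2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float_vector_sum_item(i, v1, v2);
    }
}
```

Calling emulate_float_vector_sum(a, b, 3) with a = {1, 2, 3} and b = {10, 20, 30} leaves a holding {11, 22, 33}, which is what the kernel does to v1 on the device.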

3. Creating the host code

Now that we have the OpenCL source, we need to create the host code that will initialize OpenCL, compile the code, transfer the necessary data to the device, execute the kernel and read the data back.

3.1 Initializing OpenCL and compiling the code

OpenCL initialization should identify all available devices and create the Command Queues, which are used to tell OpenCL which device should execute which command. I have encapsulated all this initialization into the InitCL function. Compiling the code shouldn’t be a problem. After compilation, it is necessary to use the OpenCL API to get host access to the OpenCL kernel so we can order its execution later. We’re going to keep adding code to the click event:

            //Initializes OpenCL Platforms and Devices and sets everything up
            OpenCLTemplate.CLCalc.InitCL();

            //Compiles the source codes. The source is a string array because the user may want
            //to split the source into many strings.
            OpenCLTemplate.CLCalc.Program.Compile(new string[] { vecSum });

            //Gets host access to the OpenCL floatVectorSum kernel
            OpenCLTemplate.CLCalc.Program.Kernel VectorSum = new OpenCLTemplate.CLCalc.Program.Kernel("floatVectorSum");

OpenCLTemplate sets the first GPU as the default device to execute OpenCL commands. If you want to select a different device, set OpenCLTemplate.CLCalc.Program.DefaultCQ to the number of the device that you want to use. The devices are listed in the OpenCLTemplate.exe program, and the device information is stored in the list OpenCLTemplate.CLCalc.CLDevices.

TIP: If you don’t want to write OpenCLTemplate every time, add a “using OpenCLTemplate;” directive alongside the other “usings”.

3.2 Creating the vectors

We need to create the vectors we want to sum. This is done in host code. Let’s keep on adding code to the click event:

            //We want to sum 2000 numbers
            int n = 2000;

            //Create vectors with 2000 numbers
            float[] v1 = new float[n], v2 = new float[n];

            //Creates population for v1 and v2
            for (int i = 0; i < n; i++)
            {
                v1[i] = (float)i / 10;
                v2[i] = -(float)i / 9;
            }

3.3 Writing to the device memory

We need to send the data to the device memory. To do so, we create variables that will hold the contents of v1 and v2 in the DEVICE. The Variable constructor also copies the contents of the host array to the device.

            //Creates vectors v1 and v2 in the device memory
            OpenCLTemplate.CLCalc.Program.Variable varV1 = new OpenCLTemplate.CLCalc.Program.Variable(v1);
            OpenCLTemplate.CLCalc.Program.Variable varV2 = new OpenCLTemplate.CLCalc.Program.Variable(v2);

Remember: variables v1 and v2 are the vectors in HOST memory. Variables varV1 and varV2 are vectors in the DEVICE memory. These are usually different memories (unless you are targeting your CPU to execute the OpenCL code, which is better in some cases).

(This encapsulates the OpenCL API clEnqueueWriteBuffer function.)

3.4 Executing the kernel

You would call a regular C# function by invoking Function(param1, param2, etc). To invoke an OpenCL function you need to specify the parameters AND how many workers you want. Remember that you will always have to know: what does each worker do? How many workers do I need?

In our case, each worker sums the i-th components of varV1 and varV2 (remember, OpenCL executes on the DEVICE, which means it reads device memory). This means we need as many workers as there are elements in varV1 and varV2. If each worker summed two components of varV1 and varV2, only half as many workers would be needed. This will be discussed in Capabilities and Limitations.
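The trade-off between work per worker and number of workers can be sketched in plain C. This is a CPU emulation under stated assumptions, not OpenCL or OpenCLTemplate code: workers_for_pairs, pair_sum_item and emulate_pair_sum are hypothetical names, and the outer loop stands in for the device launching the workers.

```c
#include <stddef.h>

/* Number of workers needed when each one handles two elements. */
size_t workers_for_pairs(size_t n)
{
    return (n + 1) / 2;   /* round up so an odd n is still covered */
}

/* Hypothetical job of worker w: sum elements 2w and 2w + 1. */
static void pair_sum_item(size_t w, float *v1, const float *v2, size_t n)
{
    size_t first = 2 * w;
    for (size_t k = first; k < first + 2 && k < n; k++) {
        v1[k] = v1[k] + v2[k];   /* k < n guards the last worker */
    }
}

/* Emulates launching the reduced number of workers. */
void emulate_pair_sum(float *v1, const float *v2, size_t n)
{
    size_t workers = workers_for_pairs(n);
    for (size_t w = 0; w < workers; w++) {
        pair_sum_item(w, v1, v2, n);
    }
}
```

For the 2000 elements used in this tutorial, workers_for_pairs(2000) returns 1000. The bounds check matters only when n is odd, where the last worker has a single element to process.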

The number of workers can be specified in multiple dimensions, i.e., a worker could be the i-th worker in one dimension or the (i, j)-th worker in two dimensions. In our case, we only need one dimension.
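As an illustration of two-dimensional indexing, the following plain C sketch emulates a launch in which worker (i, j) handles one element of a matrix. It is a CPU emulation, not OpenCL code: matrix_sum_item and emulate_matrix_sum are hypothetical names, and the row-by-row storage of the matrices is an assumption made for this example.

```c
#include <stddef.h>

/* Hypothetical job of worker (i, j): add one matrix element.
   In real OpenCL, i and j would come from get_global_id(0) and
   get_global_id(1). Matrices are assumed to be stored row by row. */
static void matrix_sum_item(size_t i, size_t j,
                            float *a, const float *b, size_t width)
{
    size_t idx = j * width + i;
    a[idx] = a[idx] + b[idx];
}

/* Emulates a two-dimensional launch of width x height workers. */
void emulate_matrix_sum(float *a, const float *b,
                        size_t width, size_t height)
{
    for (size_t j = 0; j < height; j++) {
        for (size_t i = 0; i < width; i++) {
            matrix_sum_item(i, j, a, b, width);
        }
    }
}
```

The vector sum in this tutorial does not need this: a single dimension with n workers already gives every element its own worker.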

We still need to tell the VectorSum kernel that its arguments are varV1 and varV2, and that it will need one worker for each element of varV1:

            //Arguments of VectorSum kernel
            OpenCLTemplate.CLCalc.Program.Variable[] args = new OpenCLTemplate.CLCalc.Program.Variable[] { varV1, varV2 };

            //How many workers will there be? We need "n", one for each element
            int[] workers = new int[1] { n };

            //Execute the kernel
            VectorSum.Execute(args, workers);

Notice that what __global float * v1 and __global float * v2 in the kernel actually see are the device variables varV1 and varV2, not the host arrays v1 and v2.

(This encapsulates the OpenCL API clEnqueueNDRangeKernel function.)

3.5 Retrieving data

The last step is to retrieve the information we need from the device memory.

            //Read device memory varV1 to host memory v1
            varV1.ReadFromDeviceTo(v1);

(This encapsulates the OpenCL API clEnqueueReadBuffer function.)

4. Conclusion

Try playing with the code you just made. Create some breakpoints to check if the sum was done correctly. Try changing the kernel so that it calculates the product of the elements of the vectors. Have some fun with the code.
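Besides using breakpoints, one way to check the result is to recompute the sum on the CPU and compare. The sketch below is plain C rather than the C# of the host code, and count_sum_mismatches and abs_f are hypothetical helpers. It assumes the vectors were populated as in section 3.2 (i / 10 and -i / 9) and uses a small tolerance, since the device and the CPU may round floating-point operations slightly differently.

```c
#include <stddef.h>

/* Absolute value helper, so the sketch needs no math library. */
static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* Counts the elements of result that differ from the expected sum
   i/10 + (-i/9) by more than a small tolerance. */
size_t count_sum_mismatches(const float *result, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        float a = (float)i / 10;
        float b = -(float)i / 9;
        float expected = a + b;
        /* Tolerance scaled to the size of the inputs. */
        float tolerance = 1e-5f * (abs_f(a) + abs_f(b)) + 1e-6f;
        if (abs_f(result[i] - expected) > tolerance) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A return value of 0 for the v1 array read back from the device means every element matches the CPU computation. The same loop can be written in C# inside the click handler.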

OpenCL coding doesn’t get much more complicated than this unless you want to create games or very heavy real-time applications. If you are like me and need to do big calculations at once and display the results at the end, this is just fine. If you want to make an OpenGL game or a real-time image processing application, I suggest you consider switching to C++ and doing your own fine-tuning for OpenCL.

There are still many points which have not been covered, but this code shows the basics of running OpenCL. If you want to ask something, use the Forum so the answer can help others as well.

Get the source code for this example.

