OpenCL/OpenGL Interoperation

Get the source code for this section.

This tutorial is about manipulating OpenGL data using OpenCL without the need to transfer data back and forth between Host memory and Device memory. This is probably one of the most important OpenCL features when coupled with 3D games or 3D software, because data transfers are currently the bottleneck of OpenCL (and CUDA and Stream).

Since we’ll use C#, this tutorial uses OpenTK for OpenGL commands and Cloo for OpenCL commands.

This tutorial assumes that the reader knows the basics of OpenGL, including camera setup, depth test, and Vertex Buffer Objects. If you are not familiar with these subjects, I recommend reading the OpenTK documentation.

CMSoft’s CLGL Interoperation with Textures Tutorial demonstrates how to share OpenGL Textures with OpenCL.

1. Overview

This tutorial shows how to use OpenCL / OpenGL interoperation to manipulate OpenGL Vertex Buffer Objects using OpenCL. We create an NxN grid of vertexes and simulate circular wave propagation and interference. The video that accompanies this tutorial gives a better idea of what we can do with the Interop.

On my computer I was able to run a 400×400 grid in real time without any problem. That means updating almost 10 MB/s of data in GPU memory!

What exactly does OpenCL/OpenGL interoperation mean? It means manipulating OpenGL buffers (vertex buffers and textures) directly inside the GPU memory without the need to copy data back and forth. In this tutorial we will manipulate OpenGL Vertex Buffer Objects using OpenCL.

We shall take these steps:

In OpenGL:

  • Create the OpenGL context and drawing region;
  • Create OpenGL vertex buffer objects to store vertex coordinates, normals, colors and elements;
  • Set the OpenGL draw function to display our object.

In OpenCL:

  • Create the OpenCL / OpenGL shared context;
  • Acquire and manipulate OpenGL vertex buffer objects;
  • Release OpenGL objects.

If you are familiar with OpenGL and have read through this OpenCL tutorial this should be very easy.

2. Setting up OpenGL

First of all, you need to create a new project and reference the files Cloo.dll, OpenTK.dll and OpenTK.GLControl.dll. After that, right-click the Toolbox, select OpenTK.GLControl.dll from the .NET components and add the control to the Toolbox.

I really hope you can create an empty form with a GLControl inside it. Refer to the source code for this section for the complete code or check OpenTK documentation if you want to build it from scratch.

2.1 Initialization

The OpenGL initialization involves enabling the depth buffer and creating a projection (which will be orthographic, for simplicity). First of all, let’s create a function to set up the viewport and the projection matrix:

private void SetupViewport()
{
    int w = glControl1.Width;
    int h = glControl1.Height;
    GL.MatrixMode(MatrixMode.Projection);
    GL.LoadIdentity(); // Start from a clean projection matrix
    GL.Ortho(-2, 2, -2, 2, -15, 15); // Visible volume: x and y in [-2, 2], z in [-15, 15]
    GL.Viewport(0, 0, w, h); // Use all of the glControl painting area
}

Now we will set up the OpenGL initialization inside the glControl1_Load event:

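private void glControl1_Load(object sender, EventArgs e)
{
    //Depth test
    GL.Enable(EnableCap.DepthTest);

    //Materials, color
    GL.Enable(EnableCap.ColorMaterial);
    GL.ColorMaterial(MaterialFace.FrontAndBack, ColorMaterialParameter.AmbientAndDiffuse);

    //Blending, for the alpha component of the colors
    GL.Enable(EnableCap.Blend);
    GL.BlendFunc(BlendingFactorSrc.SrcAlpha, BlendingFactorDest.OneMinusSrcAlpha);

    // (...)
}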

The code above enables depth test, materials and blending, in case you want to play with the alpha components of the colors later. Now let’s create some simple data to populate the OpenGL vertex buffer objects. The VBO pattern we will use allows us to call glDrawElements. Let’s start with a simpler example that draws two triangles on the screen using four vertexes:

//Vertex buffer objs
PositionData = new float[]
{
    0.0f,-1.0f, 1.0f,
    1.0f,-1.0f, 1.0f,
    1.0f, 1.0f, 1.0f,
    -1.0f, 1.0f, 1.0f
};
ElementData = new int[]
{
    // (...)
};
NormalsData = new float[]
{
    0.0f, 0.0f, 1f,
    0.0f, 0.0f, 1f,
    0.0f, 0.0f, 1f,
    0.0f, 0.0f, 1f
};
ColorData = new float[]
{
    // (...)
};

The arrays above were created just to show an example of the structure. Now let’s create more interesting data. Recall that each vertex has 3 components (xyz), each normal has 3 components (xyz), each color has 4 components (rgba) and each triangle is defined by 3 vertexes.

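We will create an NxN grid in xy covering the interval [-2,2] x [-2,2], with all z components set to zero because we will manipulate the z components using OpenCL. Since the code divides by N rather than N-1, the last row and column stop one grid step short of 2: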

int N = 400;
PositionData = new float[3 * N * N];
NormalsData = new float[3 * N * N];
ColorData = new float[4 * N * N];
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
    {
        //Vertex (i, j) is stored at index i + N * j; z stays at zero
        PositionData[3 * (i + N * j)] = 4*((float)i / (float)N-0.5f);
        PositionData[1 + 3 * (i + N * j)] = 4*((float)j / (float)N-0.5f);
        NormalsData[2 + 3 * (i + N * j)] = 1.0f;
        //Blue grows along i, alpha is fully opaque
        ColorData[2 + 4 * (i + N * j)] = (float)i/(float)(N-1);
        ColorData[3 + 4 * (i + N * j)] = 1.0f;
    }
}
//Two triangles (6 indexes) per grid cell
ElementData = new int[3 * 2 * (N - 1) * (N - 1)];
for (int i = 0; i < N - 1; i++)
{
    for (int j = 0; j < N - 1; j++)
    {
        ElementData[6 * (i + (N - 1) * j)] = i + N * j;
        ElementData[6 * (i + (N - 1) * j) + 1] = i + N * (j + 1);
        ElementData[6 * (i + (N - 1) * j) + 2] = i + 1 + N * (j + 1);
        ElementData[6 * (i + (N - 1) * j) + 3] = i + N * j;
        ElementData[6 * (i + (N - 1) * j) + 4] = i + 1 + N * (j + 1);
        ElementData[6 * (i + (N - 1) * j) + 5] = i + 1 + N * j;
    }
}
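
To make the index layout easier to check, here is a minimal standalone sketch of the same element loop, written so that it can be run on a tiny grid without OpenGL. The class name GridMesh and the method BuildIndices are illustrative names, not part of the tutorial's source code.

```csharp
using System;

static class GridMesh
{
    // Builds the triangle index list for an n x n vertex grid, using the same
    // layout as the tutorial: vertex (i, j) is stored at index i + n * j.
    public static int[] BuildIndices(int n)
    {
        int[] elements = new int[3 * 2 * (n - 1) * (n - 1)];
        for (int i = 0; i < n - 1; i++)
        {
            for (int j = 0; j < n - 1; j++)
            {
                int q = 6 * (i + (n - 1) * j);   // first slot of cell (i, j)
                int v00 = i + n * j;             // corner (i, j)
                int v01 = i + n * (j + 1);       // corner (i, j + 1)
                int v11 = i + 1 + n * (j + 1);   // corner (i + 1, j + 1)
                int v10 = i + 1 + n * j;         // corner (i + 1, j)

                // First triangle of the cell
                elements[q] = v00;
                elements[q + 1] = v01;
                elements[q + 2] = v11;
                // Second triangle of the cell, sharing the v00-v11 diagonal
                elements[q + 3] = v00;
                elements[q + 4] = v11;
                elements[q + 5] = v10;
            }
        }
        return elements;
    }
}
```

For a 3x3 grid, GridMesh.BuildIndices(3) returns 24 indexes (4 cells, 2 triangles each), the first cell being 0, 3, 4, 0, 4, 1, and no index exceeds 8, the last vertex of the grid.
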

Now we create the OpenGL vertex buffer objects:

bufs = new int[4];
GL.GenBuffers(4, bufs);
GL.BindBuffer(BufferTarget.ArrayBuffer, bufs[0]);
GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)(ColorData.Length * sizeof(float)), ColorData, BufferUsageHint.StreamDraw);
GL.BindBuffer(BufferTarget.ArrayBuffer, bufs[1]);
GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)(PositionData.Length * sizeof(float)), PositionData, BufferUsageHint.StreamDraw);
GL.BindBuffer(BufferTarget.ArrayBuffer, bufs[2]);
GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)(NormalsData.Length * sizeof(float)), NormalsData, BufferUsageHint.StaticDraw);
GL.BindBuffer(BufferTarget.ElementArrayBuffer, bufs[3]);
GL.BufferData(BufferTarget.ElementArrayBuffer, (IntPtr)(ElementData.Length * sizeof(int)), ElementData, BufferUsageHint.StaticDraw);

We won’t use the normals data in this example, but I decided to include it in case you want to manipulate the normal vectors later.

We will add some OpenCL initialization code to this Load event later.
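
To get a feeling for how much GPU memory these buffers take, the sketch below computes their sizes from the array lengths used above. The class name VboSizes is illustrative, and the per-update figure assumes that only one z float and one red float are rewritten per vertex, as the kernel in section 3.3 does.

```csharp
using System;

static class VboSizes
{
    // Number of vertexes in an n x n grid.
    public static long VertexCount(int n) { return (long)n * n; }

    // Bytes held by each buffer, following the array sizes used in the tutorial.
    public static long PositionBytes(int n) { return 3 * VertexCount(n) * sizeof(float); }
    public static long NormalBytes(int n) { return 3 * VertexCount(n) * sizeof(float); }
    public static long ColorBytes(int n) { return 4 * VertexCount(n) * sizeof(float); }
    public static long ElementBytes(int n) { return 6L * (n - 1) * (n - 1) * sizeof(int); }

    // Assumption: each update rewrites one z float and one red float per vertex.
    public static long BytesWrittenPerUpdate(int n) { return 2 * VertexCount(n) * sizeof(float); }
}
```

For N = 400 this gives 1,920,000 bytes of positions, 1,920,000 bytes of normals, 2,560,000 bytes of colors and 3,820,824 bytes of elements, with 1,280,000 bytes rewritten on each update under the assumption above.
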

2.2 Drawing the OpenGL scene

Our scene will have one single object whose vertexes’ z component we will manipulate. Basically, what we need to do is call the glDrawElements function:

private void glControl1_Paint(object sender, PaintEventArgs e)
{
    if (!initialized) return;
    Draw();
}
void Draw()
{
    //Wait for OpenCL to finish with the shared buffers before drawing
    CQ.Finish();
    double tempo = sw.Elapsed.TotalSeconds;
    GL.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);
    //Reset the model-view matrix so rotations do not accumulate between frames
    GL.MatrixMode(MatrixMode.Modelview);
    GL.LoadIdentity();
    GL.Rotate(-40, 0.8f, 0, 0);
    GL.Rotate(-tempo, 0f, 0f, 0.8f);
    GL.BindBuffer(BufferTarget.ArrayBuffer, bufs[0]);
    GL.ColorPointer(4, ColorPointerType.Float, 0, 0);
    GL.BindBuffer(BufferTarget.ArrayBuffer, bufs[1]);
    GL.VertexPointer(3, VertexPointerType.Float, 0, 0);
    GL.BindBuffer(BufferTarget.ArrayBuffer, bufs[2]);
    GL.NormalPointer(NormalPointerType.Float, 0, 0);
    GL.BindBuffer(BufferTarget.ElementArrayBuffer, bufs[3]);
    //All three arrays must be enabled for glDrawElements to use them
    GL.EnableClientState(ArrayCap.VertexArray);
    GL.EnableClientState(ArrayCap.ColorArray);
    GL.EnableClientState(ArrayCap.NormalArray);
    //Switch the drawing style over time
    int teste = (int)(tempo*0.05);
    if (teste % 3 == 0)
        GL.DrawElements(BeginMode.Triangles, ElementData.Length, DrawElementsType.UnsignedInt, 0);
    else if (teste % 3 == 1)
        GL.DrawElements(BeginMode.Lines, ElementData.Length, DrawElementsType.UnsignedInt, 0);
    else
        GL.DrawElements(BeginMode.Points, ElementData.Length, DrawElementsType.UnsignedInt, 0);
    glControl1.SwapBuffers();
}

The call to CQ.Finish() belongs to OpenCL: it makes sure that the Command Queue has finished manipulating the data before OpenGL starts drawing. Also, I have added a Stopwatch that starts ticking when the program is loaded, to pass time information to OpenCL, and a teste variable to switch the drawing style over time.
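
The switching rule is plain integer arithmetic on the elapsed time, so it can be checked on its own. Here is a small sketch of it; the class DrawStyle and the method ModeAt are illustrative names, and the strings stand in for the BeginMode values used in Draw.

```csharp
using System;

static class DrawStyle
{
    // Returns the name of the primitive type used at a given elapsed time,
    // following the same rule as the teste variable in Draw().
    public static string ModeAt(double seconds)
    {
        int teste = (int)(seconds * 0.05); // advances by one every 20 seconds
        switch (teste % 3)
        {
            case 0: return "Triangles";
            case 1: return "Lines";
            default: return "Points";
        }
    }
}
```

With this rule the scene is drawn as triangles for the first 20 seconds, then as lines, then as points, and the cycle repeats every 60 seconds.
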

This code alone would draw a very uninteresting flat plane on to the screen. Now that the initial OpenGL setup is ready, let’s move to the main topic of the tutorial, which is using OpenCL to manipulate the data.

3. OpenCL initialization

3.1 Creating OpenCL context from OpenGL

We need to create an OpenCL context from the OpenGL context, as per the OpenCL Spec. To do this, we need to inform the OpenCL API what OpenGL context we are using and the current DC (device context). Fortunately, Cloo’s manager Nythrix and I have figured out a way to accomplish this and create the OpenCL context from OpenGL. We will first need a P/Invoke declaration for wglGetCurrentDC (don’t forget to include using System.Runtime.InteropServices;):

[DllImport("opengl32.dll")]
extern static IntPtr wglGetCurrentDC();

Now we can create the OpenCL context Ctx:

IntPtr curDC = wglGetCurrentDC();
OpenTK.Graphics.IGraphicsContextInternal ctx = (OpenTK.Graphics.IGraphicsContextInternal)OpenTK.Graphics.GraphicsContext.CurrentContext;
IntPtr raw_context_handle = ctx.Context.Handle;
ComputeContextProperty p1 = new ComputeContextProperty(ComputeContextPropertyName.CL_GL_CONTEXT_KHR, raw_context_handle);
ComputeContextProperty p2 = new ComputeContextProperty(ComputeContextPropertyName.CL_WGL_HDC_KHR, curDC);
ComputeContextProperty p3 = new ComputeContextProperty(ComputeContextPropertyName.Platform, ComputePlatform.Platforms[0].Handle);
List<ComputeContextProperty> props = new List<ComputeContextProperty>() { p1, p2, p3 };
ComputeContextPropertyList Properties = new ComputeContextPropertyList(props);
Ctx = new ComputeContext(ComputeDeviceTypes.Gpu, Properties, null, IntPtr.Zero);

The Ctx context created this way is capable of manipulating OpenGL vertex buffer objects directly.

3.2 Creating OpenCL arrays from OpenGL VBOs

Cloo provides us an easy way to create buffer objects (OpenCL variables) from OpenGL VBOs:

CLGLPositions = ComputeBuffer<float>.CreateFromGLBuffer<float>(Ctx, ComputeMemoryFlags.ReadWrite, bufs[1]);
CLGLColors = ComputeBuffer<float>.CreateFromGLBuffer<float>(Ctx, ComputeMemoryFlags.ReadWrite, bufs[0]);
varTempo = new ComputeBuffer<float>(Ctx, ComputeMemoryFlags.CopyHostPointer | ComputeMemoryFlags.ReadWrite, Tempo);

It’s as simple as that. Recall that we stored the vertex positions in OpenGL buffer bufs[1] and the colors in bufs[0]. The ComputeBuffer varTempo is a regular buffer that stores the simulation time, which we will pass on to the kernel.

3.3 OpenCL kernel

The last initialization task is to create the kernel we will use to manipulate the data, as well as the Command Queue:

string interopTeste = @"
__kernel void
interopTeste(__global float * positions,__global float * cores, __global float * tempo)
{
    int i = get_global_id(0);
    //First wave source: shifting x by +0.7 centers it at x = -0.7
    float x = positions[3*i]+0.7f;
    float y = positions[3*i+1];
    float r = native_sqrt(x*x+y*y);
    float t = tempo[0];
    float valor = native_exp(- r * 2.5f)*native_sin(40*r-4*t);
    //Second wave source, centered at x = +0.7
    x -= 1.4f;
    r = native_sqrt(x*x+y*y);
    valor += native_exp(- r * 1.5f)*native_sin(40*r-4*t);
    //Write the height to z and the clamped height to the red channel
    positions[3*i+2] = valor;
    cores[4*i] = clamp(valor,0.0f,1.0f);
}";
//Create a new OpenCL program
ComputeProgram prog = new ComputeProgram(Ctx, interopTeste);
prog.Build(Ctx.Devices, "", null, IntPtr.Zero);
//Create the kernel
kernelinteropTeste = prog.CreateKernel("interopTeste");
kernelinteropTeste.SetMemoryArgument(0, CLGLPositions);
kernelinteropTeste.SetMemoryArgument(1, CLGLColors);
kernelinteropTeste.SetMemoryArgument(2, varTempo);
CQ = new ComputeCommandQueue(Ctx, Ctx.Devices[0], ComputeCommandQueueFlags.None);
initialized = true;

As you can see, the last part of initialization has nothing new. Notice that the kernel manipulates the Z component of the vertexes (index 3*i+2) and the R component of the colors (index 4*i). The function chosen simulates the interference of two waves. Also notice that the first two arguments of the interopTeste kernel are OpenGL vertex buffer objects.
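
When debugging, it can help to have a CPU version of the same function to compare against values read back from the GPU. The sketch below mirrors the kernel's arithmetic in plain C#; WaveReference, Height and Red are illustrative names that do not appear in the tutorial's source code. Since the native_ functions have implementation-defined precision, expect small differences between this reference and the GPU results.

```csharp
using System;

static class WaveReference
{
    // Surface height at (x, y) and time t, following the kernel step by step.
    public static float Height(float x, float y, float t)
    {
        // First source: the kernel shifts x by +0.7, which centers it at x = -0.7.
        double x1 = x + 0.7;
        double r1 = Math.Sqrt(x1 * x1 + y * y);
        double valor = Math.Exp(-r1 * 2.5) * Math.Sin(40 * r1 - 4 * t);

        // Second source: x -= 1.4 moves the center to x = +0.7.
        double x2 = x1 - 1.4;
        double r2 = Math.Sqrt(x2 * x2 + y * y);
        valor += Math.Exp(-r2 * 1.5) * Math.Sin(40 * r2 - 4 * t);

        return (float)valor;
    }

    // Red channel written by the kernel: the height clamped to [0, 1].
    public static float Red(float height)
    {
        return Math.Min(1f, Math.Max(0f, height));
    }
}
```

Because both exponential factors are at most 1, the height always stays within [-2, 2], and the surface is symmetric in y. Only the positive part of the wave shows up in the red channel.
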

4. Acquiring and manipulating vertex buffer objects

I have decided to use a Timer component for simplicity. The Timer will be used to acquire, manipulate and release the OpenGL objects.

4.1  Locking and releasing the data

OpenCL and OpenGL cannot manipulate data at the same time. Because of this, it is necessary to flush OpenGL before starting OpenCL operations and vice-versa. So, before drawing, we need to call CQ.Finish(); (to finish the Command Queue) and before computing we need to call GL.Finish();.

We also need to acquire and release the buffer objects so as to inform the GPU what kind of operation is being executed (OpenGL draw operations or OpenCL compute operations). This is fairly easy too:

List<ComputeMemory> c = new List<ComputeMemory>() { CLGLPositions, CLGLColors };
CQ.AcquireGLObjects(c, null);
// (use OpenCL)
CQ.ReleaseGLObjects(c, null);

4.2 Data manipulation

Below you can find the complete OpenCL code and declared variables.

Executing the kernel is very easy. We just have to write the current elapsed time to Device memory and use the Command Queue created in the shared OpenCL OpenGL context to execute it:

ComputeBuffer<float> CLGLPositions;
ComputeBuffer<float> CLGLColors;
ComputeContext Ctx;
ComputeBuffer<float> varTempo;
float[] Tempo = new float[1];
ComputeKernel kernelinteropTeste;
ComputeCommandQueue CQ;

private void timer1_Tick(object sender, EventArgs e)
{
    if (!initialized) return;
    //Make sure OpenGL is done with the buffers before OpenCL uses them
    GL.Finish();
    List<ComputeMemory> c = new List<ComputeMemory>() { CLGLPositions, CLGLColors };
    CQ.AcquireGLObjects(c, null);
    //Read elapsed time from Stopwatch and write to Device memory
    Tempo[0] = (float)sw.Elapsed.TotalSeconds;
    CQ.Write<float>(varTempo, Tempo, null);
    //One work-item per vertex
    CQ.Execute(kernelinteropTeste, null, new long[1] { PositionData.Length / 3 }, null, null);
    CQ.ReleaseGLObjects(c, null);
    //redraw OpenGL scene
    Draw();
}

5. Conclusion

We have presented a working example that displays an NxN grid showing circular wave propagation and interference using OpenCL / OpenGL interoperation. An OpenCL context is created from OpenGL, which gives OpenCL direct access to the OpenGL vertex buffer objects.

I believe that this is soon going to become the standard way of manipulating VBO data in games, because the computation is much faster and there is no need to transfer large amounts of data between Host memory and Device memory, which is currently the bottleneck of GPGPU (OpenCL/CUDA/Stream).

