Reading and writing variables

As we saw before, the memory of the Device that is executing OpenCL is not directly accessible by the Host. Thus, it is necessary to provide a way to transfer data back and forth. This involves three steps:

  1. Create the variable space in Device memory;
  2. Write the variable to Device memory;
  3. Read the contents back to Host memory.

 The following picture will help you understand better how memory works in OpenCL:


As you can see, the Device has its own memory region. This means that it’s necessary to use specific OpenCL API commands to allocate memory in the Device and read/write to it. Keep in mind that you should keep data transfers to the MINIMUM POSSIBLE because they are very slow and are currently the bottleneck in many OpenCL applications (and in all other GPU computing technologies as well).

1. Creating and accessing variables with OpenCLTemplate

 This should be very easy and we’ve been doing it even before this topic. If you want to use OpenCLTemplate to create/write/read variables from the device, this is what you need to do:

  1. Initialize OpenCL using CLCalc.InitCL() (only once per execution);
  2. Create your array of type int, float, double, etc;
  3. Create a new CLCalc.Program.Variable from your array;
  4. Use the WriteToDevice method of CLCalc.Program.Variable to write to the Device;
  5. Use the ReadFromDeviceTo method of CLCalc.Program.Variable to read from the Device to your array.

OpenCLTemplate checks the lengths of the arrays being read/written. You can create a variable from one array and read it to another one provided they have the same Length. If you want random access to read/write to specific positions and transfer different amounts of data, keep reading to see how to do it directly with the OpenCL API.

1.1 Example

 Create an int array on Device memory and read what has been stored to another int array.

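//Initializes OpenCL (only once per execution)
OpenCLTemplate.CLCalc.InitCL();

//Creates two int arrays of the same size.
//OpenCLTemplate checks these sizes.
//The length 10 is an arbitrary value chosen for illustration.
int[] myArray = new int[10];
int[] myArray2 = new int[10];

//Creates some data in myArray
for (int i = 0; i < myArray.Length; i++)
    myArray[i] = i;

//Allocates memory in the OpenCL device and copies data
OpenCLTemplate.CLCalc.Program.Variable varMyArray = new OpenCLTemplate.CLCalc.Program.Variable(myArray);

//Reads from device memory to myArray2
varMyArray.ReadFromDeviceTo(myArray2);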
You can use a breakpoint to check that myArray2 now has what was stored in myArray. If you want to store floats or doubles, just use the type you need (and that is supported by OpenCLTemplate). I can’t see how to make this easier.

Additional info: OpenCLTemplate creates a Context with all available devices and uses the device of index CLCalc.Program.DefaultCQ to execute the commands. You can change DefaultCQ to select any of the available devices in the list CLCalc.CLDevices.

If this memory allocation suits your needs, there is no need to continue reading. However, if you need to copy or read only chunks of data or transfer data within the Device memory, I suggest you keep going.

2. OpenCL API commands

 If you want to create C++ code or manipulate variables with the OpenCL API itself, you will need to call the functions below. Once again, refer to the Khronos Group OpenCL Spec for details. All OpenCLTemplate variables have a public CLMem VarBuff that you can access to use the API directly if you want to.

2.1 Memory allocation via API

 This is the function to allocate memory in the Device (in more technical terms, to create a buffer):

cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret)

The OpenCL context is the collection of Devices, Kernels, Programs and Memory Objects that are being used in OpenCL. Refer to the OpenCL Spec for more details.

This command allocates memory in the OpenCL context with certain properties. The most important thing to remember here is that size is given in bytes: it is not the array length, it’s Array.Length*sizeof(type of Array).
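As a minimal sketch of the size rule above, the helper below computes the value to pass as size. The name buffer_size_bytes is hypothetical (it is not part of the OpenCL API), and the commented call at the end only shows where the result would go, assuming a valid context and error variable already exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL): returns the byte count to pass
   as the size argument of clCreateBuffer for an array of `length` elements
   of `elem_size` bytes each. Returns 0 if the product would overflow. */
size_t buffer_size_bytes(size_t length, size_t elem_size)
{
    assert(elem_size > 0);
    if (length > SIZE_MAX / elem_size)
        return 0;
    return length * elem_size;
}

/* Illustrative use, assuming a valid cl_context `context` and cl_int `err`:
 *
 *   size_t bytes = buffer_size_bytes(1024, sizeof(float));
 *   cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE, bytes, NULL, &err);
 */
```

Passing the element count (1024 here) instead of the byte count would allocate a buffer that is too small for the data.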

2.2 Accessing device memory via API

 These are the functions to read and write “Buffers” (variables) into the Device memory:

cl_int clEnqueueReadBuffer (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueWriteBuffer (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

They look very similar because they do the same thing in opposite directions. The command_queue identifies the Device that will execute the read/write operation and also how it is done (synchronous/asynchronous; refer to the OpenCL Spec). The blocking_read and blocking_write arguments tell the Host whether to wait until the operation is complete or to continue working while the copy is happening. If the Host continues working, it is necessary to keep track of the data transfer status using the event. If you happen to need this you’re probably beyond the level of this tutorial. For now, OpenCLTemplate will block the Host until command completion.

Another interesting point is that a blocking ReadBuffer forces synchronization: on the default in-order command queue it only returns after every previously enqueued command, including kernels, has finished. This can be quite misleading because the novice OpenCL programmer might think that reading the variables is taking TOO long when, in truth, the Device was still processing data (this happened to me).

Two other arguments deserve attention: offset and cb. The cb argument tells OpenCL how many bytes to transfer. Remember, to transfer a whole array this should be array.Length*sizeof(array data type). The offset argument, also given in bytes, sets where in the buffer the transfer starts, which is useful to get random access to the Device memory. Even if you are using OpenCLTemplate you can always do it manually by accessing the public CLMem VarBuff.
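To make the offset and cb arithmetic concrete, here is a small sketch that converts "count elements starting at element first" into the two byte values. The type byte_region and the function region_for_elements are hypothetical names introduced only for illustration. The commented call assumes a valid command queue, a buffer and a host array already exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of OpenCL): the byte range that
   corresponds to a run of array elements. */
typedef struct {
    size_t offset; /* byte offset into the buffer */
    size_t cb;     /* number of bytes to transfer */
} byte_region;

/* Converts "count elements starting at element index first" into the
   byte-based offset and cb values the read/write functions expect. */
byte_region region_for_elements(size_t first, size_t count, size_t elem_size)
{
    byte_region r;
    assert(elem_size > 0);
    r.offset = first * elem_size;
    r.cb = count * elem_size;
    return r;
}

/* Illustrative use, assuming a valid command queue `queue`, a buffer `buf`
 * holding at least 150 floats, and a host array `float part[50]`:
 *
 *   byte_region r = region_for_elements(100, 50, sizeof(float));
 *   clEnqueueReadBuffer(queue, buf, CL_TRUE, r.offset, r.cb, part,
 *                       0, NULL, NULL);
 */
```

Both values are in bytes, so forgetting to multiply by the element size would read from the wrong position and transfer too little data.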

A function that may be useful when you need to replicate data in the Device memory is:

cl_int clEnqueueCopyBuffer (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

This function copies a data region from one variable to the other. It is very handy when you want to read only a small portion of a variable that is stored in Device memory.
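The offset semantics of this copy can be modelled on the Host with plain memory. The sketch below is only a model: model_copy_buffer is a hypothetical function, ordinary arrays stand in for the two Device buffers, and no OpenCL call is made. It mirrors the src_offset, dst_offset and cb arguments, all in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side model only (hypothetical, not an OpenCL function): copies cb
   bytes from src + src_offset to dst + dst_offset, mirroring the offset
   arguments of clEnqueueCopyBuffer. src and dst must be distinct arrays.
   Returns 0 on success, -1 if the region does not fit in either array. */
int model_copy_buffer(const void *src, size_t src_size,
                      void *dst, size_t dst_size,
                      size_t src_offset, size_t dst_offset, size_t cb)
{
    assert(src != NULL && dst != NULL);
    if (src_offset > src_size || cb > src_size - src_offset)
        return -1;
    if (dst_offset > dst_size || cb > dst_size - dst_offset)
        return -1;
    memcpy((char *)dst + dst_offset, (const char *)src + src_offset, cb);
    return 0;
}
```

For example, copying three ints that start at element 2 of an eight-element source into a three-element destination uses src_offset = 2*sizeof(int), dst_offset = 0 and cb = 3*sizeof(int). On the Device, the small destination buffer could then be read back to the Host instead of the whole source.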

6 thoughts on “Reading and writing variables”

  1. Hi. You talk about setting buffer parameters for the kernel. Is it possible to set a scalar variable via OpenCLTemplate? In Cloo we used SetMemoryArgument for that.

    1. Hello, Dmitry!

      I’m not sure whether there is a way to set a scalar variable in OpenCLTemplate; I guess there isn’t. And there is a good reason for it.
      Vector operations are roughly as fast as scalar operations but they do four times the work. This change single-handedly gives us an almost 4x speed boost if you correctly parallelize your code.
      More information here:

  2. Hi Edmundo,
    Thanks very much for your good explanations. I have a doubt.
    I’m setting my kernel variables like this:
    _args = new OpenCLTemplate.CLCalc.Program.Variable[] { _Matrix, _Result, _dLocArraiAngoliSen, _dLocArraiAngoliCos, Cos_Tilt_in_Rad, Sin_Tilt_in_Rad, _iTilt, _iSizeX, _iSizeY, _iSizeZ, _fDSO, _fDSD, OffsetV, _iOffsetU, m_fSizeVoxelX, m_fSizeVoxelY, m_fPixelSensoreDim, _IActualLoop, _ActualAxial, _numberOfAxial};

    This code is inside a loop and I need to launch it only the first time. How can I change the variable values inside the kernel without sending all of them? I mean, is there a way in OpenCLTemplate to change the value of only certain variables?
    …these are my first steps in OpenCL…

    Thanks very much


    1. Dear Enrico;

      If we understood correctly, you have a loop of the form:

      //your loop here
      _args = new OpenCLTemplate.CLCalc.Program.Variable[]{ _Matrix, _Result, _dLocArraiAngoliSen, _dLocArraiAngoliCos, Cos_Tilt_in_Rad, Sin_Tilt_in_Rad, _iTilt, _iSizeX, _iSizeY, _iSizeZ, _fDSO, _fDSD, OffsetV, _iOffsetU, m_fSizeVoxelX, m_fSizeVoxelY, m_fPixelSensoreDim, _IActualLoop, _ActualAxial, _numberOfAxial};

      Using OpenCLTemplate, you can change arguments individually by doing this:

      _args = new OpenCLTemplate.CLCalc.Program.Variable[]{ _Matrix, _Result, _dLocArraiAngoliSen, _dLocArraiAngoliCos, Cos_Tilt_in_Rad, Sin_Tilt_in_Rad, _iTilt, _iSizeX, _iSizeY, _iSizeZ, _fDSO, _fDSD, OffsetV, _iOffsetU, m_fSizeVoxelX, m_fSizeVoxelY, m_fPixelSensoreDim, _IActualLoop, _ActualAxial, _numberOfAxial};
      //your loop here
      _args[0] = _MyNewMatrix;
      _args[1] = _MyNewResult;

      Alternatively, you can use Cloo or native OpenCL library calls to clSetKernelArg, which is a much more complicated alternative.

      1. Thanks Douglas very much,
        I appreciate it. I tried but it didn’t work; my problem is that during the for loop the VRAM usage continuously increases until OpenCL reports that it is out of resources. What I really don’t understand is why the memory keeps increasing if I always pass the same pointer; it looks like in C# when you have to force gc.collect().

        my loop is like this:

        _IActualLoop = new OpenCLTemplate.CLCalc.Program.Variable(_SingleLoop);

        _Matrix = new OpenCLTemplate.CLCalc.Program.Variable(_GPU_4_Image);

        _args = new OpenCLTemplate.CLCalc.Program.Variable[]{ _Matrix, _Result, _dLocArraiAngoliSen, _dLocArraiAngoliCos, Cos_Tilt_in_Rad, Sin_Tilt_in_Rad, _iTilt, _iSizeX, _iSizeY, _iSizeZ, _fDSO, _fDSD, OffsetV, _iOffsetU, m_fSizeVoxelX, m_fSizeVoxelY, m_fPixelSensoreDim, _IActualLoop, _ActualAxial, _numberOfAxial};
        … read from device

        A bottle of “Valpolicella Superiore” to the Winner.
        Thanks a Lot

