As we saw before, the memory of the Device that executes OpenCL code is not directly accessible by the Host. Thus, it is necessary to provide a way to transfer data back and forth. This involves three steps:
- Create the variable space in Device memory;
- Write the variable to Device memory;
- Read the contents back to Host memory.
The following picture will help you understand better how memory works in OpenCL:
As you can see, the Device has its own memory region. This means that it’s necessary to use specific OpenCL API commands to allocate memory on the Device and to read from and write to it. Keep in mind that you should keep data transfers to the MINIMUM POSSIBLE, because they are very slow and are currently the bottleneck in many OpenCL applications (and in all other GPU computing technologies as well).
1. Creating and accessing variables with OpenCLTemplate
This should be very easy and we’ve been doing it even before this topic. If you want to use OpenCLTemplate to create/write/read variables from the device, this is what you need to do:
- Initialize OpenCL using CLCalc.InitCL() (only once per execution);
- Create your array of type int, float, double, etc;
- Create a new CLCalc.Program.Variable from your array;
- Use the CLCalc.Program.Variable WriteToDevice to write to the Device;
- Use the CLCalc.Program.Variable ReadFromDeviceTo to read from Device to your array.
OpenCLTemplate checks the lengths of the arrays being read/written. You can create a variable from one array and read it into another one, provided they have the same Length. If you want random access to read/write specific positions and transfer different amounts of data, keep reading to see how to do it directly with the OpenCL API.
Create an int array on Device memory and read what has been stored to another int array.
//Creates two int arrays of the same size.
//OpenCLTemplate checks these sizes
//(1000 is an arbitrary example length)
int[] myArray = new int[1000];
int[] myArray2 = new int[1000];
//Creates some data to myArray
for (int i = 0; i < myArray.Length; i++)
myArray[i] = i;
//Allocates memory in the OpenCL device and copies data
OpenCLTemplate.CLCalc.Program.Variable varMyArray = new OpenCLTemplate.CLCalc.Program.Variable(myArray);
//Reads from device memory to myArray2
varMyArray.ReadFromDeviceTo(myArray2);
You can use a breakpoint to check that myArray2 now has what was stored in myArray. If you want to store floats or doubles, just use the type you need (and that is supported by OpenCLTemplate). I can’t see how to make this easier.
Additional info: OpenCLTemplate creates a Context with all available devices and uses the device of index CLCalc.Program.DefaultCQ to execute the commands. You can change DefaultCQ to the available devices in the list CLCalc.CLDevices.
If this memory allocation suits your needs, there is no need to continue reading. However, if you need to copy or read only chunks of data or transfer data in the Device memory, I suggest you keep going.
2. OpenCL API commands
If you want to create C++ code or manipulate variables with the OpenCL API itself, you will need to call the functions below. Once again, refer to the Khronos Group OpenCL Spec for details. All OpenCLTemplate variables have a public CLMem VarBuff that you can access to use the API directly if you want to.
2.1 Memory allocation via API
This is the function to allocate memory in the Device (in more technical terms, to create a buffer):
cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret)
The OpenCL context is the collection of Devices, Kernels, Programs and Memory Objects that are being used in OpenCL. Refer to the OpenCL Spec for more details.
This command allocates memory in the OpenCL context with certain properties. The most important thing to remember is that size is given in bytes: it is not the array length, it’s Array.Length*sizeof(type of Array).
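The size rule above can be sketched in plain C. This is only an illustration: buffer_size_bytes is a hypothetical helper name, not part of the OpenCL API, and the overflow check is an extra precaution that the text does not require.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL): returns the byte count to pass
   as the `size` argument of clCreateBuffer for `count` elements of
   `elem_size` bytes each. Returns 0 if the product would overflow size_t
   or if either argument is 0. */
size_t buffer_size_bytes(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return 0; /* overflow: the caller should treat 0 as an error */
    return count * elem_size;
}
```

For an array of 1000 floats you would pass buffer_size_bytes(1000, sizeof(float)) as size, not 1000. Using 0 as the error value is a design choice of this sketch; it works because a zero-sized buffer is not a valid request for clCreateBuffer anyway.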
2.2 Accessing device memory via API
These are the functions to read and write “Buffers” (variables) into the Device memory:
cl_int clEnqueueReadBuffer (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueWriteBuffer (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t cb, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
They look very similar, and the reason is that they do the same thing in opposite directions. The command_queue selects the Device that will execute the read/write operation and also determines how commands are executed (in order or out of order, refer to the OpenCL Spec). The blocking_read and blocking_write arguments tell the Host whether to wait until the operation is complete or to continue working while the copy is happening. If the Host continues working, it is necessary to keep track of the data transfer status using the event. If you happen to need this, you’re probably beyond the level of this tutorial. For now, OpenCLTemplate will block the Host until command completion.
Another interesting point is that a blocking ReadBuffer forces synchronization: on an in-order command queue the read only starts after all previously enqueued kernels have finished their job, so the Host ends up waiting for them too. This can be quite misleading, because the novice OpenCL programmer might think that reading the variables is taking TOO long when the truth is that the Device was still processing data (this happened to me).
Another point of interest is offset and cb. The cb argument tells OpenCL how many bytes to transfer. Remember, for a whole array, this should be array.Length*sizeof(array data type). Offset, which is also given in bytes, is the position inside the buffer where the transfer starts; it is what gives you random access to the Device memory. Even if you are using OpenCLTemplate, you can always do it manually by accessing the public CLMem VarBuff.
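As a small sketch of the offset and cb arithmetic, the plain C helper below converts "count elements starting at element index first" into the two byte values. The names byte_range and element_range_to_bytes are hypothetical and exist only for this illustration; they are not part of OpenCL.

```c
#include <stddef.h>

/* Hypothetical struct (not part of OpenCL): the two byte values that
   would go into the offset and cb arguments. */
typedef struct {
    size_t offset; /* byte position in the buffer where the transfer starts */
    size_t cb;     /* number of bytes to transfer */
} byte_range;

/* Converts an element range into bytes for a buffer that holds
   `buffer_len` elements of `elem_size` bytes each.
   Returns 1 on success, or 0 if the range does not fit in the buffer
   (in which case *out is left untouched). */
int element_range_to_bytes(size_t first, size_t count, size_t elem_size,
                           size_t buffer_len, byte_range *out)
{
    if (first > buffer_len || count > buffer_len - first)
        return 0; /* range would run past the end of the buffer */
    out->offset = first * elem_size;
    out->cb = count * elem_size;
    return 1;
}
```

For example, to read elements 10 to 14 of a 100-element int buffer, the helper yields offset = 10*sizeof(int) and cb = 5*sizeof(int), which are the values you would hand to clEnqueueReadBuffer together with a host pointer that has room for 5 ints.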
A function that may be useful when you need to replicate data in the Device memory is:
cl_int clEnqueueCopyBuffer (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t cb, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
This function copies a data region from one variable to the other. It is very handy when you want to read only a small portion of a variable that is stored in Device memory.