Get the source code for this section.

Parallel algorithms are rarely parallel from start to finish; they usually consist of large portions of code that can be parallelized, joined by steps that have to wait for earlier results.

This tutorial presents synchronization techniques for OpenCL Host code, which ensure that kernels depending on previous operations wait until the data they need is actually available. For synchronization using OpenCL C99’s barrier, see the Advanced aspects of OpenCL C99 tutorial. Atomics extensions are also very important and are covered in the Atomics Extensions topic.

The source code contains examples of synchronization methods that may or may not work with current OpenCL drivers:

1. In-order queues

The simplest synchronization method is to use a single in-order command queue. The command queue enqueues and executes commands in order. Currently, considering the early stage of OpenCL drivers, this should be the implementation of choice.

Notice that calling clFinish (in OpenCLTemplate: CLCalc.Program.Sync(), in Cloo: CommandQueue.Finish()) forces synchronization. Reading variables back with a blocking EnqueueReadBuffer() also forces synchronization.

This means that, if you want to use a stopwatch to measure kernel execution times, you can’t do this:

System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start(); /* execute the kernel here, with no sync before or after */ sw.Stop();

What you should do is:

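System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
CLCalc.Program.Sync();   // Finish command queues
sw.Start();
// ... execute the kernel being timed here ...
CLCalc.Program.Sync();   // Finish command queues
sw.Stop();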

Without a sync after the kernel call, you only measure the Host enqueue time (very low). Without clFinish before the kernel call, you measure the execution time of the desired kernel plus any pending tasks (which takes longer).
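The timing pitfall can be reproduced without any OpenCL device. The following is only a host-side analogy, not OpenCL code: a single-worker Python thread pool stands in for an in-order command queue, `submit` plays the role of an enqueue call, and waiting on a result plays the role of clFinish. The names `fake_kernel`, `time_without_sync`, `time_with_sync` and the 0.05 s duration are hypothetical and chosen just for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

KERNEL_SECONDS = 0.05  # pretend kernel run time (assumption for this sketch)

def fake_kernel():
    """Hypothetical stand-in for a kernel running on the device."""
    time.sleep(KERNEL_SECONDS)

def time_without_sync(queue):
    """Wrong: the clock stops right after the command is enqueued."""
    start = time.perf_counter()
    queue.submit(fake_kernel)            # returns immediately, like an enqueue
    return time.perf_counter() - start

def time_with_sync(queue):
    """Right: drain pending work, start the clock, run, then wait again."""
    # "Finish": the queue is in-order, so once this no-op is done,
    # everything enqueued before it is done too.
    queue.submit(lambda: None).result()
    start = time.perf_counter()
    queue.submit(fake_kernel).result()   # enqueue, then "Finish"
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=1) as queue:  # one worker = in-order queue
    enqueue_only = time_without_sync(queue)       # leaves a task pending
    synced = time_with_sync(queue)

print(f"no sync: {enqueue_only:.4f} s, with sync: {synced:.4f} s")
```

The first measurement only captures the cost of handing the work to the queue, while the second covers the whole run of the fake kernel and excludes the task that was still pending when it started.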

2. Out-of-order queues

Currently, OpenCL drivers treat out-of-order command queues as ordinary in-order command queues. At some point in the future, out-of-order queues will be able to run commands asynchronously, which should make them a very interesting choice.

2.1 Enqueuing barrier

It is possible to ensure synchronization inside a given Command Queue by enqueuing a barrier with the clEnqueueBarrier function from the API (CommandQueue.Barrier in Cloo). Remember that this only syncs commands issued to that specific command queue!

2.2 Using Events

The most powerful, yet most difficult to manage, synchronization technique is to use Events. Each command issued to a Command Queue generates an Event, which triggers upon completion. When you pass a list of prerequisite Events along with a command, the Command Queue waits until all of those Events are complete before executing that specific command.

The problem is that you need to manage all the Events in the Host code. A simple way to go is to create a list of Events and add each prerequisite to the end of the list as they are executed. This is the way Cloo works and it’s a very interesting concept.

This would be the best method to sync kernel execution among different devices. Unfortunately, not all OpenCL drivers support this, and I advise you to skip this method for a while if you want to create widely compatible software.
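The idea of keeping a Host-side list of Events and passing it as a wait list can be sketched as follows. This is an analogy in plain Python, not Cloo's or OpenCL's actual API: futures stand in for Events, a multi-worker thread pool stands in for a queue that may run commands in any order, and `run_after` is a hypothetical helper invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

log = []  # order in which the commands actually ran

def run_after(wait_list, name, seconds):
    """Block until every prerequisite 'event' is complete, then run."""
    wait(wait_list)          # analogue of an event wait list
    time.sleep(seconds)      # pretend work
    log.append(name)
    return name

# Several workers: commands may run in any order unless events say otherwise.
with ThreadPoolExecutor(max_workers=4) as queue:
    events = []  # Host-side list of events, appended to as commands are issued
    a = queue.submit(run_after, [], "A", 0.05)
    b = queue.submit(run_after, [], "B", 0.01)
    events += [a, b]
    # C depends on everything issued so far, so it receives a copy of the list.
    c = queue.submit(run_after, list(events), "C", 0.0)
    events.append(c)
    wait(events)             # Host waits for all events before reading results

print(log)
```

A and B have no prerequisites and may complete in either order, but C always runs last because it waits on both of their events.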

3. Example

I have prepared a simple example to show the concepts mentioned above. Right now, the only widely available Host synchronization method is to use in-order queues, which are relatively simple. This is why I decided not to include much source code in the tutorial itself. The source code for this section includes a simple example of a kernel that adds each element of a vector to its neighbor. For example:

1  2  3  4  5
3  5  7  9  5
8 12 16 14  5

Notice that each iteration always depends on the previous one. Also notice that some methods that are supposed to work may not work, and vice versa.
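For reference, the rows above can be reproduced on the Host with a few lines of Python. This is a sequential sketch, not the kernel from the downloadable source; the function name `neighbor_sum` is hypothetical, and the rule that the last element is left unchanged is inferred from the numbers in the example.

```python
def neighbor_sum(values):
    """One iteration: each element plus its right neighbor; the last one is kept."""
    # A new list is built, so every read sees the previous iteration's values.
    return [values[i] + values[i + 1] for i in range(len(values) - 1)] + values[-1:]

v = [1, 2, 3, 4, 5]
first = neighbor_sum(v)
second = neighbor_sum(first)
print(first)   # [3, 5, 7, 9, 5]
print(second)  # [8, 12, 16, 14, 5]
```

Because the second call consumes the output of the first, a parallel version must not start iteration two until iteration one has finished writing, which is exactly the dependency the synchronization methods above are meant to enforce.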


