OpenCL Tutorial with OpenCLTemplate and Cloo
Welcome to the area dedicated to parallel processing and acceleration using OpenCL and the graphics card.
This area aims to present, clearly and concisely, a practical way to use the graphics card for math calculations. If you’re interested in the architecture and implementation, check the OpenCL specification from the Khronos Group.
I suggest the reader NOT skip any step, because understanding later steps often depends on having understood the previous ones. Besides, this page is not intended to provide professional training in parallel processing. Instead, we’re trying to offer a practical way of learning for the non-professional OpenCL developer.
For your convenience, the topics have been grouped by difficulty level in a color scale:
Note that you should be familiar with C# and .NET to follow this tutorial.
The sample code for each section is available within that section.
I suggest you download OpenCLTemplate and use the OpenCL Editor to check if your code is correct:
Important note: Most of this tutorial is general-purpose information about OpenCL. OpenCLTemplate just makes it faster to try the code and see what happens. It doesn’t matter whether you are going to use the pure OpenCL API or a binding such as OpenTK, Cloo (which I think is great) or OpenCL.NET. What matters is that each of them provides commands to load variables and execute kernels, so you will always be able to use the OpenCL C99 code presented here.
You may click on the desired topic or use the menu to the left to access the topics.
1 – Installation and configurations;
2 – Overview of OpenCL and parallel processing;
3 – First OpenCL program;
4 – ATI Stream OpenCL Technical Overview;
5 – Capabilities and limitations;
6 – Why parallel processing?;
7 – Reading and writing variables;
8 – Command queues;
9 – Kernel execution structure;
10 – Basic aspects of OpenCL C99 language;
11 – Intermediate aspects of the C99 OpenCL language;
12 – Advanced aspects of the C99 OpenCL language;
13 – OpenCL C99 Atomics;
14 – OpenCL Image2D Variables;
15 – Synchronization;
16 – OpenCL/OpenGL Interop Framework;
17 – OpenCL/OpenGL Interoperation;
18 – OpenCL/OpenGL Interoperation with Textures;
19 – Optimization Strategies;
20 – Case study: matrix multiplication;
21 – Case study: image filtering;
22 – Case study: Low poly collision detection;
23 – Case study: geometric fitting of pipes;
24 – Case study: color tracking;
25 – Case study: High performance convolution using OpenCL __local memory;
26 – Case study: Extraction of color Haar features;
27 – Case study: heat transfer simulation using CLGL interop;
28 – Case Study: Efficient manipulation of Kinect data using OpenCL/GL Interop.
AMD Diagonal Sparse Matrix Vector Multiplication Case Study – A nice case study from AMD, definitely worth seeing, although some optimizations are clearly hardware-specific.
AMD Reductions Case Study – Another interesting case study that shows ways to compute vector sum/max/min operations efficiently. The concepts are applicable in general.