Heat transfer and, more generally, parabolic partial differential equations are a very important class of problems in physics and mathematics. Analytic solutions aren't available for real-world problems where initial and boundary conditions can be arbitrary.

In this Case Study, CMSoft presents OpenCL implementation using C# to demonstrate how to generate simulations of the heat equation with Dirichlet conditions. In our tests, we achieved 7000 steps per second when simulating a 500x400 temperature grid.

The image below shows the simulation output at a given instant:

The video below demonstrates the use of the software. The reader is welcome to watch the full video tutorial on YouTube (links in the beginning of the page).

In this case study, we are going to present a practical example and demonstrate how to achieve very high performance in the simulation of the heat equation using C#, OpenCL computation and OpenCL/OpenGL interoperation. Our goal is to solve the partial differential equations of heat conduction using finite differences discretization to present a real-time heat conduction simulation.

We also point out that the same technique can be used to solve a variety of other PDE's such as the wave equation, with very few changes.

2. Heat equation and discretization

2.1 The heat equation

The heat equation is a parabolic PDE that describes how heat is transmitted in a rigid body. Intuitively we know that, when some part of a metal plate is hot and the rest is cold, heat flows from the hot portion to the cold portion. But how fast is this process? It is known that the speed of heat transfer depends on the conductivity of the metal and on the temperature difference between hot and cold portions. If you are interested in the details of the physics, please check Wikipedia reference [1].

The heat equation is given below:

Eq. 1

This equation is the mathematical formalization of the intuition that it depends on the material (coefficient alpha) and on the laplacian of the temperature distribution function u. There are known solutions to special cases of the heat equation, but any changes in boundary conditions renders those equations insoluble. We can, however, resort to numerical techniques such as finite differences.

2.2 Discrete approximation

Let's revisit the heat equation and discretize its components. Let's expand Eq. 1:

Eq. 2

To first order accuracy, using Taylor's expansion, we can expand the derivative of u with respect to t as:

Eq. 3

Similarly, we can expand the second derivatives of u with respect to x and y:

Eq. 4

Notice that we're interested in obtaining the values of u(t+Δt) from our current knowledge of u(t,x,y) everywhere. We finally obtain the discretized heat equation:

3. Displaying the results using OpenCL/OpenGL interop

Let's take a small detour from the heat equation itself and discuss how to display simulation results. As we simulate the time evolution of the heat equation, be it using OpenCL or in our C# host code, we would like to display the results onscreen. Suppose we're generating a color image that displays graphically the temperature of a point using color. If the image has dimensions 300x400 and we want 60 frames per second, that would involve sending (300*400*4) bytes per image * 60 fps = 27.4 Mb of data per second to the GPU. While modern PCI-Express connections can handle this without much trouble, the OpenCL OpenGL interoperability enables us to avoid this step and all overheads involved in doing so.

3.1 Creating the shared context

First of all, we need to create an environment that lets us share OpenGL objects with OpenCL. Using OpenCLTemplate.CLGLInterop, this is quite easy to achieve:

It's possible to share OpenGL/OpenCL contexts using Cloo (which serves as basis for OpenCLTemplate) or native OpenCL, but it requires a significant effort. Our OpenCL OpenGL interop tutorial [3] has more details about the topic.

3.2 Displaying and retrieving the texture

In order to display a texture in the screen, we need to create a surface in OpenGL 3D space, which we will achieve by creating a 3D model using Vertex Buffer Objects (VBOs):

#region Texture holder

//Integer values of width and height, defined statically previously in code //h = 400; //700; //w = 500; //750;

//reduces values of width and height proportionally (division by 8) to fit them onto the screen float[] v = new float[] { -w>>3,-h>>3,0, -w>>3, h>>3,0, w>>3, h>>3,0, w>>3,-h>>3,0 };

float[] n = new float[] { 0,0,1, 0,0,1, 0,0,1, 0,0,1 };

//my quad uses points 0,1,2,3 as vertexes int[] elem = new int[] { 0, 1, 2, 3 };

GLRender.GLVBOModel m = newGLRender.GLVBOModel(OpenTK.Graphics.OpenGL.BeginMode.Quads); m.SetVertexData(v); m.SetTexCoordData(texCoord); m.SetNormalData(n); m.SetElemData(elem);

Bitmap bmp = newBitmap(w, h); m.SetTexture(bmp);

GLr.Models.Add(m);

#endregion

//CLGL shared texture

CLCalc.Program.Image2DCLimg = m.GetCLTexture2D();

The code above creates the four points of the rectangle which will contain the texture, defines the texture coordinates of each of the points, normals and uses the 4 points to create an OpenGL quad element. There's plenty of information about OpenGL Vertex Buffer Objects online. You can also check our OpenCLTemplate OpenCL/GL Interop Framework.

The last line uses OpenCLTemplate.CLGLInterop framework to retrieve the texture as an OpenCL image2D object for later use.

3.3 Mapping colors to the texture

In this example, we'll need to map colors to the texture in order to display temperature information. Fortunately, we can use the shared texture CLimg that we retrieved previously and modify it in Device memory. The dummy code "Preliminary Tests" shows how to manipulate the texture and set a 50x50 pixel square to red directly in the GPU:

#region Preliminary tests string src = @" kernel void imgTeste(__write_only image2d_t bmp) { int x = get_global_id(0); int y = get_global_id(1); int2 coords = (int2)(x,y);

float4 val = (float4)(0.0f,0.0f,1.0f,1.0f);

write_imagef(bmp, coords, val); }

"; CLCalc.Program.Compile(src); CLCalc.Program.Kernel k = new CLCalc.Program.Kernel("imgTeste");

CLGLInteropFunctions.AcquireGLElements(CLimg);

k.Execute(newCLCalc.Program.MemoryObject[] { CLimg }, new int[] { 50, 50 });

Now, supposing the system temperatures vary from hMin to hMax, it's necessary to convert any intermediate value t into a RGB color code that graphically displays those temperatures. The function below performs this task associating colors from white and red (hot) to green and blue (colder) to a given intermediate value t:

We now have the tools to output a color image from intensity values. Now it's necessary to compute the actual temperature values from the heat equation solution, feed those values to the getColor function and use the corresponding colors to paint the texture.

4. Parallel solution of the PDE

We now turn back to the problem of computing the simulation of the heat equation. Let's consider W to be the width of the grid and H to be the height of the grid. Let's create two matrices: one to store the current temperature values and another one to store the next temperature values, in the next instant of time. In OpenCL we represent matrices as vectors:

Details of the initializations can be found in the source code, class CLHeatTransfer.cs, public CLHeatTransfer() constructor.

4.1 Boundary and initial conditions

Recalling Eq. 4:

Eq. 4

Notice that, in order to compute the next value of u(t,x,y), it's necessary to know the current point temperature (the initial condition) as well as the temperature of its neighbors. However, when x = 0 or x = Width-1 or y = 0 or y = Height - 1, we're unable to do that because we would go off the bounds of the region. Thus, in order to solve this problem, we need to know u(0,x,y), i.e., the initial temperature distribution everywhere, and we also need information along the boundaries for all values of time. For the purposes of this simulation, we will consider that the temperatures along the edges where x = 0 or x = Width-1 or y = 0 or y = Height - 1 are FIXED VALUES (Dirichlet conditions), as shown in the picture below:

At this point all tools that will be needed to display the results of the simulation are ready, and all conditions necessary to integrate the heat equation PDE are in place. The OpenCL C99 kernel used to perform one integration step is given below:

Notice that the central temperature is stored in variable cOrigVal for later use, and four extra values are loaded in order to compute the partial derivatives with respect to space, as demonstrated in the sketch below. To update the value [x,y] it's necessary to load four of its neighbors to compute uxx and uyy:

For simplicity and compatibility with OpenCL 1.0, the central point each workitem works on is the point [x+1, y+1]. This way, the global_work_size = [Width-2, Height-2]. After storing the updated value we use the color generation function from section Mapping colors to texture to produce the visual display of temperature distribution.

4.3 Implementation details

There are some implementation details that are used to make the code run faster. Make sure to browse through class CLHeatTransfer.cs from the source code if you want to study the optimization aspects of the code. The main aspects are the following:

The steps are executed in pairs. This is because we want to swap the buffers containing current and previous data so that it's not necessary to allocate/free variables;

The getColor function is quite expensive insofar as it contains a great amount of if-then-else statements. To mitigate the slowdown of the color mapping, two OpenCL kernels are created: heatStep and heatStepImgWrite. The image write one is only called ONCE per step request;

Parameters such as the time step, space step etc. are allocated in __constant memory. The picture below, by Khronos Group, demonstrates the model:

Since the amount of parameter data is very small and is used by every workitem the best choice is to allocate it in __constant memory.

It would be possible to cache some values into __local space. We're relying on the __global const directive to get the OpenCL implementation to do a good job at caching values for us. The reader is welcome to try to implement this optimization;

The reader is also welcome to look at the C# project if interested in the details of how to change the temperature using the mouse and how to change conductivity on the fly.

5. Conclusion

In our tests, a 500x400 grid with fixed temperatures at the boundaries runs at roughly 7000 iterations per second, which is a remarkable improvement when compared to 2 iterations per second of some other implementations which don't use the GPU as OpenCL Device. We've introduced a color function that maps intensity values to color values and showed how to perform the coloring directly in Device memory, eliminating the need to transfer data back and forth from the GPU Device.

Finally, source code was presented demonstrating how to implement OpenCL C99 code to solve and simulate the heat equation using C# as host code by implementing a parallel version of finite differences.