Get the source code for this case study. You will find 2 sources: one with a more easy to understand OpenCL C99 code and one with the C99 code optimized for the GPU.

I am developing this code for you OpenCL users who already have a good knowledge of the technology and want to see OpenCL being used for a real-life application.

Actually, this is being developed even before some of the more basic content because some of OpenCL programmers are probably much beyond the basics.

Also have in mind that non-engineers may need a big effort to understand this case study.

A side note on the configuration: if you did not set the TdrDelay and TdrDdiDelay to adjust the time Windows allows for your GPUs not to respond, you should do so. Refer to Installation and Configurations in the menu to the left.

## 1. The geometric fitting problem

Consider the following: you have a stack full of pipes which you would like to put together to form a long pipeline. The geometry of the pipe ends, though, may vary a lot, as follows:

The problem is: not all ends match. It is necessary to rotate some of the pipes before putting them together, like shown below, rotating the red pipe end:

So you want to know beforehand which pipes go best together and what should be the rotation that makes the pipes fit best.

In a more mathematical fashion, the problem is: consider a set of n pipes, each with its South and North end. Each pipe end is represented by a set of p points, equally spaced along its perimeter. For each pipe end pair calculate the lowest possible high-low misalignment and the rotation angle that should be applied to the loose end for it to match the fixed end. Consider that the measurements are taken by a person standing in front of the pipe end, scanning the data counter-clockwise.

The fact that the measurements are taken counter-clockwise brings an important detail to notice. The fact that the scanned data for two pipe ends is the same doesn’t mean that they will match. They should be mirrored in order to match. This is not so easy to explain. Take a look at the picture below:

In picture 1 we see front pictures of two pipe ends. They look exaclty the same. In 2, we start turning red pipe to fit with black one. In 3, we see that this is not correct.

In picture 4 we see front pictures of two MIRRORED pipe ends. In 5, we start turning red pipe to fit with black one. In 6, we see that this is the correct position.

## 2. How to use this program

Before reading this case study, you may want to use the software and see the speed boost OpenCL gives to the algorithm. This is the main screen:

This is how to use the program:

- Choose how many pipes you want to be generated and then be sorted in their best fit order. You can choose ellipses or random geometries;
- View the generated geometries. The straight line marks the first reading. The two pictures to the left show a FRONT view of the pipes and the rightmost picture shows a picture of how they would look like after fitting, which means mirroring the moving pipe like we discussed before;
- View how the pipes would look like when rotated by using the Rotation combobox;
- Use the combo box to select the pipes and click Calculate best fit to see how the best coupling looks like;
- Click Calculate using OpenCL and Calculate without OpenCL to calculate all high-low misalignments;
- Click Create sorted order to get a suggestion of what should be the sorting order. The result is a sorted list of pipe ends (of course, a pipe remains connected to itself).

I will show the acceleration results I got with OpenCL. It’s in the graph below:

For 400 pipes, the time to calculate the high-low misalignments without using acceleration was 496 s and the OpenCL times were 8,8 s on CPU and 2,7s on GPU optimized code. This is an acceleration of 183 times! And I have a decent Phenom II quad-core processor.

I hope a 183x acceleration is enough to encourage you into reading this **hard** example.

## 3. Getting started

The first step is to create a new project using Visual C#.

Since we don’t have real data we are going to create a button to generate data for testing and a box to ask the user how many “fake” pipe ends should be generated.

It’s also a good idea to create PictureBoxes to show the geometry of the pipes and show them match with rotation. The first picturebox will show the fixed end, the second will show the loose end and the third will show how the coupling is going to look like. We can also include a line that will mark the zero (0) index of the geometry data for debug purposes.

Then, we will need a button to calculate the fitting using regular coding and OpenCL code and a textbox to show useful information.

## 4. Generating and storing pipe data

### 4.1 Pipe class

It is useful to create a class to store all information and pipe data that is going to be used:

/// <summary>Class to store pipe data</summary> public class Pipe { /// <summary>Equally spaced radii measurement along the pipe perimeter, pipe end North</summary> public float[] RadiusNorth; /// <summary>Equally spaced radii measurement along the pipe perimeter, pipe end South</summary> public float[] RadiusSouth; /// <summary>DEBUG. This is to store what angle was used to rotate the ellipsis /// major axis along x-axis.</summary> public float MajorAxisAngle; /// <summary>This is the ovalization b-a, greatest radius minus smallest radius. North side</summary> public float OvalizationNorth; /// <summary>This is the ovalization b-a, greatest radius minus smallest radius. South side</summary> public float OvalizationSouth; }

This is what a pipe looks like for us. The float[] RadiusNorth and RadiusSouth store the pipe data that would be measured in a real life situation. The other variables are there to help us debug and make sure the algorithms are running fine. The Ovalization variables are there for us to keep track of difference between biggest and smallest radius of that particular pipe end. MajorAxisAngle is there to keep track of a random rotation we will apply when generating the data, to simulate the fact that we might not start taking measurements EXACTLY at an angle where the radius is the biggest.

The float[] RadiusNorth and RadiusSouth variables will have a fixed number of NUM_POINTS_PER_PIPEEND = 200 elements, each consisting on the measured radius in an angle of 360*j/200 degrees. This means that RadiusNorth[0] is the radius measured for an angle of 0 degrees at the north side of the pipe and RadiusNorth[80] is the radius measured at 144 degrees in a counter-clockwise fashion. Make sure you understand this part because we will use it a lot throughout this code.

As an example, check what would be the values of RadiusNorth[] for NUM_POINTS_PER_PIPEEND = 8 points in the example below, with bigger radius = 2 and smaller radius = 1:

### 4.2 Data generation

Now, how do we generate data? Back to mathematics, the equation of a circle is:

C (t) = [r cos(t), r sin(t)], 0 ≤ t < 2π

But we don’t want just circular pipes because they would fit in any rotation. What we want are ellipses:

E (t) = [b cos(t), a sin(t)], 0 ≤ t < 2π

In this equation b is the biggest radius and a is the smallest radius, which means the ovalization is b-a. Since we are not using real data, let’s just create pipes so that the first one has zero ovalization and the last one has ovalization of 3, with mean radius of 10.

The bigger radius b = r + Ovalization/2, and the smaller is a = r – Ovalization/2, so that b-a = Ovalization, like we defined before.

In a real application the data would be read from files but this controlled generation allows us to check the algorithm: if we compare pipe with ovalization O1 with pipe with ovalization O2 we know that the algorithm should return a minimum possible misalignment (high-low) of b1-b2 = r + O1/2 – (r + O2/2) = (O1-O2)/2. This will help us check our algorithm.

If we generate all ellipses with major axis aligned with the X axis the problem is going to become boring because all fits will say: zero rotation. So I’ve added a random rotation term to create a bigger problem. One more thing, I’ve inserted the pipes into the list of pipes randomly to make it more interesting for the sorting algorithm.

I won’t go into many details of this implementation here. If you want to know more about how this false data is created study the code in the “Generate” button.

The Pipe class is inside the Pipefitting class, which also contains the List<Pipe> Pipes that is populated in this algorithm.

### 4.3 Data display

To make the program easier to understand, the View Pipes part of the form show a visualization of the pipes. The straight line marks the first reading (zero-index) in the particular pipe. There is also a distinction to separe the North from the South end of the pipe. It’s possible to apply rotation to any of the pipes by using the Rotation combo. The two pictures to the left show a FRONT view of the pipes and the rightmost picture shows a picture of how they would look like after fitting, which means mirroring the moving pipe like we discussed before.

Drawing pipes is not the scope of this tutorial. If you are interested on drawing round geometries feel free to look at the Paint event of the pictureboxes.

## 5. Fitting algorithm

Now that pipe data has been generated we are ready to perform pipe fitting calculations. Let’s recall what we’ve got:

- The Pipe class containing geometry data for North and South ends of a given pipe;
- A list of pipes with geometry data stored.

I have created a Pipefitting class to store all this. Its first few lines are:

/// <summary>Class to store pipe data and calculate pipe fitting</summary> public class Pipefitting { /// <summary>Class to store pipe data</summary> public class Pipe { /// <summary>Equally spaced radii measurement along the pipe perimeter, pipe end North</summary> public float[] RadiusNorth; /// <summary>Equally spaced radii measurement along the pipe perimeter, pipe end South</summary> public float[] RadiusSouth; (DEBUG Data) } /// <summary>List of pipes to be fitted</summary> public List<Pipe> Pipes = new List<Pipe>(); (...)

And this is all we need to get started: just the generated pipe geometries.

### 5.1 Strategy

It’s necessary to think about a good approach to create the parallel algorithm in order not to waste memory resources and not to throw useless expensive threads. It is necessary to compare all possible couplings, which are nEnds*(nEnds-1)/2 total (ok I’m comparing pipe ends of the same pipe but that’s not too bad). These comparisons are completely independent and thus can be done in parallel.

For example, if we have 8 pipe ends (4 pipes), we need to compare the pipe ends i – j as follows:

0-1 0-2 0-3 0-4 0-5 0-6 0-7

1-2 1-3 1-4 1-5 1-6 1-7

2-3 2-4 2-5 2-6 2-7

3-4 3-5 3-6 3-7

4-5 4-6 4-7

5-6 5-7

6-7

We don’t need to get back and compare, say, 6 with 1 because we calculated this high-low in 1-6. Our strategy will be to calculate one high-low misalignment per worker, which means nEnds*(nEnds-1)/2 global work size.

### 5.2 Storing and accessing high-low values

So our variable HiLos[nEnds*(nEnds-1)/2] will have nEnds*(nEnds-1)/2 elements, but how do we access them? The question is: how to convert an [i,j] with j>i to an integer ind so that 0<=ind<nEnds*(nEnds-1)/2 ?

Looking back at the 8 ends (4 pipes) example, this is what we want:

0 1 2 3 4 5 6

0-1 0-2 0-3 0-4 0-5 0-6 0-7

7 8 9 10 11 12

1-2 1-3 1-4 1-5 1-6 1-7

13 14 15 16 17

2-3 2-4 2-5 2-6 2-7

18 19 20 21

3-4 3-5 3-6 3-7

22 23 24

4-5 4-6 4-7

25 26

5-6 5-7

27

6-7

We have 28 = 8*(8-1)/2 high-lows to calculate, as expected.

This is not a math tutorial, and I will give the formula to convert the [i,j] indexes below. Take your time to derive it if you want to:

ind(i,j) = nEnds * i – i * (i + 1) / 2 + j – i – 1

Let’s check: what is the index of the comparison of pipe ends 3 and 5? It’s

ind(3,5) = 8 * 3 – 3 * (3 + 1) / 2 + 5 – 3 – 1 = 24-6+5-3-1 = 19

This is as expected. Now the inverse problem: given an index i, what are the indices of the pipes to be compared? This is a bit more trickier and involves inverting the formula above. One more time, take your time to derive the formula if you want to, but here it is:

//Gets what ends should be compared in this worker float fixedEndTry = (float)nPipEnds - 0.5f - (0.5f) * (float)Math.Sqrt((float)(4f * (float)nPipEnds * nPipEnds - 4 * nPipEnds + 1 - 8 * i)); int fixedEnd = (int)Math.Floor(fixedEndTry); int movingEnd = i - nPipEnds * fixedEnd + fixedEnd * (fixedEnd + 1) / 2 + 1 + fixedEnd;

Check this formula if you want to, it does the trick. I was afraid to get roundoff errors by using the floor function so I added a safety code just in case roundoff errors kick in and round a value of i that was calculated 2.9999 to 2:

//avoids roundoff error if (i != nPipEnds * fixedEnd - fixedEnd * (fixedEnd + 1) / 2 + movingEnd - fixedEnd - 1) { fixedEnd++; movingEnd = i - nPipEnds * fixedEnd + fixedEnd * (fixedEnd + 1) / 2 + 1 + fixedEnd; }

I breakpointed inside the if and got no hits but it’s there just in case.

### 5.3 Variables needed

This is what we have to send as information:

- The pipe ends geometry;
- How many pipe ends there are;
- How many poins per pipe end there are;

And we get this information in return:

- The high-low misalignment of pipe ends i and j;
- The rotation needed to fit pipe i to pipe j.

The rotation is not really necessary in this example and it’s not going to be used. It’s there just in case you decide to create fitting conditions based on it.

The first data we will need to pass to an OpenCL algorithm will be the pipe ends geometries. This is the pipeEnds variable:

//Creating the float[] array to store all pipe end data //Have in mind that each pipe has two ends. //Even indexes contain North information //Odd indexes contain South information //pipeEnds[j + nPoints*(2*i)] -> 0<=j<nPoints radius information for North side of pipe i //pipeEnds[j + nPoints*(2*i+1)] -> 0<=j<nPoints radius information for South side of pipe i float[] pipeEnds = new float[nEnds * nPoints]; for (int i = 0; i < Pipes.Count; i++) { for (int j = 0; j < nPoints; j++) { pipeEnds[j + 2 * i * nPoints] = Pipes[i].RadiusNorth[j]; pipeEnds[j + (2 * i + 1) * nPoints] = Pipes[i].RadiusSouth[j]; } }

As you see, this is a plain array of floats representing pipe end geometries. nEnds is the number of pipe ends, i.e., 2*Pipes.Count. nPoints is NUM_POINTS_PER_PIPEEND = 200, defined prior to generating the test data.

We also need to create the variables that will give the answers back:

//Dimensioning of the vectors HiLos = new float[nEnds * (nEnds - 1) / 2]; RotationAngles = new float[nEnds * (nEnds - 1) / 2];

The next step is to copy these variables into the Device memory, create the argument array and tell OpenCL that the global work size is nEnds * (nEnds – 1) / 2:

//Generate OpenCL variables //Output varHiLos = new CLCalc.Program.Variable(HiLos); varRots = new CLCalc.Program.Variable(RotationAngles); //Input varPipeEnds = new CLCalc.Program.Variable(pipeEnds); varnPipeEnds = new CLCalc.Program.Variable(nPipeEnds); varnPtsPerEnd = new CLCalc.Program.Variable(nPtsPerEnd); //set Kernel arguments args = new CLCalc.Program.Variable[] { varPipeEnds, varHiLos, varRots, varnPipeEnds, varnPtsPerEnd }; //One worker for each pair to compare //work_size int[] workers = new int[1] { nEnds * (nEnds - 1) / 2 };

Let’s procceed into the creation of the kernel.

### 5.4 OpenCL code

The tools to debug and solve errors in Visual C# are better than the tools to debug the OpenCL code, so I strongly suggest you to write a C# function in a OpenCL style before porting the code PLUS you can compare execution times.

Remember that there are nEnds * (nEnds – 1) / 2 and consider the job that is going to be done by the i-th worker: it is going to calculate the high-low of pipe ends with indexes fixedEnd and movingEnd. The first part of the algorithm is to receive the necessary data and calculate the pipe end indexes (refer back to 5.2 for the formulas):

__kernel void CalcMisalignments(__global read_only float * pipeEnds, __global write_only float * HiLos, __global write_only float * Rots, __global read_only int * nPipeEnds, __global read_only int * nPtsPerEnd) { int n = get_global_size(0); int i = get_global_id(0); int nPip = nPipeEnds[0]; int nPtsEnd = nPtsPerEnd[0]; //Gets what ends should be compared in this worker float fixedEndTry = (float)nPip - 0.5 - (0.5) * (float)sqrt((float)(4 * (float)nPip * nPip - 4 * nPip + 1 - 8 * i)); int fixedEnd = (int)floor(fixedEndTry); int movingEnd = i - nPip * fixedEnd + fixedEnd * (fixedEnd + 1) / 2 + 1 + fixedEnd; //avoids roundoff error if (i != nPip * fixedEnd - fixedEnd * (fixedEnd + 1) / 2 + movingEnd - fixedEnd - 1) { fixedEnd++; movingEnd = i - nPip * fixedEnd + fixedEnd * (fixedEnd + 1) / 2 + 1 + fixedEnd; } }

For each possible rotation (out of nPtsEnd possible), it’s necessary to calculate the maximum high-low of that particular rotation. Then, we need to choose the rotation that yields the smallest high-low value. The tricky part is that the pipes are measured counterclockwise with the operator standing in front of them. Look at the picture:

Notice that we can’t compare 0 with 0, 1 with 1, etc. because the moving pipe gets mirrored in the coupling. Instead, we compare fixed 0 with moving 4, fixed 1 with moving 3 and so on. If the moving pipe is rotated counterclockwise (in the front view), point 7 will be where point 0 used to be. We compare fixed 0 with moving 3, fixed 1 with moving 2 and so on. In the OpenCL code, variable ii is the rotation of the moving pipe and k is the index to find which points of the moving pipe should be compared with the incresing values in the fixed pipe. Try running the code below manually if things get difficult:

//COMPARE POINTS //Initialize minHiLo float MinHiLo = 1000; float MovingRotationAngle = 0; for (int ii = 0; ii < nPtsEnd; ii++) { int k = nPtsEnd / 2 - ii; if (k < 0) k += nPtsEnd; //maximum high-low for this angle float angMaxHiLo = -1; for (int jj = 0; jj < nPtsEnd; jj++) { float temp = fabs(pipeEnds[jj + fixedEnd * nPtsEnd] - pipeEnds[k+movingEnd*nPtsEnd]); if (temp > angMaxHiLo) { angMaxHiLo = temp; } k--; if (k < 0) k = nPtsEnd - 1; } if (angMaxHiLo < MinHiLo) { MovingRotationAngle = 6.28318530717 * (float)ii / (float)nPtsEnd; MinHiLo = angMaxHiLo; } }

Now we store the information:

// Store result HiLos[i] = MinHiLo; Rots[i] = MovingRotationAngle;

And this is it for the OpenCL code. This OpenCL C99 code is stored in the private classPipefittingOpenCLSource, inside the Pipefitting.cs class.

### 5.5 Optimizing the code for GPU

You can find a second, optimized, code. This optimization uses the GPU greater capability to sum vectors using built-in hardware. Instead of calculating the radius differences individually, we’ll calculate them 4 by 4. This requires the number of samples to be a multiple of 4, which is no big deal.

It is very important to understand that vector operations are roughly as fast as scalar operations but they do four times the work. This change single-handedly gives us almost 4x speed boost.

The code below shows how to do it, pay attention to the differences:

//COMPARE POINTS //Initialize minHiLo float MinHiLo = 1000; float MovingRotationAngle = 0; for (int ii = 0; ii < nPtsEnd; ii++) { int k = nPtsEnd / 2 - ii; if (k < 0) k += nPtsEnd; //maximum high-low for this angle float angMaxHiLo = -1; for (int jj = 0; jj < nPtsEnd; jj+=4) { float4 fFix = (float4)(pipeEnds[jj + fixedEnd * nPtsEnd], pipeEnds[jj+1 + fixedEnd * nPtsEnd], pipeEnds[jj+2 + fixedEnd * nPtsEnd], pipeEnds[jj+3 + fixedEnd * nPtsEnd]); float4 fMov; fMov.x = pipeEnds[k + movingEnd * nPtsEnd]; k--; if (k < 0) k += nPtsEnd; fMov.y = pipeEnds[k + movingEnd * nPtsEnd]; k--; if (k < 0) k += nPtsEnd; fMov.z = pipeEnds[k + movingEnd * nPtsEnd]; k--; if (k < 0) k += nPtsEnd; fMov.w = pipeEnds[k + movingEnd * nPtsEnd]; k--; if (k < 0) k += nPtsEnd; float4 temp = fabs(fFix - fMov); if (temp.x > angMaxHiLo) angMaxHiLo = temp.x; if (temp.y > angMaxHiLo) angMaxHiLo = temp.y; if (temp.z > angMaxHiLo) angMaxHiLo = temp.z; if (temp.w > angMaxHiLo) angMaxHiLo = temp.w; } if (angMaxHiLo < MinHiLo) { MovingRotationAngle = 6.28318530717 * (float)ii / (float)nPtsEnd; MinHiLo = angMaxHiLo; } }

As you can see, we step 4 values in jj each time, storing 4 values of the fixed pipe. Then, we apply rotation using k– and store each value in one component of fMov. All 4 subtraction and absolute values are done by using float4 temp = fabs(fFix-fMov).

### 5.6 Creating and executing the kernel

We compile the program and create the kernel in the class constructor. This is good because we don’t want the program to be recompiled every time we run it.

/// <summary>OpenCL code to calculate misalignments</summary> CLCalc.Program.Kernel kernelCalcMisalignments = null; /// <summary>Constructor.</summary> public Pipefitting() { if (CLCalc.GLAcceleration == CLCalc.GLAccelerationType.Unknown) CLCalc.InitCL(); if (CLCalc.GLAcceleration == CLCalc.GLAccelerationType.UsingGL) { PipefittingOpenCLSource PipefittingOpenCL = new PipefittingOpenCLSource(); CLCalc.Program.Compile(new string[1] {PipefittingOpenCL.src}); kernelCalcMisalignments = new CLCalc.Program.Kernel("CalcMisalignments"); } }

Now that the code is ready and all the variables are set, we can call the kernel.Execute when the user wants to calculate high lows.

Refer to the CalculateHighLows function in the Pipefitting class for the complete code. This is the portion related to the kernel execution:

//One worker for each pair to compare //work_size int[] workers = new int[1] { nEnds * (nEnds - 1) / 2 }; //Execute OpenCL kernelCalcMisalignments.Execute(args, workers); //Reads outputs varHiLos.ReadFromDeviceTo(HiLos); varRots.ReadFromDeviceTo(RotationAngles);

### 5.7 Calculating the sorted order

Calculating the sorted order should be done specifically for each project because of high-low requirements. The one I created is just a suggestion of how it could be done. It shouldn’t be difficult for a user to create an algorithm that fits his needs now that the high-low misalignment values are available.

Refer to the function List<int> SortPipes(out float WorstHiLo) in Pipefitting.cs class for the complete implementation. This function returns a list of pipe ends in the order they should be put together. Of course, the north and south ends of the pipes remain connected.

The idea is the following:

- Get the first pipe;
- For all the remaining pipe ends: find the one with the smallest high-low to connect to the beginning of the string and the one with the smallets high-low to connect to its end;
- Choose the smallest high-low and connect it to the string (add the best pipe end and the other end of the same pipe);
- Repeat steps 2 and 3 until there are no more pipes left.

## 6. Conclusion

Congratulations for reaching the conclusion. It’s time to play with the code and discover how much it can accelerate the algorithm on your hardware. If you have the time and patience to measure your own execution times, you may send them to me.

We have shown a way to use OpenCL to calculate pipe fittings. It is important to notice that the high-low calculations are independent, which allows us to parallelize the algorithm and take advantage of GPUs processing power via OpenCL.

Both my CPU and GPU gave me an acceleration of 183x on average. Using the GPU vector calculation capabilities is very important for better acceleration.

Get the source code for this case study. You will find 2 sources: one with a more easy to understand OpenCL C99 code and one with the C99 code optimized for the GPU.