Case study: image filtering


 

Get the source code for this section.

Get source code of runtime execution image filter

 

In this tutorial we're going to implement image filtering with a 7x7 filter. The idea is to present an OpenCL algorithm which will work on all cards compatible with the basic implementation of OpenCL, without extensions. Since we received some emails asking how to implement image filters that don't use OpenCL images and do basic things like inverting colors, we also posted a simple image filter in which the user can compile the filter at run time and choose which device will be used to compile the code. All these features can be seen in the following video:

It is very important to understand that we are NOT going to implement the fastest possible filter, but rather a general-purpose one that should work on all GPUs compatible with OpenCL. There is not much software out there that uses GPU processing to speed up image processing, and I assume that manufacturers don't want platform-specific algorithms or versions that will run only on a limited number of cards.

Good performance can be obtained using only the basic OpenCL implementation. This means that the following features, which could be used to boost performance, will not be used:

- Byte addressable stores;
- OpenCL images;
- OpenCL/OpenGL interoperation to manipulate and display images.

Either way, if your GPU supports the features above, you may want to adapt the code presented here to better suit your hardware. What we WILL do is transfer data to the GPU as bytes (OpenCL uchar), because data transfer is a bottleneck and this optimization really makes a difference.

By the end of this tutorial I expect to convince the reader that OpenCL can provide a reasonable performance increase even on computers that don't have the most powerful GPUs.

I have used the AForge video components in this tutorial. They are open source and the license can be found here.

 

1. Screenshots and benchmarks 

 

To make this more interesting, real-time webcam image processing has been implemented along with regular image filtering. On my hardware, the OpenCL version of the filter was 60x faster than the regular (non-OpenCL) implementation. I have included a (slower) version of the algorithm that runs with work dimension = 1. It is not discussed in this tutorial but it may serve as a reference. In my tests I got a frame rate of around 13 FPS.

 

Usage of the software:

- Use the Filter icon to modify the filter that is going to be applied. Each color has its own filter. You may replicate a filter to all colors; 
- Load a picture or start the webcam;
- Use the buttons to apply the desired filter. When using a webcam, you can modify the filter in real-time. I suggest testing the following filters:

http://www.student.kuleuven.be/~m0216922/CG/filtering.html

http://www.gamedev.net/reference/programming/features/imageproc/page2.asp

 

 

2. Image filtering basics

 

For those unfamiliar with image filtering, here is a very brief explanation. A filter is a series of calculations applied to an image to create effects. It consists of interpreting the image as a series of red/green/blue values and replacing each pixel with a value that depends on that pixel and its surroundings: a weighted sum of the neighboring pixels, with the weights given by the filter. Take a look at the picture below for a quick reminder:

This tutorial is not intended to explain details of filters or the effect they create. Take a look at these references for further information:

http://www.student.kuleuven.be/~m0216922/CG/filtering.html

http://www.gamedev.net/reference/programming/features/imageproc/page2.asp
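As a concrete illustration of the neighborhood-weighted sum described in this section, here is a minimal sketch for a single color channel. It is written in Python for brevity rather than in the C#/OpenCL used by the tutorial, and the names apply_filter, soften and picture are made up for this example. Leaving border pixels untouched is an assumption of the sketch, not something this section prescribes.

```python
def apply_filter(image, kernel):
    """Weighted sum of each pixel's neighborhood, for one color channel.

    image  -- list of rows, each a list of 0..255 intensities
    kernel -- square list of rows of float weights (odd side length)
    Border pixels without a full neighborhood are copied unchanged.
    """
    size = len(kernel)
    center = size // 2
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(center, height - center):
        for x in range(center, width - center):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    acc += kernel[i][j] * image[y + i - center][x + j - center]
            # Clamp to the valid byte range before storing
            result[y][x] = int(min(255.0, max(0.0, acc)))
    return result


# A small softening filter: half the weight on the pixel itself,
# the rest spread over its four direct neighbors.
soften = [
    [0.0, 0.125, 0.0],
    [0.125, 0.5, 0.125],
    [0.0, 0.125, 0.0],
]
picture = [
    [0, 40, 0],
    [0, 80, 0],
    [0, 0, 0],
]
softened = apply_filter(picture, soften)
```

Only the central pixel of this 3x3 picture has a full neighborhood, so it is the only value that changes.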

 

3. Setting up the filter

 

Most applications don't require a big filter, and filtering time is highly dependent on filter size. 3x3 filters usually do fine, 5x5 filters will solve almost all practical problems and it is very unusual to see anything above 7x7 filters in a real application. In this tutorial, we will stick to a 7x7 filter that can still be processed in real time. The frame rate is not great but the result is still decent. The input screen has been created using C#. It is possible to create color-specific filters and to copy/paste filters in the format shown in the textbox below the filter, as in the screen below.

 

 

 

You may look at the code implementation if you want to. It's just an interface so discussing it is off-topic (not OpenCL related). As you can see, it's a 7x7 filter setup.

The most obvious ways to make the filtering faster are to reduce the filter size, hard-code the filter values into the kernel and take advantage of filter symmetries. None of this is done here because the filter is dynamic.

 

4. OpenCL Kernel

 

Let's create a two-dimensional kernel to solve the problem. We want to filter an image and retrieve the result in color. The data structure is:

Filter[3*(i*FILTERSIZE + j)] is the red component of filter entry i,j;
Filter[3*(i*FILTERSIZE + j)+1] is the green component of filter entry i,j;
Filter[3*(i*FILTERSIZE + j)+2] is the blue component of filter entry i,j;

The same interleaved logic applies to the image and the filtered image, which are indexed as 3*(x + Width*y):

 

kernel void ImgFilter(global uchar * image,
                      global float * Filter,
                      global float * FilteredImage,
                      global int * Width)
{
   // FILTERSIZE and CENTER are not declared in this listing; they are
   // assumed to be defined as constants when the program is compiled.
   int x = get_global_id(0);
   int y = get_global_id(1);
   int w = Width[0];
   int ind = 0;   // index into image / FilteredImage
   int ind2 = 0;  // index into Filter

   float4 filteredVal = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
   for (int i = 0; i < FILTERSIZE; i++)
   {
       for (int j = 0; j < FILTERSIZE; j++)
       {
           ind = 3*(x+j + w*(y+i));
           ind2 = 3*(i*FILTERSIZE + j);
           filteredVal.x = mad(Filter[ind2],   (float)image[ind],   filteredVal.x);
           filteredVal.y = mad(Filter[ind2+1], (float)image[ind+1], filteredVal.y);
           filteredVal.z = mad(Filter[ind2+2], (float)image[ind+2], filteredVal.z);
       }
   }

   // The result belongs to the pixel at the center of the window.
   // Float literals keep the clamp overload unambiguous.
   ind = 3*(x+CENTER + w*(y+CENTER));
   FilteredImage[ind]   = clamp(filteredVal.x, 0.0f, 255.0f);
   FilteredImage[ind+1] = clamp(filteredVal.y, 0.0f, 255.0f);
   FilteredImage[ind+2] = clamp(filteredVal.z, 0.0f, 255.0f);
}

 

Note some relevant optimizations:

- The image argument is sent as uchars (C# byte);
- MAD is used to compute a*b+c;
- Indexes calculated only once.
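When checking the kernel's output, a CPU reference that uses the same flat indexing can be handy. The following is a sketch in Python (not the tutorial's C# fallback, which is only available in the source download). It assumes CENTER equals (FILTERSIZE - 1) / 2, matching the `mean` variable in the host code, and it assumes one work-item per (x, y) in a grid of (Width - FILTERSIZE) by (Height - FILTERSIZE), as the host code launches it. The function name img_filter_reference is hypothetical.

```python
def img_filter_reference(image, filt, width, height, filter_size):
    """CPU mirror of the ImgFilter kernel using the same flat RGB indexing."""
    center = (filter_size - 1) // 2  # assumed value of CENTER
    filtered = [0.0] * (3 * width * height)
    # One (x, y) pair per work-item, as in the two-dimensional launch
    for y in range(height - filter_size):
        for x in range(width - filter_size):
            acc = [0.0, 0.0, 0.0]
            for i in range(filter_size):
                for j in range(filter_size):
                    ind = 3 * (x + j + width * (y + i))
                    ind2 = 3 * (i * filter_size + j)
                    for c in range(3):
                        acc[c] += filt[ind2 + c] * image[ind + c]
            # Store at the pixel in the center of the window, clamped to 0..255
            out = 3 * (x + center + width * (y + center))
            for c in range(3):
                filtered[out + c] = min(255.0, max(0.0, acc[c]))
    return filtered


# 4x4 test image whose byte values equal their own index (0..47)
test_image = list(range(3 * 4 * 4))

# 3x3 identity filter: weight 1 on the central entry of every channel
identity = [0.0] * (3 * 3 * 3)
identity[3 * (1 * 3 + 1):3 * (1 * 3 + 1) + 3] = [1.0, 1.0, 1.0]

reference = img_filter_reference(test_image, identity, 4, 4, 3)
```

With an identity filter the processed pixel comes back unchanged, which makes indexing mistakes easy to spot before comparing against the OpenCL result.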

 

5. Host Code 

 

The host code contains two parts: copying the image to a byte array and processing the image using OpenCL.

 

5.1 Copying C# image into a byte array

 

We want to transfer the RGB values of the picture as bytes, not floats. Doing this allows us to transfer 1/4 of the data because sizeof(float) = 4 and sizeof(byte) = 1. This part uses the C# Bitmap LockBits functions, which you may want to study if you are not familiar with them. Remember the data structure being created: the byte array has to carry all 3 (RGB) components.
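To put numbers on the 1/4 ratio, the short Python sketch below computes the size of one interleaved RGB frame for both representations. The 640x480 frame size is only an assumed example, and transfer_bytes is a hypothetical helper name.

```python
def transfer_bytes(width, height, bytes_per_component):
    """Size in bytes of one interleaved RGB frame."""
    return 3 * width * height * bytes_per_component


as_bytes = transfer_bytes(640, 480, 1)   # uchar components: 921600 bytes
as_floats = transfer_bytes(640, 480, 4)  # float components: 3686400 bytes
```

Note that in this tutorial only the upload benefits from this, since the kernel writes its result to a float buffer.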

Full implementation is provided in ImageData class:

 

/// <summary>Copies bitmap data to local Data</summary>
/// <param name="bmp">Bitmap to copy</param>
private void ReadToLocalData(Bitmap bmp)
{
    //Lock bits
    BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
        System.Drawing.Imaging.ImageLockMode.ReadOnly, bmp.PixelFormat);

    //Read data
    unsafe
    {
        for (int y = 0; y < bmd.Height; y++)
        {
            byte* row = (byte*)bmd.Scan0 + (y * bmd.Stride);
            for (int x = 0; x < bmd.Width; x++)
            {
                Data[3 * (x + width * y)] = row[x * PIXELSIZE];
                Data[3 * (x + width * y) + 1] = row[x * PIXELSIZE + 1];
                Data[3 * (x + width * y) + 2] = row[x * PIXELSIZE + 2];
            }
        }
    }

    //Unlock bits
    bmp.UnlockBits(bmd);
}
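The copy loop above has to deal with two details: each bitmap row starts at a multiple of the stride, which may include padding bytes, and each pixel may occupy more than three bytes. The Python sketch below mimics that logic on a plain byte buffer so the indexing can be followed without a Bitmap. The function pack_rgb and the sample buffers are made up for illustration.

```python
def pack_rgb(raw, width, height, stride, pixel_size):
    """Copy the first three bytes of every pixel into a tightly packed array.

    raw        -- source bytes, `stride` bytes per row (may include padding)
    pixel_size -- bytes per source pixel (3, or 4 when an extra byte is present)
    """
    data = bytearray(3 * width * height)
    for y in range(height):
        row_start = y * stride
        for x in range(width):
            src = row_start + x * pixel_size
            dst = 3 * (x + width * y)
            data[dst:dst + 3] = raw[src:src + 3]
    return data


# 2x2 image, 3 bytes per pixel, rows padded from 6 to 8 bytes
padded = bytes([1, 2, 3, 4, 5, 6, 0, 0,
                7, 8, 9, 10, 11, 12, 0, 0])
packed = pack_rgb(padded, width=2, height=2, stride=8, pixel_size=3)

# Same image with a fourth byte per pixel and no row padding
with_alpha = bytes([1, 2, 3, 255, 4, 5, 6, 255,
                    7, 8, 9, 255, 10, 11, 12, 255])
packed_from_alpha = pack_rgb(with_alpha, width=2, height=2, stride=8, pixel_size=4)
```

Both calls produce the same 12-byte array, which is the layout the kernel expects.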

 

5.2 Kernel execution

 

The kernel execution code simply copies the data to OpenCL memory, runs the kernel and reads the result back into the C# image byte array.

I have included code to run a kernel that implements the algorithm using work_dim = 1. You may take a look if you want. It is slower, though.

Full implementation is provided in the source code (CLFilter class). The host code to call the kernel is posted below:

 

/// <summary>Applies given filter to the image</summary>
/// <param name="imgDt">Image to be filtered</param>
/// <param name="Filter">Filter. [3*size*size]</param>
public static void ApplyFilter(ImageData imgDt, float[] Filter, bool useOpenCL, bool useWorkDim2)
{
    int FilterSize = (int)Math.Sqrt(Filter.Length/3);
    if (Filter.Length != 3 * FilterSize * FilterSize)
        throw new Exception("Invalid filter");

    if (!Initialized && useOpenCL) Init(FilterSize);

    //Writes filter to device
    if (useOpenCL) varFilter.WriteToDevice(Filter);

    if (FilteredVals == null || FilteredVals.Length != imgDt.Height * imgDt.Width * 3)
    {
        //Filtered values
        FilteredVals = new float[imgDt.Height * imgDt.Width * 3];
        varFiltered = new CLCalc.Program.Variable(FilteredVals);
    }

    //Width
    if (useOpenCL) varWidth.WriteToDevice(new int[] { imgDt.Width });

    //Executes filtering
    int mean = (FilterSize - 1) / 2;

    if (useOpenCL)
    {
        CLCalc.Program.Variable[] args = new CLCalc.Program.Variable[] { imgDt.varData, varFilter, varFiltered, varWidth };
        if (useWorkDim2)
            kernelApplyFilterWorkDim2.Execute(args, new int[] { imgDt.Width - FilterSize, imgDt.Height - FilterSize });
        else
            kernelApplyFilter.Execute(args, new int[] { imgDt.Height - FilterSize });

        //Reads data back
        varFiltered.ReadFromDeviceTo(FilteredVals);
    }
    else
    {
        ApplyFilter(imgDt.Data, Filter, FilteredVals, new int[] { imgDt.Width }, imgDt.Height - FilterSize);
    }

    //Writes to image data
    for (int y = mean; y < imgDt.Height - mean - 1; y++)
    {
        int wy = imgDt.Width * y;
        for (int x = mean; x < imgDt.Width - mean - 1; x++)
        {
            int ind = 3 * (x + wy);
            imgDt.Data[ind] = (byte)FilteredVals[ind];
            imgDt.Data[ind + 1] = (byte)FilteredVals[ind + 1];
            imgDt.Data[ind + 2] = (byte)FilteredVals[ind + 2];
        }
    }

    //Writes filtered values
    //In the future this rewriting can be avoided
    //because byte_addressable will be widely available
    if (useOpenCL) imgDt.varData.WriteToDevice(imgDt.Data);
}
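Two pieces of arithmetic in this host code are easy to get wrong: recovering the filter side length from the array length, and working out which pixels the launch actually writes. The Python sketch below reproduces both under the same formulas. The names filter_size_from_length and launch_plan are hypothetical, and math.isqrt stands in for the C# (int)Math.Sqrt call.

```python
import math


def filter_size_from_length(length):
    """Side length of a square RGB filter stored as 3*size*size floats."""
    size = math.isqrt(length // 3)
    if length != 3 * size * size:
        raise ValueError("Invalid filter")
    return size


def launch_plan(width, height, filter_size):
    """Global work size and the inclusive range of output pixels it covers."""
    mean = (filter_size - 1) // 2
    global_size = (width - filter_size, height - filter_size)
    # Work-item x writes output column x + mean, so the covered columns run
    # from mean to (global_size[0] - 1) + mean; rows follow the same rule.
    x_range = (mean, global_size[0] - 1 + mean)
    y_range = (mean, global_size[1] - 1 + mean)
    return global_size, x_range, y_range


size = filter_size_from_length(3 * 7 * 7)   # 7x7 filter, three channels
plan = launch_plan(640, 480, size)          # assumed 640x480 frame
```

For the assumed 640x480 frame and a 7x7 filter, the covered columns end at 635, which agrees with the `x < imgDt.Width - mean - 1` bound of the write-back loop.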

 

6. Conclusion

 

We have presented a simple yet fast way to compute image filters using only the basic OpenCL implementation, which makes our code compatible with all cards that support OpenCL. Even without using OpenCL images or returning the data as bytes, we still get a 60x faster algorithm using OpenCL which, in turn, makes it feasible to process real-time data from a webcam (13 FPS on my system).

Further optimization without losing compatibility would involve using filters smaller than 7x7, hard-coding the filter values and taking better advantage of symmetries of the filter.

 




Written by Douglas Andrade   
Saturday, 26 June 2010 20:52
 


 
 
Copyright © 2014 CMSoft. All Rights Reserved.