# Case study: color tracking

## 1. Introduction

Tracking a set of colors in a video is a useful first approximation for many applications. Determining which parts of an image belong to skin, for example, is an important first step in tracking faces or hands. This Case Study presents a technique that is robust to motion blur and that can perform real-time tracking thanks to OpenCL acceleration. Source code is provided showing how to implement a flashlight mouse, i.e., how to use a webcam and a flashlight to move the mouse and click.

For this Case Study, it is important that the reader is familiar with the RGB structure of pictures and video frames. If this is not the case, the RGB Color Model entry listed in the references is a good starting point. Familiarity with statistics is also important, and the Weighted mean entry in the references is a good place to start. We consider this Case Study easy for programmers familiar with the RGBA structure and with weighted means, variances and standard deviations. Familiarity with the OpenCL image2d_t type is also important; it is covered in the CMSoft Image2D tutorial.

We’ll see how to implement a color-tracking algorithm accelerated using OpenCL (see the OpenCL 1.1 Specification in the references), which enables the application to run in real time (30 fps on my computer) even if the scan region is the entire screen. Then we’ll use the concept to create a program that sees a flashlight and lets the user move the mouse by moving the light and click by blinking it, using the open-source global mouse and keyboard library listed in the references, as shown in the accompanying video.

You will probably want to try it yourself in order to see the actual position of the cursor, since all I could do in the video was show where the mouse clicks happened.

## 2. Mathematical foundations for tracking

Colors are usually represented in a computer by their RGB coordinates. More specifically, a very common storage method is the RGBA pattern or the BGRA (C#) pattern, in which each pixel’s color is given by 4 bytes, namely the R (red), G (green) and B (blue) components and the Alpha component discussed later in this section. So, if we want to establish a method to track colors, we first need a way to distinguish those colors, i.e., a color distance. We would also like to determine a box inside which the given color lies. In short, what we’d like to do is:

1. Determine which color should be tracked;
2. Given a candidate region, compute the center of the color (i.e., its MEAN position);
3. Compute a box in which most of the points lie (i.e., the STANDARD DEVIATION and a CONFIDENCE INTERVAL);
4. Optionally, adapt to small color changes;
5. Use the computed box as an estimate of where to look in the next iteration;
6. Optionally, increase the box size if the color is lost. This is made possible because of OpenCL acceleration; it would be too slow otherwise.

For readers familiar with the theory, what we’ll use here is a simple Kalman filter. If you aren’t familiar with the concept, it should suffice to know that we’re using data computed from previous frames to predict what’s coming in the next frames. You can find lots of useful information about the topic in the Welch and Bishop introduction listed in the references.
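Steps 5 and 6 of the list above can be sketched in a few lines of C. This is only a rough illustration and not the code from CLColorTrack: the `Box` type, the function names, the `n_sigma` multiplier and the fixed growth margin are all assumptions made for this example.

```c
/* Rectangular search region in pixel coordinates (inclusive bounds).
   All names here are illustrative, not taken from the case study's source. */
typedef struct { int x0, xf, y0, yf; } Box;

static int clamp_int(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Step 5: center the next search box on the current mean and extend it
   n_sigma standard deviations in each direction, clamped to the frame. */
static Box next_search_box(double mean_x, double mean_y,
                           double std_x, double std_y,
                           double n_sigma, int width, int height)
{
    Box b;
    b.x0 = clamp_int((int)(mean_x - n_sigma * std_x), 0, width - 1);
    b.xf = clamp_int((int)(mean_x + n_sigma * std_x), 0, width - 1);
    b.y0 = clamp_int((int)(mean_y - n_sigma * std_y), 0, height - 1);
    b.yf = clamp_int((int)(mean_y + n_sigma * std_y), 0, height - 1);
    return b;
}

/* Step 6: if the color was lost, widen the box by a fixed margin (assumed policy). */
static Box grow_box(Box b, int margin, int width, int height)
{
    b.x0 = clamp_int(b.x0 - margin, 0, width - 1);
    b.xf = clamp_int(b.xf + margin, 0, width - 1);
    b.y0 = clamp_int(b.y0 - margin, 0, height - 1);
    b.yf = clamp_int(b.yf + margin, 0, height - 1);
    return b;
}
```

With a mean of (100, 50), standard deviations of (10, 5) and `n_sigma = 2`, the next box spans x from 80 to 120 and y from 40 to 60. Calling `grow_box` repeatedly eventually covers the whole frame, which matches the idea of falling back to a full-screen scan.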

In C# and OpenCL, the fourth component, Alpha, defines the transparency of the pixel. We’re not going to use this information in our calculations, but it is important to know about it because we need to transfer and set the alpha values in the algorithm.

Moreover, the C# color structure is BGRA, which is important when defining the color to be tracked. In the flashlight example the color being tracked is white, so this won’t pose any problem, but when using the code to track red, for example, you will need to pass [0,0,255] because we’re sending a BGRA structure to OpenCL.
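To make the byte order concrete, here is a minimal C sketch. The type and function names are hypothetical (they are not part of the case study's source), and the buffer is assumed to be tightly packed with 4 bytes per pixel and no row padding.

```c
#include <stddef.h>
#include <stdint.h>

/* One pixel as it sits in memory in a BGRA buffer: blue first, alpha last.
   Names are illustrative, not from the case study's source. */
typedef struct { uint8_t b, g, r, a; } PixelBGRA;

/* Build the 3-element reference color in B, G, R order from R, G, B inputs. */
static void make_ref_color_bgr(uint8_t r, uint8_t g, uint8_t b, int out[3])
{
    out[0] = b;
    out[1] = g;
    out[2] = r;
}

/* Read the pixel at (x, y) from a tightly packed BGRA buffer (assumed layout). */
static PixelBGRA read_pixel(const uint8_t *buf, int width, int x, int y)
{
    const uint8_t *p = buf + 4 * ((size_t)y * (size_t)width + (size_t)x);
    PixelBGRA px = { p[0], p[1], p[2], p[3] };
    return px;
}
```

Calling `make_ref_color_bgr(255, 0, 0, ref)` for pure red fills `ref` with [0, 0, 255], the triple mentioned above.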

### 2.1 Color distance

The first thing to consider is: which color is going to be tracked? This is not obvious and it is very application-specific. For example, you may want to create some industrial identification algorithm to recognize patterns that come in blue, red and green. The flashlight example will assume that the color is White [255,255,255], and we’ll assume the color to be tracked has been defined.

In general, the question we’re trying to answer is: given a color C1 = [R1,G1,B1], what difference will the human eye perceive from another color C2 = [R2, G2, B2]?

This is not a simple question and much research has been done in the area (take the Colour metric article in the references as an example). An obvious option is to compute a Euclidean color distance:

d² = (R1-R2)² + (G1-G2)² + (B1-B2)²

But I personally don’t think this is particularly interesting. Let’s compute the difference between black [0,0,0] and white [255,255,255]: we get d² = 255²+255²+255², i.e., d = 441.7. Let’s now take green [0,255,0] and white: we get d² = 255²+0²+255², i.e., d = 360.6. Green and black: d² = 0²+255²+0², i.e., d = 255. This is a strange result because it says that green looks more like black than like white, and that white and black are the most different colors there are.

It seems more logical to consider that the distances between white, black, red, blue and green are all the same, which in turn calls for the following metric:

d = max(|R1-R2|, |G1-G2|, |B1-B2|)

We’ll stick to this metric in this Case Study but the reader should feel free to modify the code to implement another metric.
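As a quick check of this metric on the CPU, the sketch below computes the largest per-channel difference in plain C. The function name is hypothetical; swapping in another metric only requires changing this one function.

```c
#include <stdlib.h>  /* abs */

/* Largest absolute per-channel difference between two 3-channel colors. */
static int max_channel_distance(const int c1[3], const int c2[3])
{
    int d = 0;
    for (int i = 0; i < 3; i++) {
        int diff = abs(c1[i] - c2[i]);
        if (diff > d) d = diff;
    }
    return d;
}
```

With this metric, white, black, red, green and blue are all at distance 255 from one another, while two close grays such as [135,155,144] and [139,160,140] are at distance 5.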

### 2.2 Computing object center

Now that the color is known, we face the problem of computing its center given a region of interest. One could initialize the region of interest as the whole screen or define a starting point. From there on, tracking takes into account a prediction of where the object will be.

Essentially, what we want to do in this step is to compute the X and Y averages of the color, i.e.:

• Read all pixels whose color is the one being tracked;
• Sum their X and Y coordinates;
• Divide by the total number of pixels whose color is being tracked.

There’s an issue though: the color may vary. What I mean is, colors such as [135,155,144] and [139,160,140] both look grayish and very similar, even though they are not identical.

Clearly, the difference between the color being tracked and the color of each pixel in the region of interest plays a fundamental role here. One option would be to compute the mean of the points whose difference to the desired color is less than a given value, say 15. This would be appropriate for a CPU computation because it saves time by reducing the amount of data used to compute the standard deviation later on, but since we have OpenCL we can do a more elaborate trick.

A problem with simply computing the distance and throwing away points that are too different from the desired color is motion blur, as seen in the accompanying picture (green object). A pass-fail criterion may find too few points, or no points at all, because the original object’s color is blurred into the scene. A more robust approach is to compute a weighted average of the points in the box, using the similarity of each pixel to the tracked color as the weight. In this Case Study the weight of a given pixel is given by:

w = exp(-(k*Distance(ColorTracked, ColorPixel)²))

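where k sets the color tolerance, i.e., how much error the algorithm will accept and still consider the given pixel as “looking like” the tracked color: the larger k is, the faster the weight falls off with distance. The kernel shown in Section 3.2 computes the weight as exp(-(Distance/COLORTOL)²), which corresponds to k = 1/COLORTOL². For more details about the implementation, download the source code for this section and study the OpenCL kernels in the CLColorTrack class. Take a look at the Weighted mean entry in the references for more details about weighted means.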

### 2.3 Computing object box

This section is pretty simple provided that the reader has understood the previous step. We’re going to compute the weighted standard deviation of the above sample. Again, the Weighted mean entry in the references provides information about how to correct the weighted variance, and the reader is encouraged to study the CLColorTrack class for implementation details.

### 2.4 Adapting to color variation

If luminosity changes are a problem for your application, it is possible to correct for them by taking the computed average color as an estimate of the new color to track. This is application-specific and beyond the scope of this Case Study, where we want to use a flashlight as a remote-controlled mouse.
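For readers who do need this, one possible way to do it is sketched below in C. The blending rate is an assumption added for this example: a rate of 1 replaces the reference color with the measured average, as described above, while smaller values adapt more slowly and are less likely to drift onto the background. The function name is hypothetical.

```c
/* Move each channel of the reference color toward the measured weighted
   mean color by a fraction `rate` in [0, 1] (assumed update rule). */
static void adapt_reference_color(double ref[3], const double mean_color[3],
                                  double rate)
{
    for (int i = 0; i < 3; i++) {
        ref[i] += rate * (mean_color[i] - ref[i]);
    }
}
```

For instance, a reference of [255, 255, 255] and a measured average of [235, 245, 255] with a rate of 0.5 give a new reference of [245, 250, 255].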

## 3. Flashlight Mouse Implementation

We now have the tools to develop a program that allows us to use a flashlight as a mouse. We’ll assume that the flashlight is a white region in the webcam image. Keep in mind that this assumption requires that the webcam doesn’t see any other sources of light, or else the program won’t work. If there are no sources of light other than the flashlight, the software works really well. So what we should do is:

1. Find where the flashlight is. It’s the white region;
2. Set the point where the flashlight is found to be the origin;
3. Start to track the flashlight and transform its movement into mouse movement;
4. Observe when the light blinks and interpret this as a command;
5. If the flashlight can’t be found, go back to step 1.
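Steps 2 and 3 boil down to a small coordinate transformation, sketched here in C. The names, the linear sensitivity factor and the clamping to the screen are assumptions for this example; the actual program may map movement differently.

```c
typedef struct { int x, y; } Point;

static int clamp_int(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Map the flashlight position (in webcam pixels) to a cursor position.
   (origin_x, origin_y) is where the light was first found, cursor_start is
   where the cursor was at that moment, and sensitivity scales the motion. */
static Point cursor_from_light(Point cursor_start,
                               double origin_x, double origin_y,
                               double light_x, double light_y,
                               double sensitivity,
                               int screen_w, int screen_h)
{
    Point p;
    p.x = clamp_int(cursor_start.x + (int)(sensitivity * (light_x - origin_x)),
                    0, screen_w - 1);
    p.y = clamp_int(cursor_start.y + (int)(sensitivity * (light_y - origin_y)),
                    0, screen_h - 1);
    return p;
}
```

With a sensitivity of 4, moving the light 10 pixels to the right and 5 pixels up in the webcam image moves the cursor 40 pixels right and 20 pixels up on the screen.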

### 3.1 Acquiring webcam image

In order to acquire the webcam image we’ll use the AForge.NET framework:

```csharp
using AForge.Video;
using AForge.Video.DirectShow;

VideoCaptureDevice vcd;

private void StartCam()
{
    // Enumerate the video input devices available on this machine
    FilterInfoCollection devs = new FilterInfoCollection(FilterCategory.VideoInputDevice);
    if (devs.Count > 0)
    {
        // Use the first device found and get notified of every new frame
        vcd = new VideoCaptureDevice(devs[0].MonikerString);
        vcd.NewFrame += str_NewFrame;
        vcd.Start();
    }
    else MessageBox.Show("Unable to find webcam", this.Text, MessageBoxButtons.OK, MessageBoxIcon.Error);
}
```

Notice that we have associated method str_NewFrame to the Video Capture Device NewFrame event and this method will be called every time a new frame is generated.

### 3.2 Processing the image

A variable is used to transfer the tracking data to and from the OpenCL device (C# declaration below). Notice that CLdadosTrack is, in fact, a 7xHeight matrix, where Height is the number of lines in the tracked region (in this implementation we used total number of lines in order not to need to recreate the object for different tracking regions).

```csharp
/// <summary>Tracking data for each line: weighted total pixels, meanColor, xMean, yMean, xVariance, yVariance</summary>
```

For the sake of convenience, the OpenCL code used to compute the average of the pixels is posted below:

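```c
// Largest per-channel absolute difference between two colors
int ColorDiff(uint4 cor1, uint4 cor2)
{
    int r = (int)cor1.x - (int)cor2.x;
    if (r < 0) r = -r;
    int g = (int)cor1.y - (int)cor2.y;
    if (g < 0) g = -g;
    int b = (int)cor1.z - (int)cor2.z;
    if (b < 0) b = -b;
    if (r < g) r = g;
    if (r < b) r = b;
    return r;
}

// ASSUMPTIONS: the kernel name, its first two parameters, the array indices,
// the image read and the final stores are reconstructed for readability.
// See the CLColorTrack class in the source code for the actual kernel.
__kernel void TrackColorLines(__read_only image2d_t img,
                              __global float * dadosTrack,
                              __global int * box,
                              __global int * corRef,
                              __global float * COLORTOL)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | //Natural coordinates
                          CLK_FILTER_NEAREST; //Don't interpolate

    // Assumed box layout: box[0] = first column, box[1] = last column, box[2] = first line
    int y = get_global_id(0) + box[2];
    int x0 = box[0], xf = box[1];
    uint4 corObj = (uint4)((uint)corRef[0], (uint)corRef[1], (uint)corRef[2], 0);

    int2 coord = (int2)(0, y);
    uint4 curColor;

    float temp;
    float qtdPix = 0;    // sum of the weights on this line
    float mean[2];       // weighted sums of x and y
    float meanColor[3];  // weighted sums of the three color channels

    mean[0] = 0; mean[1] = 0; meanColor[0] = 0; meanColor[1] = 0; meanColor[2] = 0;
    float invColorTol = 1.0f/COLORTOL[0];
    for (int xx = x0; xx <= xf; xx++)
    {
        coord.x = xx;
        curColor = read_imageui(img, smp, coord);

        // weight = exp(-(distance / tolerance)^2)
        temp = (float)ColorDiff(curColor, corObj)*invColorTol;
        temp *= temp;
        temp = native_exp(-temp);

        qtdPix += temp;
        mean[0] = mad((float)xx, temp, mean[0]);
        mean[1] = mad((float)y , temp, mean[1]);
        meanColor[0] = mad((float)curColor.x, temp, meanColor[0]);
        meanColor[1] = mad((float)curColor.y, temp, meanColor[1]);
        meanColor[2] = mad((float)curColor.z, temp, meanColor[2]);
    }

    // Assumed slot order, following the tracking-data comment above
    int yy = 8*y;
    dadosTrack[yy]     = qtdPix;
    dadosTrack[yy + 1] = meanColor[0];
    dadosTrack[yy + 2] = meanColor[1];
    dadosTrack[yy + 3] = meanColor[2];
    dadosTrack[yy + 4] = mean[0];
    dadosTrack[yy + 5] = mean[1];
}
```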

As you can see, it computes the weighted sums PER LINE and, afterwards, the host code has to add these per-line results together and divide them by the sum of the weights. We recommend studying the complete CLColorTrack.TrackColor method in order to fully understand the code.
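The host-side reduction described above can be sketched as follows. The sketch is in C rather than C#, and it assumes a simplified per-line record holding only the sum of weights and the weighted sums of x and y; the `LineSums` type and the function name are hypothetical and do not match the actual layout of CLdadosTrack.

```c
/* Per-line results as the kernel might report them (assumed, simplified layout). */
typedef struct { float weight_sum, wx_sum, wy_sum; } LineSums;

/* Combine the per-line sums into the overall weighted mean position.
   Returns 0 on success, -1 if the total weight is not positive. */
static int reduce_lines(const LineSums *lines, int n,
                        double *mean_x, double *mean_y)
{
    double sum_w = 0.0, sum_x = 0.0, sum_y = 0.0;
    for (int i = 0; i < n; i++) {
        sum_w += lines[i].weight_sum;
        sum_x += lines[i].wx_sum;
        sum_y += lines[i].wy_sum;
    }
    if (sum_w <= 0.0) return -1;
    *mean_x = sum_x / sum_w;
    *mean_y = sum_y / sum_w;
    return 0;
}
```

Leaving only this small sum on the host keeps the per-pixel work on the OpenCL device, where every line of the region is handled by its own work-item.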

To detect flashlight blinks, it is a good idea to use System.Diagnostics.Stopwatch, which is a .NET class and not an OpenCL structure. Further information can be found on Microsoft MSDN. The Stopwatch is essentially a timer, which we use in this example to measure how long the flashlight stays off and how long it stays on. If the flashlight stays on or off for a set threshold time, the command is sent to a callback function. The command corresponds to the “Last input command” image on the screen.
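The timing logic can be pictured as a small state machine. The sketch below is written in C with timestamps passed in as milliseconds, standing in for the elapsed time read from the timer; it is one possible arrangement and not the program's actual code. The type names, the `settle_ms` threshold and the choice to treat three or more blinks as a right-click are assumptions for this example.

```c
typedef enum { CMD_NONE, CMD_CLICK, CMD_DOUBLE_CLICK, CMD_RIGHT_CLICK, CMD_RESET } Command;

typedef struct {
    int    light_on;        /* last observed state of the flashlight */
    double last_change_ms;  /* time of the last on/off transition */
    int    blink_count;     /* blinks completed since the last command */
    int    reset_sent;      /* a reset was already issued for this dark period */
} BlinkState;

static void blink_init(BlinkState *s, double now_ms)
{
    s->light_on = 1;
    s->last_change_ms = now_ms;
    s->blink_count = 0;
    s->reset_sent = 0;
}

/* Feed one observation per frame; returns a command when one is complete. */
static Command blink_update(BlinkState *s, int light_on, double now_ms,
                            double settle_ms, double reset_ms)
{
    if (light_on != s->light_on) {
        /* Light came back after a short dark period: one blink finished. */
        if (light_on && !s->reset_sent) s->blink_count++;
        if (light_on) s->reset_sent = 0;
        s->light_on = light_on;
        s->last_change_ms = now_ms;
        return CMD_NONE;
    }

    double held_ms = now_ms - s->last_change_ms;
    if (!light_on && held_ms >= reset_ms && !s->reset_sent) {
        s->blink_count = 0;
        s->reset_sent = 1;
        return CMD_RESET;              /* light covered for a long time */
    }
    if (light_on && held_ms >= settle_ms && s->blink_count > 0) {
        int n = s->blink_count;
        s->blink_count = 0;
        if (n == 1) return CMD_CLICK;
        if (n == 2) return CMD_DOUBLE_CLICK;
        return CMD_RIGHT_CLICK;        /* three or more blinks (assumed) */
    }
    return CMD_NONE;
}
```

Calling `blink_update` once per frame with the current visibility of the light yields `CMD_NONE` most of the time and a command only after the light has stayed on long enough for the blink sequence to be considered finished.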

### 3.4 Emulating the mouse

I’ve used a very interesting mouse/keyboard library written by CodeSummoner (the global mouse and keyboard library listed in the references). It allows us to use the Windows API to simulate mouse movement and clicks.

Click, double-click and right-click commands can be issued by blinking the flashlight (with a finger, for example) once, twice and three times, respectively, as can be seen in the video.

• For convenience, code has been implemented to minimize the window to the tray instead of the taskbar;
• The sensitivity of the flashlight movements can be adjusted. Ideally, you want more sensitivity if you are far away from the screen;
• Covering the flashlight for 3 seconds makes the program reset its position (which can be useful if you happen to get the flashlight out of the webcam’s range);
• This could be used to process things like trajectory commands or Morse Code parsing.

## 4. Conclusion

This Case Study shows how to implement robust real-time color tracking that can handle motion blur. This is achieved by computing weighted averages of pixel positions, using an exponential function of the color distance as the weight, and accelerating the computations with OpenCL.

An implementation is presented showing how to use a webcam and a flashlight to create a “remote control” capable of moving the mouse and clicking, as fully functional, open-source code.

## 5. References

1. OpenCL 1.1 Specification, Khronos Group.
2. Global mouse and keyboard library, http://www.codeproject.com/KB/system/globalmousekeyboardlib.aspx, Sep. 2010.
3. RGB Color Model, http://en.wikipedia.org/wiki/RGB, Sep. 2010.
4. Welch, Greg; Bishop, Gary. An Introduction to the Kalman Filter. University of North Carolina. Found at http://www4.cs.umanitoba.ca/~jacky/Teaching/Courses/74.795-LocalVision/ReadingList/SIGGRAPH2001_CoursePack_08.pdf, Sep. 2010.
5. Colour metric, http://www.compuphase.com/cmetric.htm, Sep. 2010.
6. Weighted mean, http://en.wikipedia.org/wiki/Weighted_mean, Sep. 2010.