OpenCL FFT

# OpenCL FFT

## 1. Introduction

The Disccrete Fourier Transform (DFT) is a Fourier Analysis function that computes the frequency domain representation of a given input. The DFT has an enormous amount of applications, including spectral analysis, data compression, multiplication of polynomials and computation of remaining fatigue life of materials.

CMSoft presents an easy-to-use powerful tool to compute the DFT using the Fast Fourier Transform algorithm using OpenCL, both single-precision and double-precision. Source code containing examples is available.

Check the video below for a demonstration of the capabilities of the tool:

We'd like to thank Bainville [1] for allowing us at CMSoft to use his code in OpenCLTemplate's FFT functions.

## 2. Computing the FFT

To use OpenCLTemplate's FFT functions, it is necessary is to create a float or double vector whose number of complex elements is a power of 4 or 16 (depending whether you want to use the radix-4 or radix-16 version of the algorithm).

In the array, the even-indexed elements correspond to real parts and the odd-indexed elements correspond to the imaginary part. For example, the command:

float[] xx = new float[] { 1, 0, 2, 0, 3, 0, 4, 0 };

creates an array of floats containing the real sequence 1, 2, 3, 4. To compute the answer, simply call:

float[] fftxx = CLFFTfloat.FFT4(xx);

The answer is fftxx = {10, 0, -2, 2, -2, 0, -2, -2}, which corresponds to 10, -2+2*i, -2, -2-2*i.

You may call the FFT using a buffer element which is already stored in the Device memory. In this case, use:

CLCalc.Program.Variable CLfft = CLFFTfloat.FFT16(ref CLft);

In the above call, CLfft now contains a Variable which stores the values originallyy stored in CLft, which is destroyed in the process.

Notice that the number of elements should be a power of 4 or 16. This is no big limitation as you can always pad the last elements with zeroes.

In our tests, the implemented FFT algorithm ran 210x faster than what was implemented in McLabEn (available in Softwares seection) and 37 x faster than an efficient commercial package using only CPU (which we can't disclose here).

## 3. Notes on Frequency Components

In the DFT of a real signal using N components, given the sampling time T and the sampling frequency Fs = 1/T, so that the function f(t) is sampled at f(0), f(T), f(2T), etc., with time length L = Final time - Initial time,  the frequency components of FFT(f) = fft are given by

Frequency corresponding to fft[i] = (0.5 * Fs / L) * (i / (N/2)), i=0..N/2+1 (the DFT is symmetrical)

The corresponding magnitude is computed taking into account the FFT scale normalization 2/N:

Magnitude[i] = sqrt(fft[2i]²+fft[2i+1]²)*2/N

Details of the implementation can be found in the source code  (private void btnViewDFT_Click(object sender, EventArgs e)).

## 4. Conclusion

We have presented an algorithm that can perform around 200x faster than poorly implemented FFTs and up to 37x faster than commercial packages which only utilize the CPU. The presented method is practical enough in terms of usage as it requires a single call to a function.

Source code and videos are made available to help understand the concepts and how to use the resource.

## 5. References

[1] OpenCL Fast Fourier Transform
Eric Bainville - May 2010
http://www.bealto.com/gpu-fft_opencl-1.html

[2] Discrete Fourier transform basic theory
http://en.wikipedia.org/wiki/Discrete_Fourier_transform