
Boost the performance of your Android app with OpenCL

Although our Xperia™ devices have become much more powerful over the last couple of years, there may still be cases where your application needs more processing power. With OpenCL, you can use the power of the GPU to handle resource-intensive tasks in your app. This article is a short introduction to OpenCL and how to get it up and running on your Sony Xperia device.
Jim Rasmusson, Master Researcher at Sony Mobile.
In this article, Jim Rasmusson walks you through the steps needed to get started with OpenCL in your app. A code example is also provided; see the OpenCL code example page. But first, we’ll explain some basics about OpenCL.
What is OpenCL?
The graphics processors (also called GPUs) inside the Xperia devices have taken a big leap forward this year with a new hardware architecture that enables so-called General-Purpose Graphics Processing Unit (GPGPU) capabilities. In essence, this means that the GPU can be used as a powerful array processor for general-purpose processing. In our current top models, the Xperia Z1 and Xperia Z Ultra, the GPU has 128 small processing cores in total.
To program these array processors efficiently, a new programming framework is needed. One option is OpenCL, which has been around for a couple of years on desktop computers. OpenCL was created by Khronos as an open standard for high-performance computation. It is now quite mature, has good industry support, and has an interesting roadmap.
OpenCL is particularly useful for data-parallel processing tasks. Among the most efficient workloads are multimedia tasks such as image, camera, and computer vision processing. You may also find benefits with other types of data-parallel tasks, such as encryption or the simulation of physics or particle systems.
OpenCL is available in several of our Sony Xperia devices, including the Xperia Z, Xperia ZL, Xperia Tablet Z and Xperia ZR (in addition to Xperia Z1 and Xperia Z Ultra mentioned above).
The OpenCL platform model
The OpenCL platform model: You have one Host and one or more Compute Devices. A Compute Device has one or more Compute Units, which have one or more Processing Elements. (Image supplied courtesy of Khronos Group)
We currently support OpenCL version 1.1, and our OpenCL implementation adheres to the OpenCL Platform Model (see the illustration above). We have a Host, running on the CPU, and a Compute Device, that, in our implementation, is the GPU. For example, in Xperia Z1 and Xperia Z Ultra, the GPU is the Adreno 330, which has 4 Compute Units, each with 32 Processing Elements, giving us 128 processing elements in total.
OpenCL consists of different parts: the platform and runtime APIs, and the kernel programming language itself, OpenCL-C, which is based on C99. If your favourite C code is not overly complicated, it is often fairly straightforward to port it to OpenCL-C. You use the platform and runtime APIs to query, set up, and start execution on the OpenCL-compatible subsystems available in your system (for example, the GPU, and maybe even the CPU).
The OpenCL code example
To be able to use this example, you should be familiar with C, C++, and Java programming in an Android environment.
In our small image noise reduction code example, we use a bilateral filter to filter an image, which is quite math-heavy. As we will see, the OpenCL version of this code executes around 60 times faster (on some of our Xperia devices) than the single-threaded reference C code!
Since OpenCL is not natively supported in Android, we have to use the Android NDK. If you take a look at the code example project, you will see the typical files necessary for an NDK project, such as the Android.mk file and the jni folder with your C and C++ files. In the assets folder, there is also a bilateralKernel.cl file, a text file that contains our OpenCL-C kernel code.
If we take a look at the OpenCL-C kernel code below, we see that it is quite “C like”, but with some additional data types and a few extra built-in functions. In particular, note the vector data types (for example float4) and their convenient component selection syntax. In the example below we use currentPixel.xyz to select the first three channels (rgb) of this particular float4 variable.
// filterWidth, oneover255, sigmaDomain and sigmaRange are assumed to be
// constants defined elsewhere in the .cl file (not shown here).
kernel void bilateralFilterKernel(__global uchar4 *srcBuffer,
                                  __global uchar4 *dstBuffer,
                                  const int width, const int height)
{
    // One kernel instance runs per pixel; the global ID gives its coordinates.
    int x = get_global_id(0);
    int y = get_global_id(1);
    int centerIndex = y * width + x;
    float4 sum4 = (float4)0.0f;

    if ( (x >= filterWidth) && (x < (width - filterWidth)) && // avoid reading outside of buffer
         (y >= filterWidth) && (y < (height - filterWidth)) )
    {
        float4 centerPixel = oneover255 * convert_float4(srcBuffer[centerIndex]);
        float normalizeCoeff = 0.0f;

        for (int yy = -filterWidth; yy <= filterWidth; ++yy)
        {
            for (int xx = -filterWidth; xx <= filterWidth; ++xx)
            {
                int thisIndex = (y + yy) * width + (x + xx);
                float4 currentPixel = oneover255 * convert_float4(srcBuffer[thisIndex]);

                // Spatial distance of this neighbour from the centre pixel.
                // (fast_distance on the two scalars would give |xx - yy| instead.)
                float domainDistance = fast_length((float2)((float)xx, (float)yy));
                float domainWeight = exp(-0.5f * pow((domainDistance/sigmaDomain), 2.0f));

                // Colour distance between this neighbour and the centre pixel (rgb only).
                float rangeDistance = fast_distance(currentPixel.xyz, centerPixel.xyz);
                float rangeWeight = exp(-0.5f * pow((rangeDistance/sigmaRange), 2.0f));

                float totalWeight = domainWeight * rangeWeight;
                normalizeCoeff += totalWeight;

                sum4 += totalWeight * currentPixel;
            }
        }
        sum4 /= normalizeCoeff;
    }
    // Border pixels skip the filter and are written as zero.
    dstBuffer[centerIndex] = convert_uchar4_sat_rte(255.0f * sum4);
}
One example of a function built into OpenCL is distance. It calculates the Euclidean distance between two vectors, which is handy for image processing, for example. So
float rangeDistance = distance(currentPixel.xyz, centerPixel.xyz);
corresponds to the C code
float rangeDistance = 0.0f;
for (int c = 0; c < 3; ++c)
    rangeDistance += (currentPixel[c] - centerPixel[c]) * (currentPixel[c] - centerPixel[c]);
rangeDistance = sqrtf(rangeDistance);
In our OpenCL kernel code example, we have used fast_distance, the faster version of distance, which brings even faster execution speeds. To get the typical bilateral filter effect (and the associated image noise reduction), it is necessary to iterate this filter a few times over the image. This is why we call the same bilateral filter routine three times in this example.
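Iterating the filter means feeding the output of one pass into the next. One common way to do that, sketched here in plain C under the assumption that each pass reads one buffer and writes another, is to swap ("ping-pong") the two buffers between passes. The names run_passes, filter_fn and halve_pass are hypothetical and do not come from the example project; halve_pass is only a trivial stand-in for a real filter pass.

```c
#include <stdint.h>

/* Hypothetical signature for one filter pass over a width*height RGBA image. */
typedef void (*filter_fn)(const uint8_t *src, uint8_t *dst, int width, int height);

/* Runs `passes` iterations of `filter`, alternating between two buffers.
   Returns the buffer that holds the final result (`a` if passes is 0). */
uint8_t *run_passes(filter_fn filter, uint8_t *a, uint8_t *b,
                    int width, int height, int passes)
{
    uint8_t *src = a;
    uint8_t *dst = b;
    for (int i = 0; i < passes; ++i) {
        filter(src, dst, width, height);
        uint8_t *tmp = src;   /* the output of this pass is the input of the next */
        src = dst;
        dst = tmp;
    }
    return src;
}

/* Trivial stand-in pass for demonstration only: halves every byte. */
void halve_pass(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int i = 0; i < width * height * 4; ++i)
        dst[i] = src[i] / 2;
}
```

With three passes the result ends up in the second buffer, for example run_passes(halve_pass, a, b, width, height, 3) returns b. On the GPU, the same idea would apply to the two device buffers passed as kernel arguments.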
Left: noisy original image. Right: after bilateral filter with NativeC
Left: noisy original image. Right: after bilateral filter with OpenCL. Note that the processing time is now 51 ms compared to 3022 ms in the NativeC example.
If you have an OpenCL-compatible Android device, you can run the supplied openclexample.apk file to check how it works. You can also see that execution is much faster using OpenCL on the GPU than with the plain single-threaded C code running on the CPU (tested on Sony Xperia Z1). In addition to the speed benefit, you may also find that you decrease energy consumption by utilizing OpenCL on the GPU compared to using standard programming methods on the CPU.
OpenCL demonstrates that the GPUs in recent mobile devices are more than capable of handling non-graphical computing across a variety of apps. Keep OpenCL in mind when considering how to enhance the flexibility, functionality, and performance of your apps, and let us know in the comments below if you have any further questions!
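Timings like the ones quoted in the captions above can be collected by reading a monotonic clock before and after the call being measured. The sketch below uses the POSIX clock_gettime function; the helper names now_ms and speedup are hypothetical and are not taken from the example project. With the figures from the captions, 3022 ms / 51 ms gives a speedup of roughly 59.

```c
/* Needed for clock_gettime when compiling in a strict ISO C mode. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Current monotonic time in milliseconds. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Speedup of a candidate over a baseline, given both elapsed times. */
double speedup(double baseline_ms, double candidate_ms)
{
    return baseline_ms / candidate_ms;
}
```

A measurement would then take the difference of two now_ms() readings around the filter call. When timing GPU work, the second reading should be taken only after the results are actually available on the host.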
