
OpenCL Memory Objects

Memory objects are reserved regions of global device memory that can serve as containers for your data. There are two types of memory objects: buffer objects and image objects. Whereas buffer objects are for containing any type of generic data, image objects are specifically for representing 2D or 3D images. This chapter discusses both types of memory object.

Representing Data with Buffer Objects

The OpenCL programming interface provides buffer objects for representing generic data in your OpenCL programs. Instead of having to convert your data to the domain of a specific type of hardware, OpenCL enables you to transfer your data as is to an OpenCL device via buffer objects and operate on the data using the same language features that you are accustomed to in C.

There are two principal ways for a kernel to access your host application data: It can follow a host pointer to the data, in which case traveling along a PCI bus might be necessary, or you can copy all of the host data to device memory first and then the kernel can access it locally. Because transmitting data is costly, it is best to minimize reads and writes as much as possible. By packaging all of your host data into a buffer object that can remain on the device, you reduce the amount of data traffic necessary to process your data.

Allocating Buffer Objects

Before you can store your application data in a buffer object, you must first use the clCreateBuffer function to create the buffer object, as shown in Listing 4-1.

Listing 4-1 Allocating buffer objects

    // read-only input backed by the host memory at srcA
    memobjs[0] = clCreateBuffer(context,
                                CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                sizeof(cl_float4) * n, srcA, NULL);
    if (memobjs[0] == (cl_mem)0)
    {
        clReleaseCommandQueue(cmd_queue);
        clReleaseContext(context);
        return -1;
    }

    // read-only input initialized with a copy of the data at srcB
    memobjs[1] = clCreateBuffer(context,
                                CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                sizeof(cl_float4) * n, srcB, NULL);
    if (memobjs[1] == (cl_mem)0)
    {
        delete_memobjs(memobjs, 1);
        clReleaseCommandQueue(cmd_queue);
        clReleaseContext(context);
        return -1;
    }

    // output buffer with no associated host pointer
    memobjs[2] = clCreateBuffer(context,
                                CL_MEM_READ_WRITE,
                                sizeof(cl_float) * n, NULL, NULL);
    if (memobjs[2] == (cl_mem)0)
    {
        delete_memobjs(memobjs, 2);
        clReleaseCommandQueue(cmd_queue);
        clReleaseContext(context);
        return -1;
    }

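Note that this example does not show complete error handling. It checks only whether the memory object returned is NULL, which is not sufficient. You should also request an error code, by passing a pointer to an error variable as the last argument to clCreateBuffer, and check it.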

In this example, the first read-only buffer is allocated with the CL_MEM_USE_HOST_PTR flag set. In contrast, the second read-only buffer is allocated with the CL_MEM_COPY_HOST_PTR flag set. In both cases, you must also provide a pointer to your data. When the CL_MEM_USE_HOST_PTR flag is set, the OpenCL implementation uses the memory that your pointer references as the storage for the buffer. It has the option of caching the data on the OpenCL device, but it keeps the two copies synchronized. When the CL_MEM_COPY_HOST_PTR flag is set, on the other hand, the OpenCL implementation allocates the buffer on the device and copies your data into it. In either case, the buffer is initialized from the data in host memory pointed to by the fourth parameter. If you set the CL_MEM_COPY_HOST_PTR flag, you can ask OpenCL to allocate the buffer in host-accessible memory by also specifying the CL_MEM_ALLOC_HOST_PTR flag. CL_MEM_ALLOC_HOST_PTR cannot be combined with CL_MEM_USE_HOST_PTR. You can use these flags to initialize the memory buffer, to synchronize memory buffers, and to make data accessible to multiple applications. However, keep in mind that transferring data between devices is costly.

It is perfectly acceptable to create a buffer object without specifying a corresponding pointer to data on the host device. By passing NULL for the host pointer and setting neither CL_MEM_USE_HOST_PTR nor CL_MEM_COPY_HOST_PTR in the flags, as the third buffer in Listing 4-1 does, you create a buffer object that is independent of any pointers on the host. If there is specific host data that you’d like to place in that buffer object, you can do so by enqueuing a command to write to the buffer object using the clEnqueueWriteBuffer function, as discussed in the following section, “Reading, Writing, and Copying Buffer Objects.”
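The relationship between the host-pointer flags and the host pointer can be summarized as a small consistency check. The following is only an illustrative sketch: the enum names and bit values are hypothetical stand-ins for CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR (real code uses the constants from the OpenCL header), and the function is not part of the OpenCL API. It encodes two rules: a host pointer is supplied exactly when one of the two flags is set, and the two flags are not set together.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR.
   The values are illustrative; real code uses the OpenCL header constants. */
enum {
    SKETCH_USE_HOST_PTR  = 1 << 0,
    SKETCH_COPY_HOST_PTR = 1 << 1
};

/* Returns true if the flags and host pointer are consistent with each other. */
bool host_ptr_flags_are_consistent(unsigned flags, const void *host_ptr)
{
    bool use  = (flags & SKETCH_USE_HOST_PTR) != 0;
    bool copy = (flags & SKETCH_COPY_HOST_PTR) != 0;

    /* Using the host memory directly and copying from it are alternatives. */
    if (use && copy)
        return false;

    /* A pointer is required when either flag is set, and only then. */
    return (use || copy) == (host_ptr != NULL);
}
```

For instance, the third buffer in Listing 4-1 corresponds to calling this check with no host-pointer flags and a NULL pointer, which it accepts.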

Reading, Writing, and Copying Buffer Objects

After you’ve created the buffer object, you can enqueue reads, writes, and copies. From your host application, you can use the following functions:

clEnqueueReadBuffer
This function enqueues a command to read data from a buffer into host memory. This is useful for reading the output results of a kernel back to the host application.
clEnqueueWriteBuffer
This function enqueues a command to write data from host memory to a buffer. You can use this function to provide data for processing by a kernel executing on the device.
clEnqueueCopyBuffer
This function copies data from one buffer object to another.

Important: The clEnqueueRead*, clEnqueueWrite*, and clEnqueueCopy* functions only enqueue the memory commands; they don’t block by default. To know when a command has completed, so that you can be sure the data is available and that it’s safe to free the memory, you need to either get an event and check the command’s status from that event or, for reads and writes, use the blocking form of the command by setting the blocking_read or blocking_write parameter in the function call to CL_TRUE. The copy functions have no blocking form.

These functions enable you to move data to and from a host. To actually process this data on a device, you have to make this data available to the work-items that execute on the device. The following sections show you how to pass your data to the compute kernels for further processing.

Accessing the Buffer Objects from a Kernel

After your data has been successfully transferred, to access it in a kernel you must then explicitly tell OpenCL to pass the specified buffer object as an argument to the specific kernel function that you defined in your OpenCL program source code. These functions are identified with the __kernel qualifier. You can do this by using the clSetKernelArg function as shown in Listing 5-1.

Once you’ve associated the appropriate buffer objects with the appropriate kernel arguments, the next time you execute that kernel (using a function such as clEnqueueNDRangeKernel), the kernel function receives the buffer objects supplied to the clSetKernelArg function as input.

For example, imagine that you have written a kernel function in OpenCL called “square” that takes an input value, multiplies it by itself, and then stores the resultant value as output. The source code for such a kernel could read as follows:

Listing 4-2 A squaring kernel function

__kernel void square(
    __global float* input,
    __global float* output,
    const unsigned int count)
{
    // each work-item squares the element at its own global index
    int i = get_global_id(0);
    if(i < count)
        output[i] = input[i] * input[i];
}

In your host application source code, it’s your responsibility to:

1. Prepare the input data.
2. Create the buffer object. Use the clCreateBuffer function to create a buffer object of the appropriate size.
3. Move the input data from host memory. You can do this in the clCreateBuffer function by pointing to the data on the host, or you can use the clEnqueueWriteBuffer function to enqueue a write from host memory.
4. Associate the input data with the kernel’s arguments. Use the clSetKernelArg function to do this.

Listing 5-1 shows an example of how to do this in the host application.

Processing Data in OpenCL

By associating your buffer object with specific kernel arguments, you make it possible to process your data from the context of a kernel function. For example, in Listing 4-2, notice how the code sample treats the input data pointer much as you would treat a pointer in C. In this example the input data is an array of float values, and you can process each element of the float array by indexing into the pointer. Listing 4-2 does little more than multiply a value by itself using the * operator, but OpenCL-C provides a wide array of data types and operators that enable you to perform more complex arithmetic.
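To make the per-element behavior of Listing 4-2 concrete, here is a host-side C model of the same computation. It is a sketch for reasoning and for checking results, not how OpenCL executes kernels: the function names are hypothetical, a plain loop stands in for the parallel work-items, and the loop index plays the role of get_global_id(0). It also assumes that the global work size may be larger than count (for example, when it is rounded up to a multiple of the work-group size), which is the case the bounds check in the kernel guards against.

```c
#include <stddef.h>

/* Models one work-item of the "square" kernel; global_id stands in for
   the value get_global_id(0) would return. */
static void square_work_item(size_t global_id, const float *input,
                             float *output, unsigned int count)
{
    /* Work-items past the end of the data do nothing. */
    if (global_id < count)
        output[global_id] = input[global_id] * input[global_id];
}

/* Runs the model sequentially for every index in a one-dimensional range.
   On a device these iterations would run as independent work-items. */
void square_reference(const float *input, float *output,
                      unsigned int count, size_t global_size)
{
    for (size_t id = 0; id < global_size; ++id)
        square_work_item(id, input, output, count);
}
```

A reference like this can be handy for comparing against the values read back from the output buffer.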

Because OpenCL-C is based on C99, you are free to process your data in OpenCL functions as you would in C with few limitations. OpenCL-C does not support recursion or function pointers, but beyond those restrictions there are not many language features that C has that OpenCL-C doesn’t have. In fact, OpenCL provides several beneficial features that the C programming language does not offer natively, such as optimized image access functions.

OpenCL has built-in support for vector intrinsics and offers vector data types. The operators in OpenCL are overloaded, and performing arithmetic between vector data types is syntactically equivalent to performing arithmetic between scalar values. Refer to The OpenCL Specification for more details on the built-in functions and facilities of the OpenCL-C language.
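As an illustration of what the overloaded operators mean, the following host-side C sketch spells out the component-wise multiplication that an OpenCL-C expression such as a * b performs on two float4 values. The type float4_model and the function float4_mul are hypothetical names introduced for this example; in OpenCL-C itself you would simply write the operator.

```c
/* Hypothetical host-side stand-in for the OpenCL-C float4 vector type. */
typedef struct { float x, y, z, w; } float4_model;

/* Component-wise product: each lane is multiplied independently,
   which is what a * b yields for two float4 values in OpenCL-C. */
float4_model float4_mul(float4_model a, float4_model b)
{
    float4_model r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}
```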

When you are done processing your data and writing these results to an output buffer, your host application can read this data back into host memory using the clEnqueueReadBuffer function or the clEnqueueReadImage function, depending on what type of memory object you created to store the output results.

Retaining and Releasing Buffer Objects

Buffer objects should be freed when no longer needed to avoid memory leaks. OpenCL uses a reference counting system to keep track of the memory objects currently being used. The reference count represents how many other objects hold references to the particular memory object. Any time you create a buffer object, it immediately receives a reference count of 1. Any time another object would also like to maintain a reference to it, it should increment the buffer object’s reference count by calling the clRetainMemObject function. When an object wishes to relinquish its reference to a buffer object, it should call clReleaseMemObject. When the reference count for a buffer object reaches zero, OpenCL frees it, returning the memory to the system and making any persisting references to the buffer object invalid.
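The counting rules described above can be modeled in a few lines of C. This is a toy model meant only to illustrate the lifecycle, not OpenCL’s actual implementation: model_mem, model_create, model_retain, and model_release are hypothetical names, with the last two playing the roles of clRetainMemObject and clReleaseMemObject.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a memory object. */
typedef struct {
    unsigned ref_count;
    void *storage;
} model_mem;

/* Creation gives the caller the first reference, so the count starts at 1. */
model_mem *model_create(size_t size)
{
    model_mem *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->storage = malloc(size);
    if (m->storage == NULL) {
        free(m);
        return NULL;
    }
    m->ref_count = 1;
    return m;
}

/* A new holder registers its interest in the object. */
void model_retain(model_mem *m)
{
    m->ref_count++;
}

/* A holder gives up its reference. Returns 1 if this was the last
   reference and the object was freed, 0 otherwise. */
int model_release(model_mem *m)
{
    if (--m->ref_count > 0)
        return 0;
    free(m->storage);
    free(m);
    return 1;
}
```

In this model, as in OpenCL, every create or retain should be balanced by exactly one release, and the pointer must not be used after the final release.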

Image Objects

OpenCL has built-in support for processing image data. Using image objects, you can take image data that resides in host memory and make it available for processing in a kernel executing on an OpenCL device. Image objects simplify the process of representing and accessing image data since they offer native support for a multitude of image formats. If you are writing kernel functions that need to efficiently perform calculations on image data, you will find OpenCL native support for images useful.

The following sections show you how to take your image data that resides in host memory and place it in image objects that you can later access within a kernel. They also provide an overview of how to go about processing this image data.

Representing Two-Dimensional Images

In order to be able to process an image in a kernel, you need to create an image object. Creating an image object allocates memory specifically tailored to holding image data. For example, your host application can use the clCreateImage2D function (shown in Listing 4-3, taken from the OpenCL Procedural Grass and Terrain Example sample code project) to create a two-dimensional image object.

Listing 4-3 Creating a 2D image object

bool ComputeEngine::createImage2D(
    const char* acMemObjName,
    MemFlags eMemFlags,
    ChannelOrder eOrder,
    ChannelType eType,
    uint uiWidth,
    uint uiHeight,
    uint uiRowPitch,
    void* pvData)
{
    // reject channel orders that this class does not recognize
    uint uiChannelCount = getChannelCount(eOrder);
    if(uiChannelCount == 0)
        return false;

    // set the image format properties and option flags
    cl_image_format kFormat;
    kFormat.image_channel_order = (cl_channel_order) eOrder;
    kFormat.image_channel_data_type = (cl_channel_type) eType;
    cl_mem_flags kFlags = (cl_mem_flags) eMemFlags;

    int iError = CL_SUCCESS;
    cl_mem kImage = clCreateImage2D(
          m_kContext,         // a valid OpenCL context
          kFlags,             // option flags                  [1]
          &kFormat,           // image format properties       [2]
          (size_t)uiWidth,    // width of the image in pixels
          (size_t)uiHeight,   // height of the image in pixels
          (size_t)uiRowPitch, // scan-line pitch in bytes      [3]
          pvData,             // pointer to the image data
          &iError);           // on return, the result code

    if(kImage == 0 || iError != CL_SUCCESS)
    { ... }

    // keep the image object so it can be looked up by name later
    m_akMemObjects[acMemObjName] = kImage;
    return true;
}

Notes:

[1] You can use the same options as for buffer objects, such as read-only, write-only, or allocate memory on the host. See The OpenCL Specification for details.
[2] These properties include number of channels, channel order, and channel data type. Examples of values that set the number of channels and channel order are CL_RG, CL_RGBA, and CL_BGRA, where R is red, G is green, B is blue, and A is alpha. See The OpenCL Specification for the complete list of possible property values.
[3] The scan-line pitch, also referred to as row pitch, represents how many bytes are necessary to represent one row (or scan line) of the image. You need to specify the row pitch only if the data from which you are copying the image has a particular row pitch. Internally, OpenCL uses a storage format optimized for the device. Specify 0 if you want OpenCL to calculate the row-pitch value for you.

Calculating Row Pitch

To calculate row pitch, take the width of the image in pixels and multiply it by the number of bytes in each pixel. For example, an image that is in CL_RGBA format has four separate image channels per pixel: red, green, blue, and the alpha channel. The pseudocode in Listing 4-4 shows how you can calculate the row pitch of an image. If the pointer to a preexisting image buffer is not NULL and you specify 0 for the row pitch parameter, then OpenCL calculates the row pitch as the image width * the size of a pixel element in bytes.

Listing 4-4 Calculating the row pitch for an image in 8-bit RGBA format

// assume that the image is in CL_RGBA / CL_UNORM_INT8 format,
// so each of the four channels occupies one byte
num_channels_per_pixel = 4;
image_width = ... ; // the width of the image in pixels
channel_size = sizeof(uint8_t);
pixel_size = channel_size * num_channels_per_pixel;
image_row_pitch = image_width * pixel_size;
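The same calculation can be written as a small C function. This is a minimal sketch for tightly packed rows: the name image_row_pitch and its parameters are illustrative and not part of OpenCL, and the 640-pixel width used in the note below is just an example value.

```c
#include <stddef.h>

/* Bytes needed for one tightly packed row of an image:
   width in pixels * channels per pixel * bytes per channel. */
size_t image_row_pitch(size_t image_width,
                       size_t num_channels_per_pixel,
                       size_t channel_size)
{
    size_t pixel_size = channel_size * num_channels_per_pixel;
    return image_width * pixel_size;
}
```

For a 640-pixel-wide image in CL_RGBA / CL_UNORM_INT8 format, image_row_pitch(640, 4, 1) gives 2560 bytes. With four-byte floating-point channels, the same width gives 10240 bytes.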

Representing Three-Dimensional Images

To create an image object that represents three-dimensional data, you must specify the image depth along with the height and width as you do for two-dimensional images. However, instead of providing only the row pitch as when creating a two-dimensional image object, you must also provide OpenCL with the slice pitch of the three-dimensional image. The slice pitch represents the size, in bytes, of each two-dimensional slice of the image. You can compute this by taking the row pitch of the image and multiplying this by the height of the image. With the image depth and the slice pitch you provide OpenCL with information on the general geometry of the three-dimensional space your image occupies. You can think of it as similar to defining a bounding cube around your three-dimensional image.

Aside from requiring the image depth and slice pitch, calling the clCreateImage3D function is essentially identical to creating a two-dimensional image object.
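The slice pitch arithmetic, and the way the two pitches locate a pixel in the host data, can be sketched in C as follows. The function names are hypothetical, and the sketch assumes the host buffer stores slices one after another, each made up of consecutive rows, which is the layout the row pitch and slice pitch describe.

```c
#include <stddef.h>

/* Bytes in one two-dimensional slice: row pitch * image height. */
size_t image_slice_pitch(size_t row_pitch, size_t image_height)
{
    return row_pitch * image_height;
}

/* Byte offset of pixel (x, y, z) from the start of the host image data,
   given the size of one pixel and the two pitches, all in bytes. */
size_t pixel_offset(size_t x, size_t y, size_t z,
                    size_t pixel_size, size_t row_pitch, size_t slice_pitch)
{
    return z * slice_pitch + y * row_pitch + x * pixel_size;
}
```

For example, with a row pitch of 2560 bytes and a height of 480 rows, the slice pitch is 1228800 bytes, and the four-byte pixel at (1, 1, 1) starts at byte offset 1231364.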

Reading, Writing, and Copying Image Objects

After you’ve created the image object, you can enqueue reads, writes, and copies to and from host memory. From your host application, you can use the following functions:

clEnqueueReadImage
This function enqueues a command to read data from an image object into host memory. This is useful for reading the output results of a kernel back to the host application.
clEnqueueWriteImage
This function enqueues a command to write data from host memory to an image object. You can use this function to provide image data for processing by a kernel executing on the device.
clEnqueueCopyImage
This function copies data from one image object to another.

These functions enable you to move images to and from host memory. To actually process this image data on a device, you have to make this data available to the work-items that execute on the device. The following sections show you how to pass your data to the kernels for further processing.

Accessing the Image Objects from a Kernel

After your data has been successfully transferred as an image object, to access it in a kernel you must then explicitly tell OpenCL to pass the specified image object as an argument to the specific kernel function that you defined in your OpenCL program source code. These functions are identified with the __kernel qualifier. You can do this by using the clSetKernelArg function as shown in Listing 5-1.

Once you’ve associated the appropriate image objects with the appropriate kernel arguments, the next time you execute that kernel (using a function such as clEnqueueNDRangeKernel), the kernel function receives the image objects supplied to the clSetKernelArg function as input.

For example, imagine that you have written a kernel function in OpenCL called “foo” that takes two arguments: an input two-dimensional image, and an output two-dimensional image. The signature for such a kernel could read as shown in Listing 4-5.

Listing 4-5 An image-processing kernel function

__kernel void foo (
    read_only image2d_t imageA,
    write_only image2d_t imageB)
...

Notice the read_only and write_only qualifiers that precede the imageA and imageB arguments, respectively. These are examples of image access qualifiers that you can declare in your kernels to enforce read-only or write-only access to a certain image. The default qualifier is read_only. You can find more details on this and other argument qualifiers in The OpenCL Specification.

Note: Whereas a buffer object can be used for both reads and writes in a given kernel, an image object can be used only for reads or for writes, not both.

Processing Images in OpenCL

By associating your image object with specific kernel arguments, you make it possible to process your image data from the context of a kernel function. When you use image objects as the arguments to a kernel, they assume the image2d_t or image3d_t data type. Instead of accessing the buffer of image data directly, you must use the built-in OpenCL read and write image functions such as read_imagef and write_imagef. The read_image* functions return a four-component floating-point, integer, or unsigned integer color value. OpenCL identifies the color channels as x, y, z, and w, where the x component refers to the red channel, the y component refers to the green channel, the z component refers to the blue channel, and the w component refers to the alpha channel. You can find more details on the use of these image functions in The OpenCL Specification.
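The channel-to-component mapping can be illustrated with a host-side C model of reading one pixel from CL_RGBA / CL_UNORM_INT8 data as floating-point values. This is only a sketch under stated assumptions: pixel_f4 and read_rgba8_as_float are hypothetical names, the model ignores samplers, filtering, and addressing modes, and it assumes the unsigned normalized 8-bit channels map to the range 0.0 to 1.0 by dividing by 255.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four-component value a read returns:
   x = red, y = green, z = blue, w = alpha. */
typedef struct { float x, y, z, w; } pixel_f4;

/* Models a floating-point read of pixel (px, py) from host data stored
   as 8-bit RGBA with the given row pitch in bytes. */
pixel_f4 read_rgba8_as_float(const uint8_t *data, size_t row_pitch,
                             size_t px, size_t py)
{
    /* Four bytes per pixel, in red, green, blue, alpha order. */
    const uint8_t *p = data + py * row_pitch + px * 4;
    pixel_f4 c = { p[0] / 255.0f, p[1] / 255.0f, p[2] / 255.0f, p[3] / 255.0f };
    return c;
}
```

In a kernel, the equivalent access is made through the built-in image functions rather than through pointer arithmetic like this.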

When you are done processing your image data and writing these results to the output image, the host application can read the image back into host memory using functions such as clEnqueueReadImage.

Retaining and Releasing Image Objects

Image objects should be freed when no longer needed to avoid memory leaks. Image objects are retained and released in the same manner as other memory objects; see “Retaining and Releasing Buffer Objects” for details.


Last updated: 2009-06-10
