Performance Guidelines

The performance of OpenGL ES applications in iOS differs from that of OpenGL in Mac OS X or other desktop operating systems. Although powerful computing devices, iOS-based devices do not have the memory or CPU power that desktop or laptop computers possess. Embedded GPUs are optimized for lower memory and power usage, using algorithms different from those a typical desktop or laptop GPU might use. Rendering your graphics data inefficiently not only can result in a poor frame rate, but can also dramatically reduce the battery life of an iOS-based device.

Although other chapters have already touched on many performance bottlenecks, this chapter takes a more comprehensive look at how to optimize your applications for iOS.

General Performance Recommendations

Redraw Scenes Only When Necessary

An iOS-based device continues to display a frame until your application presents a new frame to display. If the data used to render your image has not changed, your application should not reprocess the image. Your application should wait until something in the scene has changed before rendering a new frame.

Even when your data changes, it is not necessary to render frames at the speed the hardware processes commands. A slower but fixed frame rate often appears smoother to the user than a fast but variable frame rate. A fixed frame rate of 30 frames per second is sufficient for most animation and helps reduce power consumption.
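The two rules above (render only when something changed, and hold a fixed frame rate) can be combined in a small scheduling check. The following is a minimal sketch, not an iOS API: FrameScheduler, should_render, and their fields are hypothetical names, and the sketch assumes your application calls should_render from whatever timer or display callback drives its drawing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one view; none of these names come from the iOS SDK. */
typedef struct {
    bool   scene_dirty;         /* set by the application whenever scene data changes */
    double last_frame_time;     /* time of the last rendered frame, in seconds */
    double min_frame_interval;  /* for example 1.0 / 30.0 for a 30 frames-per-second cap */
} FrameScheduler;

/* Returns true if a new frame should be rendered at time `now` (in seconds). */
bool should_render(FrameScheduler *s, double now)
{
    assert(s != NULL);
    if (!s->scene_dirty)
        return false;   /* nothing changed: the device keeps displaying the old frame */
    if (now - s->last_frame_time < s->min_frame_interval)
        return false;   /* too soon: hold the fixed frame rate */
    s->scene_dirty = false;
    s->last_frame_time = now;
    return true;
}
```

When the check returns false because the interval has not yet elapsed, the dirty flag stays set, so the change is picked up on the next callback.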

Use Floating-Point Arithmetic

3D applications (especially games) require physics, collision detection, lighting, animation, and other processing to create a compelling and interesting 3D world. All of these boil down to a collection of mathematical functions that need to be evaluated every frame. This imposes a serious amount of arithmetic on the CPU.

The ARM processor in iPhone and iPod touch processes floating-point instructions natively. Your application should use floating-point instead of fixed-point math whenever possible. If you are porting an application from another platform that does not support native floating-point math, you should rewrite the code to use floating-point types.

Note: iPhone supports both ARM and Thumb instruction sets. Although Thumb reduces the code size, be sure to use ARM instructions for floating-point intensive code for better performance. To turn off the default Thumb setting in Xcode, open the project properties and deselect the Compile for Thumb build setting.
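When porting fixed-point code, it can help to convert values at the boundary and check the floating-point rewrite against the legacy result. The sketch below assumes the legacy code uses a 16.16 fixed-point format; the type and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed legacy format: 16 integer bits and 16 fractional bits. */
typedef int32_t fixed16_16;

#define FIXED_ONE     65536.0f
#define FIXED_EPSILON (1.0f / 65536.0f)

/* Convert a legacy fixed-point value to float. */
float fixed_to_float(fixed16_16 x)
{
    return (float)x / FIXED_ONE;
}

/* Fixed-point multiply as legacy code might write it. */
fixed16_16 fixed_mul(fixed16_16 a, fixed16_16 b)
{
    return (fixed16_16)(((int64_t)a * (int64_t)b) / 65536);
}

/* The floating-point replacement is just the native operator. */
float float_mul(float a, float b)
{
    return a * b;
}

/* Debug aid while porting: the rewritten result should agree with the
   legacy result to within one fixed-point step. */
void assert_port_matches(fixed16_16 legacy, float ported)
{
    float diff = fixed_to_float(legacy) - ported;
    assert(diff >= -FIXED_EPSILON && diff <= FIXED_EPSILON);
}
```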

Disable Unused OpenGL ES Features

Whether you are using the fixed-function pipeline of OpenGL ES 1.1 or shaders in OpenGL ES 2.0, the best calculation is one that your application never performs.

If your application uses OpenGL ES 1.1, it should disable fixed-function operations that are not necessary to render the scene. For example, if your application does not require lighting or blending, it should disable those functions. Similarly, if your application is performing 2D drawing, it should disable fog and depth testing.

If your application is written for OpenGL ES 2.0, do not create a single shader that can be configured to perform every task your application needs to render the scene. Instead, compile multiple shader programs that perform specific, focused tasks. As with OpenGL ES 1.1, if your shader can remove unnecessary calculations, it provides faster overall performance.

This guideline must be balanced with other recommendations, such as “Avoid Changing OpenGL ES State Unnecessarily.”

Minimize the Number of Draw Calls

Every time your application submits geometry to be processed by OpenGL ES, the CPU spends time preparing the commands for the graphics hardware. To reduce this overhead, you should batch your geometry into fewer calls.

If your application draws geometry using triangle strips, you can reduce the number of submissions by merging two or more triangle strips into a single triangle strip. To do this, you add degenerate triangles formed either by two or three collinear points. For example, instead of using one call to draw a strip ABCD and another to draw a strip EFGH, you can add in the degenerate triangles CDD, DDE, DEE, and EEF to create a new strip ABCDDEEFGH. This strip can be drawn with a single submission.

To gather separate triangle strips into a single strip, all of the strips must share the same rendering requirements, so that the merged strip can be drawn without changing OpenGL ES state partway through.

Consolidating geometry to use a single set of OpenGL ES state has another advantage: it reduces the overhead of changing your OpenGL ES state, as documented in “Avoid Reading or Writing OpenGL ES State.”

For best results, consolidate geometry that is in close spatial proximity. Large, sprawling geometry is more difficult for your application to efficiently cull when it is not visible in the scene.
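The strip merge described in this section (ABCD and EFGH becoming ABCDDEEFGH) can be sketched as an operation on index arrays. This is a minimal illustration, assuming 16-bit vertex indices; merge_strips is a hypothetical helper, not an OpenGL ES function. It also repeats the last vertex one extra time when the first strip has an odd number of vertices, so that the second strip begins at an even position and keeps its original winding order.

```c
#include <assert.h>
#include <stddef.h>

/* Merges two triangle strips of vertex indices into one strip by inserting
   degenerate triangles between them. `out` must have room for
   na + nb + 3 indices. Returns the number of indices written. */
size_t merge_strips(const unsigned short *a, size_t na,
                    const unsigned short *b, size_t nb,
                    unsigned short *out)
{
    size_t n = 0, i;
    assert(a != NULL && b != NULL && out != NULL);

    for (i = 0; i < na; i++)
        out[n++] = a[i];

    if (na > 0 && nb > 0) {
        out[n++] = a[na - 1];       /* repeat the last vertex of the first strip */
        if (na % 2 != 0)
            out[n++] = a[na - 1];   /* extra repeat keeps the second strip's winding */
        out[n++] = b[0];            /* repeat the first vertex of the second strip */
    }

    for (i = 0; i < nb; i++)
        out[n++] = b[i];

    return n;
}
```

With indices 0 through 3 standing in for ABCD and 4 through 7 for EFGH, the merged strip contains the ten indices 0 1 2 3 3 4 4 5 6 7 and can be submitted with a single draw call.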

Contexts

The way your application interacts with the rest of the iOS graphics system is critical to the performance of your application. These interactions are documented in detail in “Displaying Your Results.”

Memory

Memory is a scarce resource on iOS-based devices. Your iOS application shares main memory with the system and other iOS applications. Memory allocated for OpenGL ES reduces the amount of memory available for other uses in your application. With that in mind, you should allocate only objects that you need and deallocate them when your application no longer needs them.

The virtual memory system in iOS does not use a swap file. When a low-memory condition is detected, instead of writing volatile pages to disk, the virtual memory system frees up nonvolatile memory to give your running application the memory it needs. Your application should strive to use as little memory as possible and be prepared to release cached data that is not essential to your application. Responding to low-memory conditions is covered in detail in the iOS Application Programming Guide.

The PowerVR MBX processor found in some iPhone and iPod touch models has additional memory requirements. See “PowerVR MBX” for more information.

Avoid Reading or Writing OpenGL ES State

Every time you read or write OpenGL ES state, the CPU spends time processing the command before sending it to the hardware. Occasionally, accessing OpenGL ES state may force previous operations to be completed before the state can be accessed. For best performance, your application should try to touch OpenGL ES state as infrequently as possible.

Avoid Querying OpenGL ES State

Calls to glGet*(), including glGetError(), may require OpenGL ES to execute all previous commands before retrieving any state variables. This synchronization forces the graphics hardware to run in lockstep with the CPU, reducing opportunities for parallelism.

Your application should keep shadow copies of any OpenGL ES state that you need to query, and maintain these shadow copies as you change the state.

Although it is critical to call glGetError in a debug build of your application, calling glGetError in the release version of your application degrades performance.
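The shadow-copy approach described in this section can be sketched as a small cache that answers queries from its own copy and forwards only real changes. This is an illustration under stated assumptions: StateCache and its functions are hypothetical, and the apply callback stands in for a wrapper around glEnable() and glDisable() so that the sketch compiles without the OpenGL ES headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hook that performs the real state change. In an application this would
   wrap glEnable()/glDisable(); here it is a hypothetical stand-in. */
typedef void (*SetCapabilityFn)(int capability, bool enabled);

/* Hypothetical capability identifiers used only by this sketch. */
enum { CAP_BLEND, CAP_DEPTH_TEST, CAP_COUNT };

typedef struct {
    bool            enabled[CAP_COUNT];  /* shadow copy of each capability */
    SetCapabilityFn apply;
} StateCache;

/* Stand-in for the OpenGL ES calls; it only counts how often it is reached. */
int fake_gl_calls = 0;
void fake_gl_set_capability(int capability, bool enabled)
{
    (void)capability;
    (void)enabled;
    fake_gl_calls++;
}

/* Both capabilities tracked here start disabled in a new OpenGL ES context. */
void state_cache_init(StateCache *c, SetCapabilityFn apply)
{
    int i;
    assert(c != NULL && apply != NULL);
    for (i = 0; i < CAP_COUNT; i++)
        c->enabled[i] = false;
    c->apply = apply;
}

/* Changes state only when the requested value differs from the shadow copy. */
void state_cache_set(StateCache *c, int capability, bool enabled)
{
    assert(c != NULL && capability >= 0 && capability < CAP_COUNT);
    if (c->enabled[capability] == enabled)
        return;                         /* redundant change: skip the call */
    c->enabled[capability] = enabled;
    c->apply(capability, enabled);
}

/* Answers from the shadow copy instead of querying OpenGL ES. */
bool state_cache_get(const StateCache *c, int capability)
{
    assert(c != NULL && capability >= 0 && capability < CAP_COUNT);
    return c->enabled[capability];
}
```

The cache stays correct only if every state change in the application goes through it; a direct call that bypasses the cache leaves the shadow copy stale.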

Avoid Changing OpenGL ES State Unnecessarily

Changing OpenGL ES state requires the hardware to be updated with new information, which may cause the hardware to stall or force it to execute previously submitted commands. Your application should keep the number of state changes it makes to a minimum.

Drawing Order

Lighting

Simplify lighting as much as possible. This advice applies both to fixed-function lighting in OpenGL ES 1.1 and shader-based lighting calculations you use in your custom shaders in OpenGL ES 2.0.

OpenGL ES 2.0 Shaders

Shaders present additional areas where you can improve your application’s performance.

Compile and Link Shaders During Initialization

Creating a shader program is an expensive operation compared to other OpenGL ES state changes. Listing 6-1 presents a typical strategy to load, compile, and verify a shader program.

Listing 6-1  Loading a Shader

    /** Initialization-time for shader **/
    GLuint shader, prog;
    const GLchar *shaderText = "... shader text ...";
    GLint logLen = 0;
    int i;

    // Create ID for shader
    shader = glCreateShader(GL_VERTEX_SHADER);

    // Define shader text (one null-terminated string)
    glShaderSource(shader, 1, &shaderText, NULL);

    // Compile shader
    glCompileShader(shader);

    // Create ID for program
    prog = glCreateProgram();

    // Associate shader with program
    // (a fragment shader must be created, compiled, and attached
    // the same way before the program can link)
    glAttachShader(prog, shader);

    // Link program
    glLinkProgram(prog);

    // Validate program
    glValidateProgram(prog);

    // Check the status of the compile/link
    glGetProgramiv(prog, GL_INFO_LOG_LENGTH, &logLen);
    if (logLen > 0)
    {
        // Show any errors as appropriate (malloc/free require <stdlib.h>)
        GLchar *log = (GLchar *)malloc(logLen);
        glGetProgramInfoLog(prog, logLen, &logLen, log);
        fprintf(stderr, "Prog Info Log: %s\n", log);
        free(log);
    }

    // Retrieve all uniform locations that are determined during link phase
    // (uniformCt, uniformName[], and uniformLoc[] are defined by the application)
    for (i = 0; i < uniformCt; i++)
    {
        uniformLoc[i] = glGetUniformLocation(prog, uniformName[i]);
    }

    // Retrieve all attrib locations that are determined during link phase
    // (attribCt, attribName[], and attribLoc[] are defined by the application)
    for (i = 0; i < attribCt; i++)
    {
        attribLoc[i] = glGetAttribLocation(prog, attribName[i]);
    }

    /** Render stage for shaders **/
    glUseProgram(prog);

You should compile, link, and validate your programs when your application is initialized. Once you’ve created all your shaders, your application can efficiently switch between them by calling glUseProgram().

Respect the Hardware Limits on Shaders

OpenGL ES places limitations on the number of each variable type you can use in a vertex or fragment shader. Further, OpenGL ES implementations are not required to implement a software fallback when these limits are exceeded; instead, the shader simply fails to compile or link. Your application should validate all shaders to ensure that no errors occurred, as shown above in Listing 6-1.

Your application must query the limits of the OpenGL ES implementation and not use shaders that exceed these limitations. Your application should call glGetIntegerv() for each value at startup and choose shaders that match the capabilities of the OpenGL ES implementation.

Maximum number of vertex attributes          GL_MAX_VERTEX_ATTRIBS
Maximum number of uniform vertex vectors     GL_MAX_VERTEX_UNIFORM_VECTORS
Maximum number of uniform fragment vectors   GL_MAX_FRAGMENT_UNIFORM_VECTORS
Maximum number of varying vectors            GL_MAX_VARYING_VECTORS

For all types, the query returns the number of 4-component floating-point vectors available. Your variables are packed into these vectors as described in the OpenGL ES shading language specification.
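Choosing a shader that fits the queried limits can be sketched as a simple comparison against each candidate's requirements. This is a hedged illustration: ShaderLimits, ShaderVariant, and choose_variant are hypothetical names, and the sketch assumes your application fills the hardware limits at startup with glGetIntegerv() and knows, for each of its shader variants, how many vectors of each kind the variant needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits queried once at startup, for example with
   glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &limits.vertex_attribs). */
typedef struct {
    int vertex_attribs;
    int vertex_uniform_vectors;
    int fragment_uniform_vectors;
    int varying_vectors;
} ShaderLimits;

/* One shader variant and the resources it needs (supplied by the application). */
typedef struct {
    const char  *name;
    ShaderLimits needs;
} ShaderVariant;

/* True if every requirement of the variant is within the hardware limits. */
bool variant_fits(const ShaderVariant *v, const ShaderLimits *hw)
{
    return v->needs.vertex_attribs           <= hw->vertex_attribs
        && v->needs.vertex_uniform_vectors   <= hw->vertex_uniform_vectors
        && v->needs.fragment_uniform_vectors <= hw->fragment_uniform_vectors
        && v->needs.varying_vectors          <= hw->varying_vectors;
}

/* `variants` is ordered from most to least demanding. Returns the first
   variant that fits, or NULL if none does. */
const ShaderVariant *choose_variant(const ShaderVariant *variants, size_t count,
                                    const ShaderLimits *hw)
{
    size_t i;
    assert(variants != NULL && hw != NULL);
    for (i = 0; i < count; i++)
        if (variant_fits(&variants[i], hw))
            return &variants[i];
    return NULL;
}
```

Because variables are packed into vectors by the compiler, a variant that passes this check can still fail to compile or link, so the validation shown in Listing 6-1 remains necessary.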

Use Precision Hints

Precision hints were added to the GLSL ES language specification to address the need for compact shader variables that match the smaller hardware limits of embedded devices. Each shader should specify a default precision, and individual shader variables should override this to provide hints to the compiler on how to efficiently compile your shader. Although these hints may be disregarded by an OpenGL ES implementation, they can also be used by the compiler to generate more efficient shaders.

Which precision hint to use for your shading variables depends on each variable’s requirements for range and precision. High-precision variables are interpreted as single precision floating-point values. Medium-precision variables are interpreted as half-precision floating-point values. Finally, low-precision qualified variables are interpreted with 8 bits of precision and a range of -2 to +2.

Important: The range limits defined by the precision hints are not enforced. You cannot assume your data is clamped to this range.

Although high precision is recommended for vertex data, other variables do not need this level of precision. For example, the fragment color assigned to the framebuffer can often be implemented in low precision without a significant loss in image quality, as demonstrated in Listing 6-2.

Listing 6-2  Low precision is acceptable for fragment color

precision highp float; // Default precision declaration is required in fragment shaders.
uniform lowp sampler2D sampler; // texture2D() result is lowp.
varying lowp vec4 color;
varying vec2 texCoord;   // Uses default highp precision.
 
void main()
{
    gl_FragColor = color * texture2D(sampler, texCoord);
}

Start with high-precision variables and then reduce the precision on variables that do not need this range and precision, testing your application along the way to ensure that your program still runs correctly.

Be Cautious of Vector Operations

Not all operations are performed in parallel by the graphics processor. Although vector operations are useful, you should not overuse the vector processor.

For example, the code in Listing 6-3 takes two operations to complete on a SIMD vector processor. On a scalar processor, because of the parentheses, it requires eight separate operations.

Listing 6-3  Poor use of vector operators

highp float f0, f1;
highp vec4 v0, v1;
v0 = (v1 * f0) * f1;

The same operation can be performed more efficiently by shifting the parentheses as shown in Listing 6-4:

Listing 6-4  Proper use of vector operations

highp float f0, f1;
highp vec4 v0, v1;
// On a scalar processor, this requires only 5 operations.
v0 = v1 * (f0 * f1);
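The operation counts quoted for Listings 6-3 and 6-4 can be checked on the CPU by modeling a scalar processor that performs one multiply per component. The following C sketch is only a model for counting; the function names are hypothetical and it says nothing about how a particular GPU actually schedules work.

```c
#include <assert.h>
#include <stddef.h>

int mul_count = 0;   /* scalar multiplies performed so far */

/* One scalar multiply, counted. */
float counted_mul(float a, float b)
{
    mul_count++;
    return a * b;
}

/* out = v * f, one multiply per component, as a scalar processor would do it. */
void vec4_scale(const float v[4], float f, float out[4])
{
    int i;
    assert(v != NULL && out != NULL);
    for (i = 0; i < 4; i++)
        out[i] = counted_mul(v[i], f);
}

/* (v1 * f0) * f1: two vector-by-scalar products, 8 multiplies. */
void scale_grouped_left(const float v1[4], float f0, float f1, float out[4])
{
    float tmp[4];
    vec4_scale(v1, f0, tmp);
    vec4_scale(tmp, f1, out);
}

/* v1 * (f0 * f1): one scalar product, then one vector-by-scalar
   product, 5 multiplies. */
void scale_grouped_right(const float v1[4], float f0, float f1, float out[4])
{
    vec4_scale(v1, counted_mul(f0, f1), out);
}
```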

Similarly, if your application can specify a write mask for a vector operation, it should do so. A scalar processor can ignore unused components.

Listing 6-5  Specifying a write mask

highp vec4 v0;
highp vec4 v1;
highp vec4 v2;
// On a scalar processor, this may be twice as fast when the write mask is specified.
v2.xz = (v0 * v1).xz;

Use Uniforms or Constants Instead of Computation Within a Shader

Whenever a value can be calculated outside the shader, you should pass it into the shader as a uniform or a constant. A value computed inside the shader is recalculated for every vertex or fragment the shader processes, which can be very expensive.
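A common case is a combined transformation matrix: rather than multiplying two matrices in the vertex shader for every vertex, the application can compute the product once on the CPU and upload the result as a single uniform. The sketch below assumes column-major 4x4 matrices; Mat4 and mat4_multiply are hypothetical helpers, not part of OpenGL ES.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major 4x4 matrix: element (row, col) is stored at m[col * 4 + row]. */
typedef struct { float m[16]; } Mat4;

/* Computes out = a * b. `out` must not alias `a` or `b`. */
void mat4_multiply(const Mat4 *a, const Mat4 *b, Mat4 *out)
{
    int row, col, k;
    assert(a != NULL && b != NULL && out != NULL);
    assert(out != a && out != b);
    for (col = 0; col < 4; col++) {
        for (row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (k = 0; k < 4; k++)
                sum += a->m[k * 4 + row] * b->m[col * 4 + k];
            out->m[col * 4 + row] = sum;
        }
    }
}

/* Usage, once per object rather than once per vertex:
       mat4_multiply(&projection, &modelView, &mvp);
       glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, mvp.m);
   The names projection, modelView, mvp, and mvpLocation are assumed to be
   defined by the application. */
```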

Avoid Branching

Branches are discouraged within shaders, as they can reduce the ability to execute operations in parallel on 3D graphics processors. If your shaders must branch, it is more efficient to branch on a GLSL uniform variable or a constant known when the shader is compiled. Branching on a value computed in the shader can potentially be expensive. A better solution may be to create shaders specialized for specific rendering tasks. There’s a tradeoff between reducing the number of branches in your shaders and increasing the number of shaders you create. You should test different scenarios and choose the fastest solution.

Eliminate Loops

You can eliminate many loops by either unrolling the loop or using vectors to perform operations. For example, this code is very inefficient:

// Loop
    int i;
    float f;
    vec4 v;
 
    for(i = 0; i < 4; i++)
        v[i] += f;

The same operation can be done directly using a component-wise add:

    float f;
    vec4 v;
    v += f;

When you cannot eliminate a loop, it is preferred that the loop have a constant limit to avoid dynamic branches.

Array Access

Using indices computed in the shader is more expensive than a constant or uniform array index.

Dynamic Texture Lookups

Also known as dependent texture reads, a dynamic texture lookup occurs when the shader computes or modifies texture coordinates used to sample a texture. Although GLSL supports this, it can incur a substantial performance penalty to do so. If the shader has no dependent texture read, the texture sampling hardware can fetch texels sooner and hide the latency of accessing memory.

Avoid Alpha Test and Discard

If your application uses an alpha test in OpenGL ES 1.1 or the discard instruction in an OpenGL ES 2.0 fragment shader, some hardware depth-buffer optimizations must be disabled. In particular, this may require a fragment’s color to be calculated completely before being discarded.

An alternative to using alpha test or discard to kill pixels is to use alpha blending with alpha forced to zero, for example by looking up the alpha value in a texture. This eliminates any contribution to the framebuffer color while retaining the Z-buffer optimizations. Unlike a discarded fragment, however, a blended fragment still updates the value stored in the depth buffer.

If you need to use alpha testing or a discard instruction, you should draw these objects separately in the scene after processing any geometry that does not require it. Place the discard instruction early in the fragment shader to avoid performing calculations whose results are unused.
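The drawing order recommended above can be sketched as a stable partition of the application's draw list, so that objects relying on alpha test or discard are submitted after everything else. DrawItem and order_for_discard are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  id;            /* hypothetical object identifier */
    bool uses_discard;  /* true if drawn with alpha test or a discard shader */
} DrawItem;

/* Reorders `items` so that objects that do not use alpha test or discard come
   first. The original order is kept within each group. `scratch` must have
   room for `count` items. */
void order_for_discard(DrawItem *items, size_t count, DrawItem *scratch)
{
    size_t n = 0, i;
    assert(items != NULL && scratch != NULL);

    for (i = 0; i < count; i++)
        if (!items[i].uses_discard)
            scratch[n++] = items[i];
    for (i = 0; i < count; i++)
        if (items[i].uses_discard)
            scratch[n++] = items[i];
    for (i = 0; i < count; i++)
        items[i] = scratch[i];
}
```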




Last updated: 2010-07-09