To improve performance in the imaging pipeline, follow these guidelines:
The following code fragment uses the green component as the data source and writes the result of the operation into some (possibly all) of the other components:
/* Matrix is in column major order */
GLfloat smearGreenMat[16] = {
    0, 0, 0, 0,
    1, 1, 1, 1,
    0, 0, 0, 0,
    0, 0, 0, 0,
};
/* The variables updateR, updateG, updateB, and updateA indicate
 * whether the corresponding component is to be updated.
 */
GLboolean updateR, updateG, updateB, updateA;
...
/* Check for availability of the color matrix extension */
/* Set proper color matrix and mask */
glMatrixMode(GL_COLOR);
glLoadMatrixf(smearGreenMat);
glColorMask(updateR, updateG, updateB, updateA);
/* Perform the imaging operation */
glEnable(GL_SEPARABLE_2D_EXT);
glCopyTexSubImage2DEXT(...);
/* Restore an identity color matrix. Not needed when the same
 * smear operation is to be used over and over.
 */
glLoadIdentity();
/* Restore previous matrix mode (assuming it is modelview) */
glMatrixMode(GL_MODELVIEW);
...
When using the color matrix to broadcast one component into all others, avoid manipulating the color matrix with transformation calls such as glRotate(). Instead, load the matrix explicitly using glLoadMatrix().
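To see why the explicitly loaded matrix broadcasts green, it can help to work through the arithmetic on the host. The following is a minimal sketch, not part of the OpenGL API: apply_color_matrix() and masked_write() are hypothetical helpers that assume the column-major layout used by glLoadMatrixf() and imitate the per-component effect of glColorMask().

```c
#include <stddef.h>

/* Hypothetical helper: multiply a 4x4 matrix stored in column-major
 * order (the layout glLoadMatrixf() expects) by an RGBA color,
 * out = M * in. Element (row r, column c) lives at m[c * 4 + r].
 */
static void apply_color_matrix(const float m[16], const float in[4],
                               float out[4])
{
    for (size_t r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (size_t c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * in[c];
    }
}

/* Hypothetical helper: imitate the effect of glColorMask() by copying
 * only the components whose mask entry is nonzero from src into dst.
 */
static void masked_write(const int mask[4], const float src[4],
                         float dst[4])
{
    for (size_t i = 0; i < 4; ++i)
        if (mask[i])
            dst[i] = src[i];
}
```

With the smearGreenMat values shown earlier, an input color of (0.1, 0.5, 0.9, 1.0) comes out of apply_color_matrix() as (0.5, 0.5, 0.5, 0.5): only the second column of the matrix is nonzero, so every output component equals the input green. Passing that result through masked_write() with a mask of {1, 0, 1, 0} then changes only the red and blue components of the destination.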
Similar to polygon drawing, there can be a pixel-drawing bottleneck due to overload in host bandwidth, processing, or rasterizing. When all modes are off, the path is most likely limited by host bandwidth, and a wise choice of host pixel format and type pays off tremendously. This is also why byte components are sometimes faster. For example, use packed pixel format GL_RGB5_A1_EXT to load a texture with a GL_RGB5_A1_EXT internal format.
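The effect of the host pixel format on bandwidth can be estimated with simple arithmetic. The sketch below is illustrative only: the enum host_layout labels and both functions are hypothetical names, not OpenGL enumerants or calls, and the calculation assumes tightly packed rows with no alignment padding and 32-bit floats.

```c
#include <stddef.h>

/* Hypothetical labels for three host pixel layouts (illustration
 * only; these are not OpenGL enumerants).
 */
enum host_layout {
    HOST_RGBA_UBYTE,     /* four 8-bit components                */
    HOST_RGBA_PACKED_16, /* 5-5-5-1 packed into one 16-bit word  */
    HOST_RGBA_FLOAT      /* four components, assumed 32-bit each */
};

/* Bytes occupied by one pixel in host memory for the given layout. */
static size_t bytes_per_pixel(enum host_layout layout)
{
    switch (layout) {
    case HOST_RGBA_UBYTE:     return 4;
    case HOST_RGBA_PACKED_16: return 2;
    case HOST_RGBA_FLOAT:     return 16;
    }
    return 0;
}

/* Bytes that must cross the host interface for one image, assuming
 * rows are tightly packed (no alignment padding).
 */
static size_t host_image_bytes(size_t width, size_t height,
                               enum host_layout layout)
{
    return width * height * bytes_per_pixel(layout);
}
```

Under these assumptions, a 512 x 512 image occupies 1,048,576 bytes as RGBA bytes, 524,288 bytes in the 16-bit packed layout, and 4,194,304 bytes as RGBA floats, so the packed layout moves an eighth of the data that the float layout does.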
When either many processing modes or several expensive modes such as convolution are on, the processing stage is the bottleneck. Such cases benefit from one-component processing, which is much faster than multicomponent processing.
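A rough operation count suggests why the number of components matters. The sketch below uses a hypothetical helper, convolution_madds(), which counts the multiply-adds for a general (non-separable) convolution and ignores border handling. It is an arithmetic model only; how the cost scales on actual hardware is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: multiply-adds needed to convolve a
 * width x height image with a general (non-separable) kw x kh kernel
 * when `components` color components are processed. Border handling
 * is ignored.
 */
static size_t convolution_madds(size_t width, size_t height,
                                size_t kw, size_t kh, size_t components)
{
    return width * height * kw * kh * components;
}
```

For a 256 x 256 image and a 5 x 5 kernel, this model gives 1,638,400 multiply-adds for one component and 6,553,600 for four, so restricting the operation to a single component removes three quarters of the arithmetic.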
Zooming up pixels may create a raster bottleneck.