Verify that no unusual activity is taking place on your system while you take timing measurements. Other graphics programs, background processes, and network activity can distort timing results. For example, do not have osview, gr_osview, or xclock running while you are benchmarking.
Use a high-resolution clock and make measurements over a period of time that's at least one hundred times the clock resolution. A good rule of thumb is to benchmark something that takes at least two seconds, so that the uncertainty contributed by the clock reading is less than one percent of the total measured time. To measure something that's faster, write a loop that executes the test code repeatedly.
Note: Timing loops like this are highly recommended. Structure your program so that the code under test can easily be run in such a loop.
gettimeofday() provides a convenient interface to IRIX clocks with enough resolution to measure graphics performance over several frames. Call syssgi() with SGI_QUERY_CYCLECNTR for high-resolution timers. If you can repeat the drawing to make a loop that takes ten seconds or so, a stopwatch works fine and you don't need to alter your program to run the test.
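The loop-and-clock approach described above can be sketched as follows. This is a minimal illustration, not a definitive harness: the helper names now_seconds(), seconds_per_call(), and sample_work() are hypothetical and are not part of IRIX or OpenGL. The sketch times CPU-side code only; for graphics calls the queue must also be drained with glFinish(), as this section explains. It doubles the repetition count until a single timed run reaches the requested minimum duration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in seconds, read with gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Placeholder workload (hypothetical); replace with the code under test. */
static volatile unsigned long sink;
static void sample_work(void)
{
    int i;
    for (i = 0; i < 1000; i++)
        sink += (unsigned long)i;
}

/*
 * Hypothetical helper: call test_fn in a loop, doubling the repetition
 * count until one timed run lasts at least min_seconds. Returns the
 * average seconds per call and stores the final count in *reps_out.
 */
static double seconds_per_call(void (*test_fn)(void), double min_seconds,
                               long *reps_out)
{
    long reps = 1;
    double elapsed;

    assert(test_fn != NULL && min_seconds > 0.0);
    for (;;) {
        double start = now_seconds();
        long i;
        for (i = 0; i < reps; i++)
            test_fn();
        elapsed = now_seconds() - start;
        if (elapsed >= min_seconds)
            break;              /* run was long enough to trust the clock */
        reps *= 2;              /* too short: try twice as many calls */
    }
    if (reps_out != NULL)
        *reps_out = reps;
    return elapsed / (double)reps;
}
```

For example, seconds_per_call(sample_work, 2.0, &reps) follows the two-second rule of thumb and returns the average time per call for the final run.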
Verify that the code you are timing behaves identically for each frame of a given timing trial. If the scene changes, the current bottleneck in the graphics pipeline may change, making your timing measurements meaningless. For example, if you are benchmarking the drawing of a rotating airplane, choose a single frame and draw it repeatedly, instead of timing the airplane while it rotates. Once a single frame has been analyzed and tuned, look at frames that stress the graphics pipeline in different ways, then analyze and tune them individually.
Run your program multiple times and try to understand variance in the trials. Variance may be due to other programs running, system activity, prior memory placement, or other factors.
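One way to look at variance across trials is to record the elapsed time of each run and summarize the set. The sketch below is illustrative only; the struct trial_stats type and the summarize_trials() function are hypothetical names, and the choice of statistics (minimum, maximum, mean, and spread relative to the fastest run) is an assumption, not something this guide prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated timing trials (hypothetical type for illustration). */
struct trial_stats {
    double min;      /* fastest trial, in seconds */
    double max;      /* slowest trial, in seconds */
    double mean;     /* arithmetic mean of all trials */
    double spread;   /* (max - min) / min; 0 when all trials agree */
};

/* Summarize n elapsed times, each in seconds and greater than zero. */
static struct trial_stats summarize_trials(const double *seconds, size_t n)
{
    struct trial_stats s;
    double sum = 0.0;
    size_t i;

    assert(seconds != NULL && n > 0);
    s.min = s.max = seconds[0];
    for (i = 0; i < n; i++) {
        assert(seconds[i] > 0.0);
        if (seconds[i] < s.min)
            s.min = seconds[i];
        if (seconds[i] > s.max)
            s.max = seconds[i];
        sum += seconds[i];
    }
    s.mean = sum / (double)n;
    s.spread = (s.max - s.min) / s.min;
    return s;
}
```

A large spread suggests that something other than the code under test, such as another running program or memory placement, is influencing the trials.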
Graphics calls can be tricky to benchmark because they do all their work in the graphics pipeline. When a program running on the main CPU issues a graphics command, the command is put into a hardware queue in the graphics subsystem, to be processed as soon as the graphics pipeline is ready. The CPU can immediately do other work, including issuing more graphics commands until the queue fills up.
When benchmarking a piece of graphics code, you must include in your measurements the time it takes to process all the work left in the queue after the last graphics call. Call glFinish() at the end of your timing trial, just before sampling the clock. Also call glFinish() before sampling the clock and starting the trial, to ensure no graphics calls remain in the graphics queue ahead of the calls you are timing.
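The ordering of the two glFinish() calls relative to the clock samples can be sketched as follows. To keep the sketch compilable without OpenGL headers or a graphics context, the queue-draining call is passed in as a function pointer: in a real program you would pass a function that calls glFinish(). The names time_drawing(), stub_draw(), and stub_finish() are hypothetical, and the stubs only count calls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

typedef void (*void_fn)(void);

/* Current wall-clock time in seconds, read with gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/*
 * Hypothetical helper: time `reps` calls of draw(). `finish` must block
 * until the graphics queue is empty (glFinish() in an OpenGL program).
 */
static double time_drawing(void_fn draw, void_fn finish, long reps)
{
    double start, end;
    long i;

    assert(draw != NULL && finish != NULL && reps > 0);

    finish();                  /* drain earlier work before the trial */
    start = now_seconds();
    for (i = 0; i < reps; i++)
        draw();
    finish();                  /* wait for the queued work to complete */
    end = now_seconds();       /* sample the clock only after that */
    return end - start;
}

/* Stand-ins (assumptions) so the sketch runs without a graphics context. */
static int draw_calls;
static int finish_calls;
static int draws_at_finish[2];

static void stub_draw(void)
{
    draw_calls++;
}

static void stub_finish(void)
{
    if (finish_calls < 2)
        draws_at_finish[finish_calls] = draw_calls;
    finish_calls++;
}
```

Calling time_drawing(stub_draw, stub_finish, 5) runs the stub five times between two drain calls; replacing the stubs with real drawing code and a glFinish() wrapper gives the measurement pattern described above.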
Because buffers can only be swapped during a vertical retrace, there is a period, between the time a glXSwapBuffers() call is issued and the next vertical retrace, when a program may not execute any graphics calls. A program that attempts to issue graphics calls during this period is put to sleep until the next vertical retrace. The time spent asleep is included in the elapsed time and distorts the timing measurement.
Note: To get accurate numbers, you must perform timing trials in single-buffer mode, with no calls to glXSwapBuffers().
When making timing measurements, use glFinish() to ensure that all pixels have been drawn before measuring the elapsed time.