PRINT HINTS
Improving QuickDraw GX Printer Driver Performance
DAVE HERSEY
In this column, we go spelunking in the frost-covered caverns of QuickDraw GX. We'll discover how QuickDraw GX I/O buffering works and how to use that knowledge to squeeze optimal performance from a printer driver, whether PostScriptTM, raster, or vector. We'll also learn how to find (and avoid) the common bottlenecks.
Suppose you've been working on your first QuickDraw GX printer driver, and the big moment has arrived. Your printer's innards begin to whir and spin, and your heart beats a little faster. Your driver is actually printing! As you see that image being drawn on the page, your breathing quickens, and then . . . the printer stops. You run to your Macintosh to see if your driver has crashed (again), but no, not this time. A few seconds later the printer starts up again. And stops. And starts. This repeats until, several minutes later, the page is finished.
What's going on? Is your printer defective? Maybe. But then again, the problem may lie elsewhere. You probably have a data delivery problem on your hands. For one reason or another, the data isn't getting to the printer fast enough to keep it busy. To understand why, we need to look at what goes on behind the scenes when a driver tells QuickDraw GX to send data to your printer.
Your first reaction might be, "Ah, I need to implement some sort of asynchronous I/O to keep a steady stream of data going to my printer." That's a good thought, but QuickDraw GX already provides asynchronous I/O. Let's look a little deeper.
There are four QuickDraw GX printing messages that are used to implement buffering:
- GXBufferData -- sent to move data into an available buffer
- GXWriteData -- sent to write data to the printer immediately without buffering it first
- GXDumpBuffer -- sent to move a buffer full of data to the printer
- GXFreeBuffer -- sent to ensure that a buffer has been processed and is available for new data
How do you get GXBufferData and GXFreeBuffer to work asynchronously, so that the driver's data is sent to the printer as fast as possible? GXBufferData, in its default implementation, already works asynchronously. However, GXFreeBuffer has to work synchronously. Let's look at why.
In the following figures, assume that we have a driver with four buffers, and that at every time interval (a, b, c, and so on) half of a buffer can be filled by the driver. (In reality, the timeit takes to fill a buffer will vary as rendering time varies.)
First, let's say that the device can't process the data fast enough to empty out the first buffer before that buffer is needed again. Figure 1 shows what will happen. At the following time intervals shown in Figure 1, here's what takes place:
- None of the buffers have been used.
- The first buffer is being written to with GXBufferData.
- The first buffer has been filled, so QuickDraw GX sends GXDumpBuffer, which starts an asynchronous write of the data in buffer 1.
- The first buffer is pending I/O completion, and the driver begins filling the second buffer.
- The second buffer has been filled, so QuickDraw GX sends GXDumpBuffer for it. It can't be written, however, until the first buffer is finished writing.
- The first and second buffers are pending I/O completion, and the driver begins filling the third buffer.
- The third buffer has been filled, so QuickDraw GX sends GXDumpBuffer for it. We're still waiting for the first and second buffers to finish writing.
- The first through third buffers are pending I/O completion, and the driver begins filling the fourth buffer.
- The fourth buffer has been filled, so QuickDraw GX sends GXDumpBuffer for it, but it can't write until the first through third buffers finish.
- All buffers have writes pending. For the first buffer, QuickDraw GX sends GXFreeBuffer, which will wait for I/O to complete on that buffer before returning. GXFreeBuffer must behave synchronously, because its return signifies "This buffer can now be reused."
Figure 1. Device processes data very slowly
This is a worst-case scenario from the CPU's point of view. The device's communications pipe can't take the data fast enough to keep up with the buffering. Data buffering is delayed until pending writes are completed. There isn't any alternative -- you must free up a buffer in order to have a place to put the new data. Note that it may take several seconds before a buffer is freed. During this delay, the CPU sits idle, although it could be preparing more data.
Figure 2. Device processes data very quickly
Figure 3. Device and buffers are working optimally
Figure 2 shows another nonoptimal situation. The buffers are being filled and processed so quickly that at any given time, two -- or even three -- of the buffers aren't even being used. This is a waste of memory, and also increases the latency between buffers.
Figure 3 shows the ideal situation. This is what you should strive for, although it may not be attainable, depending on your device. In this case, there's always a buffer free. Data is buffered as fast as it's available and (with luck) is sent to the device as fast as the device can service it. In practice, this may be a difficult (if not impossible) scenario to achieve. In a moment, we'll see why. First, let's take a look at the resource that specifies the buffering parameters for a QuickDraw GX printer driver.
THE GXUNIVERSALIOPREFSTYPE RESOURCE
The gxUniversalIOPrefsType ('iobm') resource controls the behavior of the standard buffering and
device communication for QuickDraw GX printing. Here's what this resource looks like:
type gxUniversalIOPrefsType { longint standardIO = 0x00000000, customIO = 0x00000001; longint; // number of buffers to allocate longint; // size of each buffer longint; // number of I/O requests that // can be pending at once longint; // open/close timeout in ticks longint; // read/write timeout in ticks };
The first field in the resource specifies whether you're using QuickDraw GX's standard communications methods (like PAP or serial) or if you're going to provide custom device communications routines (to support SCSI or Centronics printers, for example). If you set this field to customIO, QuickDraw GX won't perform needless memory allocation or initialization to support the standard I/O routines.
The next field indicates the number of buffers you'd like QuickDraw GX to allocate for you (0 indicates none). In low-memory situations, fewer buffers than this number may be created.
Following the number of buffers is the size of each buffer, and then the intimidating "number of I/O requests that can be pending at once" field. A good value for this field is the number of buffers + 3. This represents the possibility of a pending write (or read) on each buffer, as well as a pending status, read, and close connection request.
The rest of the fields in this resource are used to set timeout thresholds.
If a driver doesn't include an 'iobm' resource, the system defaults to two 1K buffers and 10-second timeout values. Because every device is different, it's unlikely that the default options will be ideal for your printer.
DIFFERENCES BETWEEN IMAGING SYSTEMS
PostScript, raster, and vector drivers send differently formatted data to their devices, and this has an
effect on how you should set up your buffers.
PostScript drivers. PostScript drivers send text or binary data to their printers, and are generally connected via PAP (Printer Access Protocol). As it turns out, the low-level PAP driver in QuickDraw GX makes sure that no more than (512 * flow quantum) bytes are sent to your device at a time. The flow quantum (normally 8 for LaserWriters) is specified in your gxDeviceCommunicationsType ('comm') resource. So, if your PAP printer uses a flow quantum of 8, a maximum of only 4K will be sent to the printer at a time, even if your buffer size is 8K. This means that a buffer size of (256 * flow quantum) or (512 * flow quantum) usually works well for PAP devices.
Vector drivers. There are some distinct differences between vector drivers and other types of drivers:
- Vector drivers send text commands, but not in the quantity that their PostScript counterparts do. Vector devices tend to understand graphics commands that are only a few characters long but describe graphics that may take several seconds to plot. This is especially true for pen plotters and cutters.
- Because vector devices usually have very basic graphics primitives, operations such as clipping and converting text into polygons are often performed on the Macintosh before the data is sent to the plotter.
- Unlike most PostScript and some raster devices, vector devices rarely wait to start imaging until the entire page is received. It's therefore more efficient to begin the plot as soon as possible, and then send small chunks of data as quickly as possible.
As a result, vector drivers work best when they use several small buffers -- for example, buffers of 256 bytes each. This helps keep both the Macintosh and the printer busy.
Raster drivers. Raster drivers send bitmaps to their printers, often with control codes to skip over white areas in the image. The way you set up your buffers for raster drivers can have a dramatic effect on performance -- more so than for other types of drivers. The bitmap for a US Letter-sized page on a 24-bit, 300-dpi color device can require 24 megabytes of data. With that much data to process, your code has to be as efficient as possible. For raster drivers, your buffers should be at least the size of one (preferably two) maximum-sized scan lines for your device.
BUFFERING BOTTLENECKS
There are several things that can have an impact on the flow of data to your device. We'll discuss the
most common ones here.
The number of buffers specified in your 'iobm' resource. If you used only one buffer in your printer driver, you'd constantly hit the "pending write" lock-out situation described earlier. As soon as you finished filling the buffer, you'd have to wait for it to empty before buffering more data. You should therefore always have at least two buffers.
In an ideal situation, two buffers are all you'd need -- one would be always available for buffering while the other is sent to the device. However, you'd need a very fast device to manage this, as we'll soon see. In practice, three or four buffers is a good start for PostScript and raster drivers. For vector drivers, start with eight buffers.
The size of the buffers specified in your 'iobm' resource. As mentioned earlier, this is critical for vector and raster drivers. For vector devices, even moderate-sized buffers (2K) can cause your plotter to stall while data is being buffered, and your Macintosh to stall while that data is being plotted. Remember, a little vector data goes a long way. Start with 256-byte buffers.
If you're writing a raster driver using the default implementation of GXRasterDataIn, make sure that at least one worst-case scan line of data will fit in your buffers. (Keep in mind that your compression scheme might expand the data.) Your buffers must be this large because the gxDontSplitBuffer buffering option is used by the default implementation of GXRasterDataIn. If your buffer isn't big enough to hold an entire scan line, you'll get into an infinite loop as QuickDraw GX keeps rejecting buffers and asking for one that can hold all the data. There are two reasons for using the gxDontSplitBuffer option:
- It allows for some degree of error recovery. If data is sent to the printer, and the printer is off-line and discards the data, you can just repackage the same scan lines and resend the buffer. If scan lines are split across buffers, it's a little more work to keep track of what to send again.
- Some devices are modal in that they must be set to "graphics mode" before receiving graphics data, and set to other modes before receiving other types of data. Imagine that you split a buffer containing a "start graphics mode" command, followed by some graphics data, followed by an "end graphics mode" command. In between the two GXBufferData calls, the driver might want to query the device with GXWriteData. This could result in chaos or ignored requests because the printer is set to graphics mode and might not accept such queries.
Using the gxDontSplitBuffer option does mean that some portion of each buffer will probably be unfilled. If splitting the data between buffers isn't a problem for your device, override GXRasterDataIn and don't specify gxDontSplitBuffer when you buffer the data.
How big should your buffers be? As mentioned before, probably at least the size of two maximum- sized scan lines. In a minute, we'll see how you can tune your buffer size.
How fast QuickDraw GX can prepare data. It's going to take QuickDraw GX time to prepare the data that it hands your driver. For raster drivers, make sure that your gxRasterPrefsType ('rdip') resource is set up to ask only for the data that you need. Don't make QuickDraw GX spend any more time or pass more data than it needs to.
Time hits from postprocessing. This applies to drivers that do their own halftoning and the like. Can you gain significantly by doing your own halftoning? It's possible, but keep in mind that QuickDraw GX offers a wide range of halftoning and dithering options, and using these methods is likely to take a similar amount of time as just passing your driver the raw data and having it halftone that.
The throughput of the communications pipe. Your device might want to process data faster than the computer sends it due to hardware constraints of, for example, the serial port.
How fast the device can receive data. Similarly, the device itself might be the bottleneck. Keep in mind that the speed the manufacturer claims may not refer to using the printer for printing graphics. Graphic images usually take longer to process than text. The Macintosh (with some minor irrelevant exceptions) prints in graphics mode only, so the claimed rate may not be realistic.
WHICH BOTTLENECKS AFFECT YOU?
Before you can improve the performance of your printer driver, you have to find your bottlenecks.
Here are some tests that help determine where your bottlenecks are.
How long does it take QuickDraw GX to prepare data? If you're writing a raster driver, implement a GXRasterDataIn override that does nothing but return noErr. For PostScript or vector drivers, do the same thing in a message override for GXBufferData orGXVectorPackageShape, respectively. If your PostScriptor vector driver renders some shapes on its own, you should also override GXPostScriptProcessShape or GXVectorVectorizeShape. In this override, simply forward the message unless you're passed a shape that your driver will render itself. In that case, don't forward the message; just return noErr. This way, your calculations won't include time spent rendering shapes that your driver will be handling completely on its own.
Next, print a typical several-page document and see how many pages per minute you get. If this is slower than the device can print, you might want QuickDraw GX to create an image file of the data before sending it.
Calculating pages per minute is easy. Suppose your "typical" 4-page document takes 72 seconds to render. Then (72 seconds ÷ 4 pages) = 18 seconds per page and (60 seconds ÷ 18 seconds per page) = 3.3 pages per minute. *
To create an image file, override GXCreateImageFile and forward the message along with a combination of the image file options (such as "gxMakeImageFile | gxEntireFile"). There are optionsfor creating image files for each plane, each page, or both. For details, see the QuickDraw GX interface file PrintingMessages.h.
If you use the debugging version of QuickDraw GX, rendering is slower. For accurate benchmarks, use the nondebugging QuickDraw GX extension for timing tests. *
How long is your code taking to postprocess data? Do the same thing as you just did, but include any of your own code (for halftoning, compressing, or whatever) that you normally execute. Compare this to the rate you got from the last test to see how your code is affecting rendering time. Again, an image file might be an option if this is a problem. Also, consider using QuickDraw GX's built-in halftoning and dithering instead of your own.
How fast does the device want data? Suppose your device is a two-page-per-minute, 300-dpi, 4-bit device with a maximum page size of 8 by 10 inches. Some quick arithmetic (see "Calculating Device Data Requirements") tells you that you need over 7 megabytes of data per minute, though you can reduce this requirement substantially with compression.
There's another way to determine whether the communications speed is too low: Make your driver roll everything into an image file before sending anything to the printer. Then, print a typical document and see if the printer stays busy once it starts receiving data. If not, the data isn't being sent to your device fast enough. There's not much you can do about this except reduce the amount of data you send or redesign the hardware.
Finally, don't package white space and send it to your device if the device supports skipping it. The GXRasterDataIn message passes a rectangle that indicates where the nonwhite scan lines are in a given band. If you don't skip over the white space on a page, you're wasting time packaging and sending useless data.
Is the buffer usage optimal? Whenever you send GXBufferData, first send GXFreeBuffer. Check to see if GXFreeBuffer returns immediately. If it doesn't, the buffering is being blocked by a pending write. An alternate approach is to implement an override for GXFreeBuffer that subtracts the tick count determined before calling Forward_GXFreeBuffer from the tick count when the call returns. You could record this in a file and look at the information after a print job finishes. Large values indicate that your driver is blocked while waiting for a free buffer.
Try increasing your buffer size or adding more buffers until the lock-out goes away. Note that if your device isn't fast enough, you may never (with reasonable buffer allocation) reach a state in which you never have to wait. Your device (or the communications pipe) might be so slow that the only way to keep a buffer free is to allocate enough buffers to hold the entire page. That's what I would consider unreasonable buffer allocation. However, if you can reach this state of always having a buffer free, back off on the number of buffers or buffer size slightly so that you begin to get occasional lock- outs again. This is your optimal buffer configuration.
EYES TO THE FRONT, DRIVER
Now that you can optimize your QuickDraw GX buffering and printing, you can avoid the sporadic
printing that so many driver writers fall prey to. Your drivers will have the printers humming steadily
along, your users will be pleased, and other driver developers will stand in awe of you.
CALCULATING DEVICE DATA REQUIREMENTS
A two-page-per-minute, 300-dpi, 4-bit device with a maximum page size of 8 by 10 inches requires (300 x 300 x 4) ÷ 8 bits per byte = 360,000 bits per square inch, or a little under 44K bytes per square inch. The entire page requires (45,000 x 8 x 10) = 3,600,000 bytes per page or about 3.5 megabytes per page. To achieve the device's maximum two-page-per-minute throughput rate, you need to pass twice this amount, or over 7 megabytes of data per minute.
Now, suppose you use compression and also remove beginning-of-line and interline white space to reduce a typical page to, say, 25% of its raw size. Then you're looking at 7 x .25 or about 1.8 megabytes per minute. That's still about 29K bytes per second or about 300 Kbaud to satisfy this device. This can still be a problem if your interface is running at only 9600 baud.
DAVE HERSEY (AppleLink HERSEY) is known to small relatives as "Uncle Mommy." He spent the last three years working
with QuickDraw GX and helping developers learn its wily ways. In his spare time, Dave helps his nephews and niece hunt
for buried pirate treasure on Joe's Island in Wayne, Maine. *
The best reference for writing QuickDraw GX printer drivers is Inside Macintosh: QuickDraw GX Printing Extensions and Drivers .*
Thanks to our technical reviewers Hugo Ayala, Tom Dowdy, Daniel Lipton, and Harita Patel. *
- SPREAD THE WORD:
- Slashdot
- Digg
- Del.icio.us
- Newsvine