Implementing Threaded IO on the Mac OS

© 1997 by Jonathan "Wolfie" Rentzsch

Grab the source code.

Abstract

This paper explains input/output (IO) on the Mac OS. After detailing the two IO models, the paper explains how to match the Thread Manager with Mac OS IO, using three examples. Finally, the paper introduces a new method along with the code behind it.

Introduction

Casey would waltz with the strawberry blonde
And the band played on
He'd glide 'cross the floor with the girl he adored
And the band played on
But his brain was so loaded it nearly exploded
The poor girl was filled with alarm
He married the girl with the strawberry curls
And the band played on

You've heard about the Thread Manager, the Mac OS's implementation of cooperative threading. You've read the develop articles. You've downloaded Inside Macintosh:Threads. You've seen the sample code. Now you want to build software to take advantage of this technology.

Threading really shines when your software spins off a lengthy task and returns control to the user immediately. Instead of having to wait for your software to complete the command, your user is free to continue working.

One of the main bottlenecks that software faces isn't computational speed, it's input/output (IO) speed. Since IO tends to take so long, it's an ideal candidate for threading.

In this paper I'll cover:

IO Overview

IO is by definition the act of moving data from an IO device to RAM (input) or from RAM to an IO device (output). Input also goes by the less formal name "read" while output goes by "write."

Common examples of IO devices are:

So your hard drive, modem and keyboard all work towards the same noble goal of blasting bits to and from RAM. Humbling, isn't it?

The Mac OS IO Programming Interface

All this hardware stuff is fine and dandy, you say, but I'm a software guy. How do I code this stuff? I'm glad you asked, otherwise this paper would be rather short.

Like most other operating systems, the Mac OS divvies up the task of managing IO devices. Sitting right above the hardware are chunks of code called drivers. Their job is to provide a software interface for the hardware. By abstracting the hardware through drivers, you don't have a bunch of software touching the hardware willy-nilly -- all access goes through one channel. If the hardware changes, only the driver needs to be rewritten.

In order to manage these drivers and the devices they control, Apple devised the Device Manager. Your application uses the Device Manager to handle IO -- you rarely talk to drivers directly.


[Figure: the layering of the Mac OS IO programming interface]

To understand Mac OS IO is to understand the Device Manager. Fortunately, IO isn't a complex topic, and neither is the Device Manager. In fact, the entire Device Manager programming interface is just a few variations on seven basic commands: Open, Close, Read, Write, Control, Status and KillIO.

The Open command is used to open a connection to a driver. To be nice, make sure you call Close when you're done using the device.

The Read command is used to move data from the device into RAM. The Write command is used to move data from RAM to the device. These are the meat of the Mac OS IO programming interface.

Control is used when issuing commands not directly related to pumping data: changing a serial port's speed, for example.

Status is the flip side of Control: you can use it to get a serial port's speed.

KillIO has a special purpose that we'll get into momentarily.

To execute an IO action, you create an IO job. A job is simply a description you pass to the Device Manager of the IO task you'd like accomplished. An IO job includes information such as the source of the data to transfer, the destination and the size of the transfer.

There are two models for executing IO jobs: synchronous and asynchronous. In a nutshell, the synchronous model is easy to code but locks up your Macintosh until the IO job completes. The asynchronous model is more difficult to code but leaves your Macintosh free while the IO job completes.

Fortunately, when you combine an asynchronous model with the Thread Manager, you get a new model, threaded IO. Threaded IO combines the synchronous model's ease of use with the asynchronous model's parallelism. A worthy goal indeed.

The Synchronous Model

The synchronous model for executing IO jobs is easy to code. Each of the Device Manager commands (Open, Close, Read, Write, Control, Status and KillIO) is represented by one function call.

It could scarcely be easier to use the synchronous model -- the IO job is specified in the parameters of each function. Let's look at the function prototypes:

OSErr    OpenDriver( ConstStr255Param name, short *drvrRefNum );
OSErr    CloseDriver( short refNum );
OSErr    FSRead( short refNum, long *count, void *buffPtr );
OSErr    FSWrite( short refNum, long *count, const void *buffPtr );
OSErr    Control( short refNum, short csCode, const void *csParamPtr );
OSErr    Status( short refNum, short csCode, void *csParamPtr );
OSErr    KillIO( short refNum );

Everything seems in order here. When you want to use a driver, you call OpenDriver(), specifying the name of the driver in question. If all goes well, you get a reference number passed back in drvrRefNum. A reference number is a unique ID you use when referring to an open driver. Notice every other function takes a variable named refNum.

Once you're done with the driver, call CloseDriver() with the aforementioned reference number that OpenDriver() gave you.

You use FSRead() to read. You pass it the omnipresent reference number, the size of the job (count) and where in RAM to put the read data (buffPtr).

FSWrite() is just like FSRead() except buffPtr now points where to get the data to write out instead of where to put the data.

When you want to pass a Control message to a driver, you call Control() with a constant in csCode that maps to the message you're passing. For example, the change serial speed message constant is 13 (serdSetBaud), so we'd set csCode to 13 to change the serial port's speed.

The csParamPtr argument for Control() is where you stick the information relevant to the Control message. In the serial speed scenario, we'd set csParamPtr to point to a short that tells the Serial Driver what speed to set the port to.

Status() is Control()'s mirror twin. Use it to get information from the driver. csCode and csParamPtr work the same way as with Control() except the information is now outgoing instead of incoming.

KillIO() deals with asynchronous IO -- we'll talk about it then.
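To make the Control/Status symmetry concrete, here is a minimal sketch in plain C. It does not call the Device Manager. MockControl(), MockStatus(), the error codes and the "get baud" code 14 are all names and values I made up for illustration, and the speed is stored as a plain number rather than in whatever encoding the real Serial Driver expects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in the Mac OS headers. */
typedef short MockErr;
enum { kMockNoErr = 0, kMockBadCode = -1 };
enum { kMockSetBaud = 13, kMockGetBaud = 14 };  /* 14 is made up for this sketch */

static short gMockBaud = 9600;  /* the mock driver's only piece of state */

/* Control: information flows from the caller into the driver. */
MockErr MockControl(short csCode, const void *csParamPtr)
{
    assert(csParamPtr != NULL);
    if (csCode != kMockSetBaud)
        return kMockBadCode;
    gMockBaud = *(const short *) csParamPtr;
    return kMockNoErr;
}

/* Status: information flows from the driver back to the caller. */
MockErr MockStatus(short csCode, void *csParamPtr)
{
    assert(csParamPtr != NULL);
    if (csCode != kMockGetBaud)
        return kMockBadCode;
    *(short *) csParamPtr = gMockBaud;
    return kMockNoErr;
}
```

The only difference between the two calls is the direction the data behind csParamPtr travels, which is why Status() can take a non-const pointer while Control() takes a const one.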

A Synchronous IO Example

To illustrate the various models (synchronous, asynchronous, threaded), we'll code the same simple task to each model. The simple task is to write the 4-byte string ATZ\r to the modem port. For those of you who don't know, ATZ\r is the modem reset command in the Hayes AT command set. Assuming a modem is attached to the modem port, the modem will reset itself.

Before we get into the code, let me note a Serial Driver quirk. Each serial port is controlled not by one but by two separate drivers: an input driver and an output driver. This separation is a work-around for a Device Manager constraint.

There are a few things to remember. One, the output driver is the dominant driver. Open it first, close it last, and send all Write, Control and Status commands to it. Two, the Read command should only be directed to the input driver. Three, only the still-mysterious KillIO command can be directed to both the input and output drivers.

Here's a function that uses the synchronous model to execute our sample IO job:

OSErr    SynchronousModemReset()
{
    Str255   resetCmd = "\pATZ\r";
    short    inRefNum = 0, outRefNum = 0;
    long     count = resetCmd[ 0 ];
    OSErr    err;
    
    /*    Attempt to open the modem serial port */
    err = OpenDriver( "\p.AOut", &outRefNum );
    if( !err )
        err = OpenDriver( "\p.AIn", &inRefNum );
    
    /*    Write the modem reset command using the synchronous model */
    if( !err )
        err = FSWrite( outRefNum, &count, resetCmd + 1 );
    
    /*    Call the test function */
    if( !err )
        Foo();
    
    /*    If we successfully opened the modem serial port, close it now */
    if( inRefNum ) {
        (void) CloseDriver( inRefNum );
        inRefNum = 0;
    }
    if( outRefNum ) {
        (void) CloseDriver( outRefNum );
        outRefNum = 0;
    }
    
    return( err );
}

First we declare five variables: resetCmd, inRefNum, outRefNum, count and err. resetCmd holds a Pascal string containing the ATZ\r command. inRefNum and outRefNum will hold the input driver's reference number and output driver's reference number, respectively. Until the drivers are opened, we set them to zero to mark the reference numbers as invalid. Bad things happen if we attempt to use an invalid reference number. count holds the size of the IO job. Finally, err holds the error code.

Next we open the output driver and snatch its reference number. If that works, then we open the input driver.

If we were able to open both drivers then we write the modem reset command string out the modem port. Foo() will then be called once FSWrite() successfully returns.

We're all done here. Now we make sure the input reference number is valid before charging off to close the driver. We ignore the error code returned by CloseDriver(), because there's nothing we could do about it if it failed.

Note we invalidate inRefNum by setting it to zero after we're done with it. This is a good precautionary measure to take. We then close the driver with the outRefNum reference number.
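One detail in the listing deserves a closer look: why count comes from resetCmd[ 0 ] and why the buffer passed to FSWrite() is resetCmd + 1. A Pascal string stores its length in byte zero and its text from byte one onward. The "\p" escape is a Mac compiler extension, so this sketch builds the same layout by hand in standard C. MakePascalString() is a helper I invented for the illustration, not a Toolbox call.

```c
#include <assert.h>
#include <string.h>

/* Build a Pascal string (length byte, then characters) from a C string.
   dst must have room for 256 bytes; returns the length stored, or -1 if
   the text is too long to fit behind a single length byte. */
int MakePascalString(unsigned char *dst, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len > 255)
        return -1;
    dst[0] = (unsigned char) len;   /* byte 0 holds the length */
    memcpy(dst + 1, src, len);      /* the text starts at byte 1 */
    return (int) len;
}
```

With the string built this way, dst[0] plays the role of count and dst + 1 plays the role of the write buffer.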

Synchronous IO Drawbacks

Synchronous IO has two drawbacks. First, your computer is effectively frozen while the IO job completes. Interrupts are still handled; however, anything depending on WaitNextEvent() is cut off. It's as if one process is hogging the processor. This is a bad thing.

The second drawback stems from the first: there's no way of handling timeouts. Our modem reset command is a good example. What if the modem isn't connected to the modem port when we synchronously write ATZ\r to it? We wait forever for the IO job to complete -- hanging the computer. Unfortunately the only way to discover if something is plugged in is to blindly write to the port.

The way out of this is to write our modem reset command and wait maybe 7 seconds. If the IO job hasn't completed by then, it's an indicator that nothing is attached to the serial port.
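The shape of that timeout is easy to sketch, even though the synchronous calls can't express it. The simulation below uses a made-up TimedJob with a pending flag and a fake clock measured in ticks. None of these names come from the Mac OS. I'm assuming roughly 60 ticks per second, so 7 seconds is about 420 ticks.

```c
#include <assert.h>
#include <stddef.h>

enum { kSketchPending = 1, kSketchTimedOut = -2 };  /* made-up codes */

typedef struct {
    short result;       /* kSketchPending until the device answers */
    long  answersAt;    /* simulated tick of the answer; -1 means never */
} TimedJob;

/* Simulation only: let the "device" finish the job once its tick arrives. */
static void SketchDeviceTick(TimedJob *job, long now)
{
    if (job->answersAt >= 0 && now >= job->answersAt)
        job->result = 0;
}

/* Wait for the job, but give up after timeoutTicks ticks. */
short WaitWithTimeout(TimedJob *job, long timeoutTicks)
{
    long now;

    assert(job != NULL);
    for (now = 0; now < timeoutTicks; now++) {
        SketchDeviceTick(job, now);
        if (job->result < kSketchPending)
            return job->result;        /* finished in time */
    }
    return kSketchTimedOut;            /* nothing answered: likely no modem */
}
```

A real program would also have to cancel the stuck job after giving up, which needs a model that hands control back before the job is done.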

The Asynchronous Model

The asynchronous model is a low-level programming interface -- you have to do extra work to use it, but it's more flexible.

Whereas you specify your IO job in the synchronous model's function parameters, you specify your IO job in parameter blocks when using the asynchronous model. A parameter block is simply a struct. When calling a function that uses a parameter block, you pass along a parameter block's address as the argument.

Specifying IO jobs in a parameter block is complex and error-prone, but you gain three advantages. First, when dealing with this many parameters, it's difficult to fit them all into a function's argument list. Second, it's easy to extend the structure to add your own fields. Third, and most important, by giving the parameter block its own chunk of memory, you can make it queueable.

Let me clear up that last statement. The Mac OS has a set of utilities named the Queue utilities. The Queue utilities are functions that maintain linked lists. A linked list with a little extra information tied to it is called a queue.

Each driver has a job queue associated with it. A job queue is a linked list of parameter blocks. When you execute an IO job asynchronously, the Device Manager places the parameter block at the end of that driver's job queue instead of executing it immediately (like the synchronous model does). The Device Manager immediately returns control to your software.

Using interrupts, the driver completes the IO jobs in its job queue. Seemingly in parallel your IO job completes and is retired.

Now is a good time to fill you in on KillIO. Since you are given back control immediately after executing an asynchronous IO job, you may find yourself wanting to stop an IO job that's pending or currently executing. That's what KillIO does -- it removes each pending IO job from the job queue and halts the current job. It's great for stopping runaway IO jobs like our modem reset command string.
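Here is a rough sketch of a job queue and of what KillIO does to it, written in standard C rather than with the Queue utilities. QueuedJob, JobQueue and both functions are hypothetical, and the result codes are illustrative values, not something to rely on.

```c
#include <assert.h>
#include <stddef.h>

enum { kMockPending = 1, kMockAborted = -27 };  /* illustrative values */

typedef struct QueuedJob {
    struct QueuedJob *next;
    short             result;
} QueuedJob;

typedef struct {
    QueuedJob *head;
    QueuedJob *tail;
} JobQueue;

/* Asynchronous execution: append the job and return immediately. */
void MockEnqueue(JobQueue *q, QueuedJob *job)
{
    assert(q != NULL && job != NULL);
    job->next = NULL;
    job->result = kMockPending;
    if (q->tail != NULL)
        q->tail->next = job;
    else
        q->head = job;
    q->tail = job;
}

/* KillIO: retire every job in the queue with an "aborted" result. */
int MockKillIO(JobQueue *q)
{
    int killed = 0;

    assert(q != NULL);
    while (q->head != NULL) {
        QueuedJob *job = q->head;
        q->head = job->next;
        job->next = NULL;
        job->result = kMockAborted;
        killed++;
    }
    q->tail = NULL;
    return killed;
}
```

Notice that MockEnqueue() does no IO at all. It only links the job in and returns, which is the whole point of the asynchronous model.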

Here are the asynchronous programming interface's function prototypes:

OSErr    PBOpenAsync( ParmBlkPtr paramBlock );
OSErr    PBCloseAsync( ParmBlkPtr paramBlock );
OSErr    PBReadAsync( ParmBlkPtr paramBlock );
OSErr    PBWriteAsync( ParmBlkPtr paramBlock );
OSErr    PBControlAsync( ParmBlkPtr paramBlock );
OSErr    PBStatusAsync( ParmBlkPtr paramBlock );
OSErr    PBKillIOAsync( ParmBlkPtr paramBlock );

The basic commands are all here: Open, Close, Read, Write, Control, Status and KillIO. Drivers can't be opened, closed or killed asynchronously, so that just leaves us with PBReadAsync(), PBWriteAsync(), PBControlAsync() and PBStatusAsync().

All the functions take the same argument type: ParmBlkPtr. Inside Macintosh:Devices tells me that ParmBlkPtr is a pointer to a ParamBlockRec union:

union ParamBlockRec {
    IOParam          ioParam;
    FileParam        fileParam;
    VolumeParam      volumeParam;
    CntrlParam       cntrlParam;
    SlotDevParam     slotDevParam;
    MultiDevParam    multiDevParam;
};

The various fields are used for different purposes depending on what drivers you're working with. ioParam is for transport drivers like the Serial Driver. fileParam is used for the File Manager. volumeParam is used for managing storage volumes like floppies, hard drives, CDs, etc. cntrlParam is used for controlling drivers themselves. We're most interested in the ioParam field and thus the IOParam structure:

struct IOParam {
    QElemPtr           qLink;
    short              qType;
    short              ioTrap;
    Ptr                ioCmdAddr;
    IOCompletionUPP    ioCompletion;
    OSErr              ioResult;
    StringPtr          ioNamePtr;
    short              ioVRefNum;
    short              ioRefNum;
    SInt8              ioVersNum;
    SInt8              ioPermssn;
    Ptr                ioMisc;
    Ptr                ioBuffer;
    long               ioReqCount;
    long               ioActCount;
    short              ioPosMode;
    long               ioPosOffset;
};

The first two fields, qLink and qType, are used by the Queue utilities. The next two fields, ioTrap and ioCmdAddr, are used internally by the Device Manager. I wouldn't mess with them.

ioCompletion is an important field. When the IO job is completed, the Device Manager calls the function pointer in ioCompletion if the field is not nil. The user-supplied function to be called when the job is completed is called a completion routine. Completion routines may be executed at interrupt time and are subject to interrupt-time restrictions. They cannot use the Memory Manager, unlocked handles, QuickDraw, etc.

When the parameter block is successfully placed into the job queue, the ioResult field is set to 1. When the IO job is completed, ioResult holds either 0 (noErr) or a negative error code. You can use this knowledge to test if an IO job is completed. If ioResult is less than 1, the IO job is finished.

We can safely ignore ioNamePtr, ioVRefNum, ioVersNum, ioPermssn and ioMisc for now. Read Inside Macintosh:Devices for these details.

ioRefNum holds the much ballyhooed driver reference number. ioBuffer points to the place to put the data if reading or the place to get the data if writing. You fill in ioReqCount with the transfer size you'd like -- ioActCount tells you how much was actually transferred. Finally, you set ioPosMode to the positioning mode (from the start, from the end, from the mark, etc.) and ioPosOffset to where to find the data when reading or where to place the data when writing.

An Asynchronous IO Example

Now we'll code the modem reset command using the asynchronous model.

OSErr    AsynchronousModemReset()
{
    Str255           resetCmd = "\pATZ\r";
    short            inRefNum = 0, outRefNum = 0;
    ParamBlockRec    pb;
    OSErr            err;
    
    /*    Attempt to open the modem serial port */
    err = OpenDriver( "\p.AOut", &outRefNum );
    if( !err )
        err = OpenDriver( "\p.AIn", &inRefNum );
    
    /*    Write the modem reset command using the asynchronous model */
    if( !err ) {
        pb.ioParam.ioCompletion = nil;
        pb.ioParam.ioRefNum = outRefNum;
        pb.ioParam.ioBuffer = (Ptr) resetCmd + 1;
        pb.ioParam.ioReqCount = resetCmd[ 0 ];
        pb.ioParam.ioPosMode = fsFromStart;
        pb.ioParam.ioPosOffset = 0;
        
        err = PBWriteAsync( &pb );
    }
    
    /*    Call the test function */
    if( !err )
        Foo();
    
    /*    Wait until the asynchronous job completes */
    if( !err )
        while( pb.ioParam.ioResult > noErr ) {}
    
    /*    If we successfully opened the modem serial port, close it now */
    if( inRefNum ) {
        (void) CloseDriver( inRefNum );
        inRefNum = 0;
    }
    if( outRefNum ) {
        (void) CloseDriver( outRefNum );
        outRefNum = 0;
    }
    
    return( err );
}

The driver opening code and driver closing code are directly swiped from SynchronousModemReset(). We introduce the parameter block here, pb. We initialize a total of six fields in the parameter block before calling PBWriteAsync().

Unlike with SynchronousModemReset(), Foo() will now possibly be called before the IO job is completed. If we wanted to make sure the IO job is finished before calling Foo(), we could move it after the while loop.

Speaking of which, the while loop takes advantage of the state of ioResult to determine if the IO job is done yet. It does nothing while waiting, but you could easily slip some code in that does some work.

Enter the Thread Manager

While you're waiting for IO to complete, you'd like to get some other work done. Apple answered our desires to have a general task sharing mechanism by creating the Thread Manager.

Asynchronous IO and the Thread Manager sound like they go together like peanut butter and chocolate. Imagine you spawn a download thread. While your download thread waits for the slow modem, it gives time to other threads.

The Ideal Threaded IO Model

Ideally, you'd only need to add two lines of code to enable your asynchronous IO code to take advantage of the Thread Manager.

OSErr    IdealThreadedModemReset()
{
    Str255           resetCmd = "\pATZ\r";
    short            inRefNum = 0, outRefNum = 0;
    ParamBlockRec    pb;
    OSErr            err;
    
    /*    Attempt to open the modem serial port */
    err = OpenDriver( "\p.AOut", &outRefNum );
    if( !err )
        err = OpenDriver( "\p.AIn", &inRefNum );
    
    /*    Write the modem reset command using the asynchronous model */
    if( !err ) {
        pb.ioParam.ioCompletion = NewIOCompletionProc( WakeUpCompletionRoutine );
        pb.ioParam.ioRefNum = outRefNum;
        pb.ioParam.ioBuffer = (Ptr) resetCmd + 1;
        pb.ioParam.ioReqCount = resetCmd[ 0 ];
        pb.ioParam.ioPosMode = fsFromStart;
        pb.ioParam.ioPosOffset = 0;
        
        err = PBWriteAsync( &pb );
    }
    
    /*    Sleep until WakeUpCompletionRoutine fires and wakes us up */
    if( !err )
        SetThreadState( kCurrentThreadID, kStoppedThreadState, kNoThreadID );
    
    /*    Call the test function */
    if( !err )
        Foo();
    
    /*    If we successfully opened the modem serial port, close it now */
    if( inRefNum ) {
        (void) CloseDriver( inRefNum );
        inRefNum = 0;
    }
    if( outRefNum ) {
        (void) CloseDriver( outRefNum );
        outRefNum = 0;
    }
    
    return( err );
}

Wouldn't it be great if, after you execute the asynchronous PBWriteAsync(), you could stop the thread and depend on the completion routine to reawaken the thread?

It would be nice -- but you can't.

The Window of Death

Between the time you call PBWriteAsync() and the time you call SetThreadState(), the IO job can and will complete, executing our completion routine.

Ideally, the execution path taken is like this:

However, this path of execution is possible:

Your thread is stopped and will never be readied. Your thread is dead!
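The two orderings can be played out in a tiny simulation. This is standard C with invented names, not Thread Manager code. It assumes, in line with the description above, that a wake-up aimed at a thread that hasn't stopped yet has no lasting effect.

```c
#include <assert.h>

typedef enum { kSimReady, kSimStopped } SimThreadState;

/* What the thread does after starting the job: stop itself and wait. */
static void SimStopThread(SimThreadState *state)
{
    *state = kSimStopped;
}

/* What the completion routine does: wake the thread if it is stopped.
   Readying a thread that is still running changes nothing. */
static void SimCompletionRoutine(SimThreadState *state)
{
    if (*state == kSimStopped)
        *state = kSimReady;
}

/* Play out one of the two orderings; returns the thread's final state. */
SimThreadState SimRun(int completionFiresFirst)
{
    SimThreadState state = kSimReady;

    if (completionFiresFirst) {
        SimCompletionRoutine(&state);   /* job finishes inside the window */
        SimStopThread(&state);          /* thread stops; nobody is left to wake it */
    } else {
        SimStopThread(&state);          /* thread stops first, as intended */
        SimCompletionRoutine(&state);   /* completion routine readies it */
    }
    return state;
}
```

SimRun( 0 ) ends with the thread ready to run again. SimRun( 1 ) ends with it stopped for good, which is the Window of Death in miniature.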

develop's Coping Mechanism

develop, Apple's Technical Journal, had an article on the Thread Manager. They advocated a dual thread solution.

There are two threads per IO job: the IO thread and the waker thread. Here's its execution path:

This is a poor work-around. You have to manage two threads per IO job and the scheduling overhead is too great.

PowerPlant's Coping Mechanism

PowerPlant, Metrowerks' C++ framework, defers the completion routine.

PowerPlant uses the ideal threaded IO model with a twist. Instead of the completion routine blindly attempting to ready the thread, it checks to see if the thread is really stopped. If it's not, then it sets a Time Manager task to execute 100 microseconds in the future. Hopefully by then the thread will be stopped.

This is a good work-around; however, it complicates the completion routine.
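To show the idea (and only the idea), here is a simulation of a deferring completion routine in standard C. This is not PowerPlant's code. SimThread, SimDeferredWake() and SimTimerFires() are names I invented, and the retryArmed flag stands in for the Time Manager task.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { kSimReady, kSimStopped } SimThreadState;

typedef struct {
    SimThreadState state;
    int            retryArmed;   /* nonzero while a retry timer is pending */
    int            retries;      /* how many times the wake-up was deferred */
} SimThread;

/* Completion routine with the deferral twist: wake the thread only if it
   has really stopped; otherwise arm a timer to try again shortly. */
void SimDeferredWake(SimThread *t)
{
    assert(t != NULL);
    if (t->state == kSimStopped) {
        t->state = kSimReady;
        t->retryArmed = 0;
    } else {
        t->retryArmed = 1;       /* stands in for a Time Manager task */
        t->retries++;
    }
}

/* The simulated timer firing: it simply runs the same check again. */
void SimTimerFires(SimThread *t)
{
    assert(t != NULL);
    if (t->retryArmed)
        SimDeferredWake(t);
}
```

The extra state (the armed flag and the retry path) is exactly the complication mentioned above: the completion routine now has two outcomes instead of one.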

The Polling Coping Mechanism

With the polling coping mechanism, the thread is never stopped. After executing the IO job, the thread simply polls ioResult until it's less than one, yielding all the while.

Surprisingly, due to the scheduling overhead, this method is as fast as PowerPlant's and doesn't require a completion routine. This is the best work-around. However, it is still a work-around and polling is inelegant -- we want a solution.
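The polling loop itself is short. In the sketch below, SimYield() stands in for a real yield call such as the Thread Manager's YieldToAnyThread(), and the SimJob structure fakes a device that finishes after a set number of yields. Both are assumptions made so the example runs as standard C.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    short ioResult;         /* 1 while pending, 0 or negative when finished */
    int   yieldsUntilDone;  /* simulation only: yields before the device finishes */
} SimJob;

/* Stand-in for yielding to other threads; here it only advances the
   simulated device. */
static void SimYield(SimJob *job)
{
    if (job->yieldsUntilDone > 0 && --job->yieldsUntilDone == 0)
        job->ioResult = 0;
}

/* Poll until the job finishes, yielding between checks. Returns how many
   times the thread yielded. A job that never finishes never returns. */
int SimPollUntilDone(SimJob *job)
{
    int yields = 0;

    assert(job != NULL);
    while (job->ioResult >= 1) {
        SimYield(job);
        yields++;
    }
    return yields;
}
```

Because the thread is never stopped, there is no window to fall into. The cost is that every pass through the loop is a trip around the scheduler.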

Problems with the Coping Mechanisms

By now you realize that the Thread Manager wasn't designed with IO in mind. We should be able to use the ideal thread model.

The latency of the work-arounds is too high. Imagine your application has 25 threads running. The IO thread executes an IO job and yields. Even if the IO job completes immediately, the IO thread will have to wait behind the 24 other threads before it runs again. And one of those threads is your event loop, which may switch out your application.

Extending the Thread Manager for Effective Threaded IO

Metaphysical question: what does it mean to stop a thread?

The Thread Manager thinks it means to mark a thread as ineligible for scheduling and schedule another thread.

My solution: Write a function that marks a thread as ineligible for scheduling but doesn't reschedule. This would put a thread into a known state before executing the IO job.

However, latency would still be high. When the IO job is completed we'd like our thread to be first in line. We'll also add the ability to mark a thread as "priority."

That's great! How do we do it?

Creating a Thread Queue

The Thread Manager provides a hook where you can install your own scheduler. However, the Thread Manager's data structures are completely opaque -- there's no "thread queue" to access from our scheduler. You can't even access a reference constant given a ThreadID!

Even if we did install a custom scheduler, we wouldn't know what to schedule!

However there is a way -- create and maintain your own thread queue. The Thread Manager provides three hooks meant for debugging: DebuggerNotifyNewThread(), DebuggerNotifyDisposeThread() and DebuggerNotifyScheduler(). We'll plug into these hooks to maintain three thread queues: an ineligible queue, an eligible queue and a priority queue.

Maintaining the Thread Queues

When our DebuggerNotifyNewThread() hook is called, we'll add an element to the eligible queue with the new thread's ID.

When our DebuggerNotifyDisposeThread() hook is called, we'll search our queues to find the element with a matching thread ID and remove it.

Finally, when our DebuggerNotifyScheduler() hook is called, we'll look at our priority queue. If there's a priority thread waiting we'll move it to the eligible queue and schedule it. Priority status should be fleeting -- otherwise it will hog the processor. If there isn't a priority thread waiting, we'll just schedule the next thread in the eligible queue.
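The scheduling policy just described can be sketched with two small arrays. This is a simplified stand-in written in standard C, not the code that ships with this paper. The SimQueue type, the fixed capacity and the choice to put a demoted priority thread at the back of the eligible queue are all my assumptions for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { kSimNoThread = 0, kSimMaxThreads = 8 };

typedef struct {
    long ids[kSimMaxThreads];
    int  count;
} SimQueue;

static void SimAppend(SimQueue *q, long id)
{
    assert(q->count < kSimMaxThreads);
    q->ids[q->count++] = id;
}

static long SimRemoveFirst(SimQueue *q)
{
    long id = q->ids[0];
    int  i;

    for (i = 1; i < q->count; i++)
        q->ids[i - 1] = q->ids[i];
    q->count--;
    return id;
}

/* Pick the next thread: a waiting priority thread wins once, then joins
   the eligible queue; otherwise take eligible threads in rotation. */
long SimPickNext(SimQueue *priority, SimQueue *eligible)
{
    long id;

    if (priority->count > 0) {
        id = SimRemoveFirst(priority);
        SimAppend(eligible, id);       /* priority status is fleeting */
        return id;
    }
    if (eligible->count == 0)
        return kSimNoThread;
    id = SimRemoveFirst(eligible);
    SimAppend(eligible, id);           /* rotate to the back of the line */
    return id;
}
```

A priority thread jumps the line exactly once. After that it waits its turn like everybody else, so it can't hog the processor.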

The Thread Queue Code

I've defined the XThreadElem structure to hold individual thread elements:

struct    XThreadElem {
    XThreadElemPtr     next;
    XThreadQueuePtr    queue;
    ThreadID           threadID;
};

next points to the next element in the queue. queue points to this element's owner while threadID holds (surprise!) the element's thread ID.

We'll store all three queues (ineligible, eligible and priority) in one handle as an array of XThreadElem structures. We'll use the standard Mac OS Queue Utilities to manage them. We'll keep the handle locked because the Thread Queue routines will be called at interrupt time.

Now we need a queue header. A queue header stores important information like the first element in the queue and the last element:

struct XThreadQueue {
    short             type;
    XThreadElemPtr    head;
    XThreadElemPtr    tail;
    XThreadElemPtr    mark;
};

The type field is there for Queue Utilities compatibility -- we don't use it. The mark field points to the next thread to schedule.
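For a feel of how the head, tail and mark fields might cooperate, here is a sketch in standard C. The Sketch-prefixed types mirror the two structures above but are redeclared so the example stands alone. SketchEnqueue() and SketchNext() are hypothetical helpers, and the wrap-around behavior of the mark is my assumption, not necessarily what the included code does.

```c
#include <assert.h>
#include <stddef.h>

typedef long SketchThreadID;                 /* stand-in for ThreadID */
typedef struct SketchElem  SketchElem;
typedef struct SketchQueue SketchQueue;

struct SketchElem {
    SketchElem     *next;
    SketchQueue    *queue;
    SketchThreadID  threadID;
};

struct SketchQueue {
    short       type;
    SketchElem *head;
    SketchElem *tail;
    SketchElem *mark;
};

/* Append an element and record which queue now owns it. */
void SketchEnqueue(SketchQueue *q, SketchElem *e)
{
    assert(q != NULL && e != NULL);
    e->next = NULL;
    e->queue = q;
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    if (q->mark == NULL)
        q->mark = e;            /* first element becomes the next to run */
}

/* Return the marked element and advance the mark, wrapping to the head. */
SketchElem *SketchNext(SketchQueue *q)
{
    SketchElem *e;

    assert(q != NULL);
    e = q->mark;
    if (e != NULL)
        q->mark = (e->next != NULL) ? e->next : q->head;
    return e;
}
```

Keeping a back pointer to the owning queue in each element means a thread's current state can be found from its element alone, without searching all three queues.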

I can't reprint all the Thread Queue code here -- look at the included code if you're interested.

The Extended Thread Manager Programming Interface

In all, I define three extended Thread Manager calls:

OSErr           InitXThreads();
XThreadState    GetXThreadState( ThreadID threadID );
OSErr           SetXThreadState( ThreadID threadID, XThreadState state );

Call InitXThreads() once before calling any of the other extended Thread Manager calls. It allocates the XThreadElem array and installs the Thread Manager debugging callbacks.

GetXThreadState() works like the Thread Manager's GetThreadState() except it returns one of three constants:

enum {
    kXThreadIneligible = 0,
    kXThreadEligible,
    kXThreadPriority
};

SetXThreadState() works like the Thread Manager's SetThreadState() except it takes the extended Thread Manager constants and doesn't reschedule.
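The "doesn't reschedule" part is what closes the Window of Death, and a small simulation shows why. The thread marks itself ineligible before it starts the IO job, so the completion routine always finds it in a known state. The code below is standard C with invented names. It models the ordering only and does not call the extended Thread Manager.

```c
#include <assert.h>

typedef enum { kSimIneligible, kSimEligible, kSimPriority } SimXState;

/* Completion routine: the thread is already in a known state, so the
   wake-up is unconditional. */
static void SimCompletion(SimXState *state)
{
    *state = kSimPriority;
}

/* Returns nonzero if the thread can still be scheduled once the job has
   completed, whichever side of the yield the completion lands on. */
int SimThreadedIO(int completionFiresBeforeYield)
{
    /* The thread marks itself ineligible first; no reschedule happens. */
    SimXState state = kSimIneligible;

    /* ... the asynchronous job would be started here ... */
    if (completionFiresBeforeYield)
        SimCompletion(&state);          /* lands before the thread yields */

    /* The thread yields here; it is skipped only while ineligible. */
    if (!completionFiresBeforeYield)
        SimCompletion(&state);          /* lands after the thread yields */

    return state != kSimIneligible;
}
```

Both orderings leave the thread schedulable, and as a priority thread at that, so it also gets to the front of the line.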

The Threaded IO Programming Interface

Now is when the rubber meets the road. We've extended the Thread Manager cleanly. Now we want to merge synchronous IO with the extended Thread Manager to give us easy-to-code high-performance IO.

Witness two new functions:

OSErr    ThreadedRead( short refNum, void *buffer, long *size, long offset, long patience );
OSErr    ThreadedWrite( short refNum, void *buffer, long *size, long offset, long patience );

ThreadedRead() and ThreadedWrite() are descendants of FSRead() and FSWrite(). They're more powerful, so follow along.

refNum is the standard reference number, buffer points to where to get the data or put the data. Set size to the size of the IO job -- after the IO job is done size will be set to the actual number of bytes transferred. You specify where you want to read from or write to in offset. Finally, specify how long you're willing to wait in milliseconds in patience. One thousand milliseconds is equal to one second.

Enjoy!

All the code is included with this paper. Hunt around and enjoy. I'm storing this paper at my web site and will continue to update it and the code. You can find it at: <http://www.u-s-x.com/wolfie/rants/andthebandplayedon.html>.

Bibliography

Apple Computer. Inside Macintosh:Devices. Addison Wesley, Reading, Massachusetts. 1994.