This paper explains input/output (IO) on the Mac OS. After detailing the two IO models, the paper explains how to match the Thread Manager with Mac OS IO, using three examples. Finally, it introduces a new method along with the code behind it.
You've heard about the Thread Manager, the Mac OS's implementation of cooperative threading. You've read the develop articles. You've downloaded Inside Macintosh:Threads. You've seen the sample code. Now you want to build software to take advantage of this technology.
Threading really shines when your software spins off a lengthy task and returns control to the user immediately. Instead of having to wait for your software to complete the command, your user is free to continue working.
One of the main bottlenecks that software faces isn't computational speed, it's input/output (IO) speed. Since IO tends to take so long, it's an ideal candidate for threading.
In this paper I'll cover:
IO is by definition the act of moving data from an IO device to RAM (input) or from RAM to an IO device (output). Input also goes by the less formal name "read" while output goes by "write."
Common examples of IO devices are hard drives, modems and keyboards.
So your hard drive, modem and keyboard all work towards the same noble goal of blasting bits to and from RAM. Humbling, isn't it?
All this hardware stuff is fine and dandy, you say, but I'm a software guy. How do I code this stuff? I'm glad you asked, otherwise this paper would be rather short.
Like most other operating systems, the Mac OS divvies up the task of managing IO devices. Sitting right above the hardware are chunks of code called drivers. Their job is to provide a software interface for the hardware. By abstracting the hardware through drivers, you don't have a bunch of software touching the hardware willy-nilly -- all access goes through one channel. If the hardware changes, only the driver needs to be rewritten.
In order to manage these drivers and the devices they control, Apple devised the Device Manager. Your application uses the Device Manager to handle IO -- you rarely talk to drivers directly.
To understand Mac OS IO is to understand the Device Manager. Fortunately, IO isn't a complex topic, and neither is the Device Manager. In fact, the entire Device Manager programming interface is just a few variations on seven basic commands: Open, Close, Read, Write, Control, Status and KillIO.
The Open command is used to open a connection to a driver. To be nice, make sure you call Close when you're done using the device.
The Read command is used to move data from the device into RAM. The Write command is used to move data from RAM to the device. These are the meat of the Mac OS IO programming interface.
Control is used when issuing commands not directly related to pumping data -- changing a serial port's speed, for example.
Status is the flip side of Control: you can use it to get a serial port's speed.
KillIO has a special purpose that we'll get into momentarily.
To execute an IO action, you create an IO job. A job is simply a description you pass to the Device Manager of the IO task you'd like accomplished. An IO job includes information such as the source of the data to transfer, the destination and the size of the transfer.
There are two models for executing IO jobs: synchronous and asynchronous. In a nutshell, the synchronous model is easy to code but locks up your Macintosh until the IO job completes. The asynchronous model is more difficult to code but doesn't lock up your Macintosh while the IO job completes.
Fortunately, when you combine an asynchronous model with the Thread Manager, you get a new model, threaded IO. Threaded IO combines the synchronous model's ease of use with the asynchronous model's parallelism. A worthy goal indeed.
The synchronous model for executing IO jobs is easy to code. Each of the Device Manager commands (Open, Close, Read, Write, Control, Status and KillIO) is represented by one function call.
It could scarcely be easier to use the synchronous model -- the IO job is specified in the parameters of each function. Let's look at the function prototypes:
    OSErr OpenDriver( ConstStr255Param name, short *drvrRefNum );
    OSErr CloseDriver( short refNum );
    OSErr FSRead( short refNum, long *count, void *buffPtr );
    OSErr FSWrite( short refNum, long *count, const void *buffPtr );
    OSErr Control( short refNum, short csCode, const void *csParamPtr );
    OSErr Status( short refNum, short csCode, void *csParamPtr );
    OSErr KillIO( short refNum );
Everything seems in order here. When you want to use a driver, you call OpenDriver(), specifying the name of the driver in question. If all goes well, you get a reference number passed back in drvrRefNum. A reference number is a unique ID you use when referring to an open driver. Notice every other function takes a variable named refNum.
Once you're done with the driver, call CloseDriver() with the aforementioned reference number that OpenDriver() gave you.
You use FSRead() to read. You pass it the omnipresent reference number, the size of the job (count) and where in RAM to put the read data (buffPtr).
FSWrite() is just like FSRead() except buffPtr now points where to get the data to write out instead of where to put the data.
When you want to pass a Control message to a driver, you call Control() with a constant in csCode that maps to the message you're passing. For example, the change serial speed message constant is 13 (serdSetBaud), so we'd set csCode to 13 to change the serial port's speed.
The csParamPtr argument for Control() is where you stick the information relevant to the Control message. In the serial speed scenario, we'd set csParamPtr to point to a short that tells the Serial Driver what speed to set the port to.
Status() is Control()'s mirror twin. Use it to get information from the driver. csCode and csParamPtr work the same way as with Control() except the information is now outgoing instead of incoming.
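To make the Control/Status symmetry concrete, here is a small sketch of a mock driver in portable C. It is not the Device Manager: MockControl(), MockStatus(), kMockRefNum and kSpeedCode are hypothetical names invented for illustration, and I'm assuming for simplicity that the Status side answers to the same csCode as the Control side. Only the direction of the data flow is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; a real program would use the Mac OS headers. */
typedef short OSErr;
enum { noErr = 0, controlErr = -17, statusErr = -18 };
enum { kMockRefNum = -6 };   /* made-up reference number for the mock */
enum { kSpeedCode = 13 };    /* the "change serial speed" csCode from the text */

static short gSpeedSetting = 0;   /* the mock driver's only piece of state */

/* Control: information flows from the caller into the driver. */
OSErr MockControl(short refNum, short csCode, const void *csParamPtr)
{
    assert(csParamPtr != NULL);
    if (refNum != kMockRefNum || csCode != kSpeedCode)
        return controlErr;
    gSpeedSetting = *(const short *)csParamPtr;
    return noErr;
}

/* Status: information flows from the driver back out to the caller. */
OSErr MockStatus(short refNum, short csCode, void *csParamPtr)
{
    assert(csParamPtr != NULL);
    if (refNum != kMockRefNum || csCode != kSpeedCode)
        return statusErr;
    *(short *)csParamPtr = gSpeedSetting;
    return noErr;
}
```

A caller would point csParamPtr at a short holding the new setting for MockControl(), and at an empty short for MockStatus() to fill in.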
KillIO() deals with asynchronous IO -- we'll talk about it then.
To illustrate the various models (synchronous, asynchronous, threaded), we'll code the same simple task to each model. The simple task is to write the 4-byte string ATZ\r to the modem port. For those of you who don't know, ATZ\r is the modem reset command in the Hayes AT command set. Assuming a modem is attached to the modem port, the modem will reset itself.
Before we get into the code, let me note a Serial Driver quirk. Each serial port is controlled not by one but by two separate drivers: an input driver and an output driver. This separation is a work-around for a Device Manager constraint.
There are a few things to remember. One, the output driver is the dominant driver. Open it first, close it last, send all Write, Control and Status commands to it. Two, the Read command should only be directed to the input driver. Three, only the still-mysterious KillIO command can be directed to both the input and output drivers.
Here's a function that uses the synchronous model to execute our sample IO job:
    OSErr SynchronousModemReset()
    {
        Str255 resetCmd = "\pATZ\r";
        short inRefNum = 0, outRefNum = 0;
        long count = resetCmd[ 0 ];
        OSErr err;

        /* Attempt to open the modem serial port */
        err = OpenDriver( "\p.AOut", &outRefNum );
        if( !err )
            err = OpenDriver( "\p.AIn", &inRefNum );

        /* Write the modem reset command using the synchronous model */
        if( !err )
            err = FSWrite( outRefNum, &count, resetCmd + 1 );

        /* Call the test function */
        if( !err )
            Foo();

        /* If we successfully opened the modem serial port, close it now */
        if( inRefNum ) {
            (void) CloseDriver( inRefNum );
            inRefNum = 0;
        }
        if( outRefNum ) {
            (void) CloseDriver( outRefNum );
            outRefNum = 0;
        }

        return( err );
    }
First we declare five variables: resetCmd, inRefNum, outRefNum, count and err. resetCmd holds a Pascal string containing the ATZ\r command. inRefNum and outRefNum will hold the input driver's reference number and output driver's reference number, respectively. Until then, we set them to zero to mark the reference numbers as invalid. Bad things happen if we attempt to use an invalid reference number. count holds the size of the IO job. Finally, err holds the error code.
Next we open the output driver and snatch its reference number. If that works, then we open the input driver.
If we were able to open both drivers then we write the modem reset command string out the modem port. Foo() will then be called once FSWrite() successfully returns.
We're all done here. Now we make sure the input reference number is valid before charging off to close the driver. We ignore the error code returned by CloseDriver() because there's nothing we could do about it if it failed.
Note we invalidate inRefNum by setting it to zero after we're done with it. This is a good precautionary measure to take. We then close the output driver the same way, using the outRefNum reference number.
Synchronous IO has two drawbacks. First, your computer is effectively frozen while the IO job completes. Interrupts are still handled; however, anything depending on WaitNextEvent() is cut off. It's as if one process is hogging the processor. This is a bad thing.
The second drawback stems from the first: there's no way of handling timeouts. Our modem reset command is a good example. What if the modem isn't connected to the modem port when we synchronously write ATZ\r to it? We wait forever for the IO job to complete -- hanging the computer. Unfortunately, the only way to discover if something is plugged in is to blindly write to the port.
The way out of this is to write our modem reset command and wait maybe 7 seconds. If the IO job hasn't completed by then, it's an indicator that nothing is attached to the serial port.
The asynchronous model is a low-level programming interface -- you have to do extra work to use it, but it's more flexible.
Whereas you specify your IO job in the synchronous model's function parameters, you specify your IO job in parameter blocks when using the asynchronous model. A parameter block is simply a struct. When calling a function that uses a parameter block, you pass along a parameter block's address as the argument.
Specifying IO jobs in a parameter block is complex and error-prone, but you gain three advantages. First, a parameter block can hold far more parameters than would comfortably fit in a function's argument list. Second, it's easy to extend the structure to add your own fields. Third, and most important, by giving the parameter block its own chunk of memory, you can make it queueable.
Let me clear up that last statement. The Mac OS has a set of utilities named the Queue utilities. The Queue utilities are functions that maintain linked lists. A linked list with a little extra information tied to it is called a queue.
Each driver has a job queue associated with it. A job queue is a linked list of parameter blocks. When you execute an IO job asynchronously, the Device Manager places the parameter block at the end of that driver's job queue instead of executing it immediately (like the synchronous model does). The Device Manager immediately returns control to your software.
Using interrupts, the driver completes the IO jobs in its job queue. Seemingly in parallel your IO job completes and is retired.
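The job queue idea can be sketched in a few lines of portable C. This is only a model under my own assumptions: Job, JobQueue, EnqueueJob() and ServiceOneJob() are hypothetical names, the parameter block is boiled down to a link and a result field, and the "interrupt" is just an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down parameter block. */
typedef struct Job {
    struct Job *next;     /* plays the role of the queue link */
    short       result;   /* 1 = queued or in progress, 0 = finished */
} Job;

/* Hypothetical queue header: first and last job. */
typedef struct JobQueue {
    Job *head;
    Job *tail;
} JobQueue;

/* Append the job to the end of the queue and return at once,
   the way an asynchronous call hands control straight back. */
void EnqueueJob(JobQueue *q, Job *job)
{
    assert(q != NULL && job != NULL);
    job->next = NULL;
    job->result = 1;
    if (q->tail != NULL)
        q->tail->next = job;
    else
        q->head = job;
    q->tail = job;
}

/* Stand-in for the driver's interrupt handler: retire the job
   at the head of the queue and return it (NULL if none). */
Job *ServiceOneJob(JobQueue *q)
{
    Job *job;
    assert(q != NULL);
    job = q->head;
    if (job == NULL)
        return NULL;
    q->head = job->next;
    if (q->head == NULL)
        q->tail = NULL;
    job->next = NULL;
    job->result = 0;
    return job;
}
```

Jobs come off the queue in the order they went on, and the caller never waits inside EnqueueJob().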
Now is a good time to fill you in on KillIO. Since you are given back control immediately after executing an asynchronous IO job, you may find yourself wanting to stop an IO job that's pending or currently executing. That's what KillIO does -- it removes each pending IO job from the job queue and halts the current job. It's great for stopping runaway IO jobs like our modem reset command string.
Here are the asynchronous programming interface's function prototypes:
    OSErr PBOpenAsync( ParmBlkPtr paramBlock );
    OSErr PBCloseAsync( ParmBlkPtr paramBlock );
    OSErr PBReadAsync( ParmBlkPtr paramBlock );
    OSErr PBWriteAsync( ParmBlkPtr paramBlock );
    OSErr PBControlAsync( ParmBlkPtr paramBlock );
    OSErr PBStatusAsync( ParmBlkPtr paramBlock );
    OSErr PBKillIOAsync( ParmBlkPtr paramBlock );
The basic commands are all here: Open, Close, Read, Write, Control, Status and KillIO. Drivers can't be opened, closed or killed asynchronously, so that just leaves us with PBReadAsync(), PBWriteAsync(), PBControlAsync() and PBStatusAsync().
All the functions take the same argument type: ParmBlkPtr. Inside Macintosh:Devices tells me that ParmBlkPtr is a pointer to a ParamBlockRec union:
    union ParamBlockRec {
        IOParam       ioParam;
        FileParam     fileParam;
        VolumeParam   volumeParam;
        CntrlParam    cntrlParam;
        SlotDevParam  slotDevParam;
        MultiDevParam multiDevParam;
    };
The various fields are used for different purposes depending on what drivers you're working with. ioParam is for transport drivers like the Serial Driver. fileParam is used for the File Manager. volumeParam is used for managing storage volumes like floppies, hard drives, CDs, etc. cntrlParam is used for controlling drivers themselves.
We're most interested in the ioParam field and thus the IOParam structure:
    struct IOParam {
        QElemPtr        qLink;
        short           qType;
        short           ioTrap;
        Ptr             ioCmdAddr;
        IOCompletionUPP ioCompletion;
        OSErr           ioResult;
        StringPtr       ioNamePtr;
        short           ioVRefNum;
        short           ioRefNum;
        SInt8           ioVersNum;
        SInt8           ioPermssn;
        Ptr             ioMisc;
        Ptr             ioBuffer;
        long            ioReqCount;
        long            ioActCount;
        short           ioPosMode;
        long            ioPosOffset;
    };
The first two fields, qLink and qType, are used by the Queue utilities. The next two fields, ioTrap and ioCmdAddr, are used internally by the Device Manager. I wouldn't mess with them.
ioCompletion is an important field. When the IO job is completed, the Device Manager calls the function pointer in ioCompletion if the field is not nil. The user-supplied function to be called when the job is completed is called a completion routine. Completion routines may be executed at interrupt time and are subject to interrupt-time restrictions. They cannot use the Memory Manager, unlocked handles, QuickDraw, etc.
When the parameter block is successfully placed into the job queue, the ioResult field is set to 1. When the IO job is completed, ioResult holds either 0 (noErr) or a negative error code. You can use this knowledge to test if an IO job is completed. If ioResult is less than 1, the IO job is finished.
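The ioResult test also gives us a handle on the timeout problem from earlier: poll the field, but only for so long. Here's a minimal sketch in portable C, assuming a simulated job rather than a real parameter block. FakeJob, FakeDriverTick() and WaitForJob() are hypothetical names, and the "deadline" is counted in polls rather than seconds to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parameter block's ioResult field. */
typedef struct FakeJob {
    volatile short ioResult;   /* 1 = in progress, 0 or negative = finished */
    int pollsUntilDone;        /* simulation only; 0 means "never finishes" */
} FakeJob;

/* Simulation only: pretend the driver makes progress between polls. */
static void FakeDriverTick(FakeJob *job)
{
    if (job->ioResult > 0 && job->pollsUntilDone > 0) {
        job->pollsUntilDone--;
        if (job->pollsUntilDone == 0)
            job->ioResult = 0;
    }
}

/* Poll ioResult at most maxPolls times.
   Returns 1 if the job finished (successfully or with an error),
   0 if we ran out of patience first. */
int WaitForJob(FakeJob *job, int maxPolls)
{
    int polls;
    assert(job != NULL && maxPolls >= 0);
    for (polls = 0; polls < maxPolls; polls++) {
        if (job->ioResult < 1)
            return 1;
        FakeDriverTick(job);
    }
    return job->ioResult < 1;
}
```

When WaitForJob() gives up, that is the moment a real program would reach for KillIO to clear the stuck job out of the queue.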
We can safely ignore ioNamePtr, ioVRefNum, ioVersNum, ioPermssn and ioMisc for now. Read Inside Macintosh:Devices for these details.
ioRefNum holds the much ballyhooed driver reference number. ioBuffer points to the place to put the data if reading or the place to get the data if writing. You fill in ioReqCount with the transfer size you'd like -- ioActCount tells you how many bytes were actually transferred. Finally, you set ioPosMode to the positioning mode (from the start, from the end, from the mark, etc.) and ioPosOffset is where to find the data when reading or where to place the data when writing.
Now we'll code the modem reset command using the asynchronous model.
    OSErr AsynchronousModemReset()
    {
        Str255 resetCmd = "\pATZ\r";
        short inRefNum = 0, outRefNum = 0;
        ParamBlockRec pb;
        OSErr err;

        /* Attempt to open the modem serial port */
        err = OpenDriver( "\p.AOut", &outRefNum );
        if( !err )
            err = OpenDriver( "\p.AIn", &inRefNum );

        /* Write the modem reset command using the asynchronous model */
        if( !err ) {
            pb.ioParam.ioCompletion = nil;
            pb.ioParam.ioRefNum = outRefNum;
            pb.ioParam.ioBuffer = (Ptr) resetCmd + 1;
            pb.ioParam.ioReqCount = resetCmd[ 0 ];
            pb.ioParam.ioPosMode = fsFromStart;
            pb.ioParam.ioPosOffset = 0;
            err = PBWriteAsync( &pb );
        }

        /* Call the test function */
        if( !err )
            Foo();

        /* Wait until the asynchronous job completes */
        if( !err )
            while( pb.ioParam.ioResult > noErr ) {}

        /* If we successfully opened the modem serial port, close it now */
        if( inRefNum ) {
            (void) CloseDriver( inRefNum );
            inRefNum = 0;
        }
        if( outRefNum ) {
            (void) CloseDriver( outRefNum );
            outRefNum = 0;
        }

        return( err );
    }
The driver opening code and driver closing code are directly swiped from SynchronousModemReset(). We introduce the parameter block here, pb. We initialize a total of six fields in the parameter block before calling PBWriteAsync().
Unlike with SynchronousModemReset(), Foo() will now possibly be called before the IO job is completed. If we wanted to make sure the IO job is finished before calling Foo(), we could move it after the while loop.
Speaking of which, the while loop takes advantage of the state of ioResult to determine if the IO job is done yet. It does nothing while waiting, but you could easily slip some code in that does some work.
While you're waiting for IO to complete, you'd like to get some other work done. Apple answered our desires to have a general task sharing mechanism by creating the Thread Manager.
Asynchronous IO and the Thread Manager sound like they go together like peanut butter and chocolate. Imagine you spawn a download thread. While your download thread waits for the slow modem, it gives time to other threads.
Ideally, you'd only need to add two lines of code to enable your asynchronous IO code to take advantage of the Thread Manager.
    OSErr IdealThreadedModemReset()
    {
        Str255 resetCmd = "\pATZ\r";
        short inRefNum = 0, outRefNum = 0;
        ParamBlockRec pb;
        OSErr err;

        /* Attempt to open the modem serial port */
        err = OpenDriver( "\p.AOut", &outRefNum );
        if( !err )
            err = OpenDriver( "\p.AIn", &inRefNum );

        /* Write the modem reset command using the asynchronous model */
        if( !err ) {
            pb.ioParam.ioCompletion = NewIOCompletionProc( WakeUpCompletionRoutine );
            pb.ioParam.ioRefNum = outRefNum;
            pb.ioParam.ioBuffer = (Ptr) resetCmd + 1;
            pb.ioParam.ioReqCount = resetCmd[ 0 ];
            pb.ioParam.ioPosMode = fsFromStart;
            pb.ioParam.ioPosOffset = 0;
            err = PBWriteAsync( &pb );
        }

        /* Sleep until WakeUpCompletionRoutine fires and wakes us up */
        if( !err )
            SetThreadState( kCurrentThreadID, kStoppedThreadState, kNoThreadID );

        /* Call the test function */
        if( !err )
            Foo();

        /* If we successfully opened the modem serial port, close it now */
        if( inRefNum ) {
            (void) CloseDriver( inRefNum );
            inRefNum = 0;
        }
        if( outRefNum ) {
            (void) CloseDriver( outRefNum );
            outRefNum = 0;
        }

        return( err );
    }
Wouldn't it be great if, after you execute the asynchronous PBWriteAsync(), you could stop the thread and depend on the completion routine to reawaken the thread?
It would be nice -- but you can't.
Between the time you call PBWriteAsync() and the time you call SetThreadState(), the IO job can and will complete, executing our completion routine.
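Ideally, the execution path taken is like this:
1. The thread executes the IO job (PBWriteAsync())
2. The thread stops itself (SetThreadState())
3. The IO job completes and the completion routine readies the thread
However, this path of execution is also possible:
1. The thread executes the IO job (PBWriteAsync())
2. The IO job completes and the completion routine tries to ready a thread that isn't stopped yet
3. The thread stops itself (SetThreadState())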
Your thread is stopped and will never be readied. Your thread is dead!
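The race is easy to reproduce in a toy model. The sketch below is plain C with no Thread Manager involved: SimThread, SimStop(), SimCompletion() and RunScenario() are hypothetical names, and the model assumes, as the text describes, that readying a thread which isn't stopped has no lasting effect.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { kSimReady, kSimStopped } SimState;

/* Hypothetical model of one thread; not a Thread Manager structure. */
typedef struct SimThread {
    SimState state;
} SimThread;

/* The thread stops itself after issuing its IO job. */
void SimStop(SimThread *t)
{
    assert(t != NULL);
    t->state = kSimStopped;
}

/* The completion routine readies the thread. If the thread is not
   stopped yet there is nothing to ready, so the wake-up is lost. */
void SimCompletion(SimThread *t)
{
    assert(t != NULL);
    if (t->state == kSimStopped)
        t->state = kSimReady;
}

/* Play out one ordering of the two events and report where the
   thread ends up. */
SimState RunScenario(int completionFirst)
{
    SimThread t = { kSimReady };
    if (completionFirst) {
        SimCompletion(&t);   /* IO job finishes before the thread stops */
        SimStop(&t);
    } else {
        SimStop(&t);         /* the ordering we were hoping for */
        SimCompletion(&t);
    }
    return t.state;
}
```

RunScenario(0) leaves the thread ready to run; RunScenario(1) leaves it stopped with nobody left to wake it.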
develop, Apple's Technical Journal, had an article on the Thread Manager. They advocated a dual thread solution.
There are two threads per IO job: the IO thread and the waker thread. Here's its execution path:
This is a poor work-around. You have to manage two threads per IO job and the scheduling overhead is too great.
PowerPlant, Metrowerks' C++ framework, defers the completion routine.
PowerPlant uses the ideal threaded IO model with a twist. Instead of the completion routine blindly attempting to ready the thread, it checks to see if the thread is really stopped. If it's not, then it sets a Time Manager task to execute 100 microseconds in the future. Hopefully by then the thread will be stopped.
This is a good work-around; however, it complicates the completion routine.
With the polling coping mechanism, the thread is never stopped. After executing the IO job, the thread simply polls ioResult until it's less than one, yielding all the while.
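In outline, the polling loop looks something like the sketch below. It's portable C with the yield passed in as a callback, so it can run anywhere; in a real threaded application the callback would presumably wrap the Thread Manager's YieldToAnyThread(). PollUntilDone(), YieldProc, FakeWorld and FakeYield() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that gives time to other threads. */
typedef void (*YieldProc)(void *context);

/* Poll ioResult until the job is finished, yielding on every pass.
   Returns how many times we yielded along the way. */
long PollUntilDone(volatile short *ioResult, YieldProc yield, void *context)
{
    long yields = 0;
    assert(ioResult != NULL && yield != NULL);
    while (*ioResult > 0) {
        yield(context);
        yields++;
    }
    return yields;
}

/* Simulation only: a world in which the IO job finishes while
   other threads are running, after a fixed number of yields. */
typedef struct FakeWorld {
    volatile short *ioResult;
    int yieldsLeft;
} FakeWorld;

void FakeYield(void *context)
{
    FakeWorld *world = (FakeWorld *)context;
    world->yieldsLeft--;
    if (world->yieldsLeft <= 0)
        *world->ioResult = 0;
}
```

Note that the thread never changes its own state here, which is exactly why the lost wake-up can't happen.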
Surprisingly, due to the scheduling overhead, this method is as fast as PowerPlant's and doesn't require a completion routine. This is the best work-around. However, it is still a work-around and polling is inelegant -- we want a solution.
By now you realize that the Thread Manager wasn't designed with IO in mind. We should be able to use the ideal thread model.
The latency of the work-arounds is too high. Imagine your application has 25 threads running. The IO thread executes an IO job and yields. Even if the IO job completes immediately, the IO thread will have to wait behind the 24 other threads before it runs again. And one of those threads is your event loop, which may switch out your application.
Metaphysical question: what does it mean to stop a thread?
The Thread Manager thinks it means to mark a thread as ineligible for scheduling and schedule another thread.
My solution: Write a function that marks a thread as ineligible for scheduling but doesn't reschedule. This would put a thread into a known state before executing the IO job.
However, latency would still be high. When the IO job is completed we'd like our thread to be first in line. We'll also add the ability to mark a thread as "priority."
That's great! How do we do it?
The Thread Manager provides a hook where you can install your own scheduler. However, the Thread Manager's data structures are completely opaque -- there's no "thread queue" to access from our scheduler. You can't even access a reference constant given a ThreadID!
Even if we did install a custom scheduler, we wouldn't know what to schedule!
However, there is a way -- create and maintain your own thread queue. The Thread Manager provides three hooks meant for debugging: DebuggerNotifyNewThread(), DebuggerNotifyDisposeThread() and DebuggerNotifyScheduler(). We'll plug into these hooks to maintain three thread queues: an ineligible queue, an eligible queue and a priority queue.
When our DebuggerNotifyNewThread() hook is called, we'll add an element to the eligible queue with the new thread's ID.
When our DebuggerNotifyDisposeThread() hook is called, we'll search our queues to find the element with a matching thread ID and remove it.
Finally, when our DebuggerNotifyScheduler() hook is called, we'll look at our priority queue. If there's a priority thread waiting we'll move it to the eligible queue and schedule it. Priority status should be fleeting -- otherwise it will hog the processor. If there isn't a priority thread waiting, we'll just schedule the next thread in the eligible queue.
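The scheduling rule can be sketched on its own, away from the Thread Manager. The following portable C is an illustration under my own assumptions, not the included code: IdQueue, Push(), PopFront(), Scheduler and ScheduleNext() are hypothetical names, the queues are small fixed arrays rather than linked lists, and the ineligible queue is left out because the scheduler never picks from it.

```c
#include <assert.h>
#include <stddef.h>

enum { kMaxThreads = 8, kNoThread = -1 };

/* Hypothetical fixed-size FIFO of thread IDs. */
typedef struct IdQueue {
    int ids[kMaxThreads];
    int count;
} IdQueue;

void Push(IdQueue *q, int id)
{
    assert(q != NULL && q->count < kMaxThreads);
    q->ids[q->count++] = id;
}

int PopFront(IdQueue *q)
{
    int id, i;
    assert(q != NULL);
    if (q->count == 0)
        return kNoThread;
    id = q->ids[0];
    for (i = 1; i < q->count; i++)
        q->ids[i - 1] = q->ids[i];
    q->count--;
    return id;
}

typedef struct Scheduler {
    IdQueue eligible;
    IdQueue priority;
} Scheduler;

/* Pick the next thread to run: a waiting priority thread goes
   first, otherwise the eligible threads take turns. Either way the
   chosen thread rejoins the back of the eligible queue, so priority
   status lasts for exactly one scheduling decision. */
int ScheduleNext(Scheduler *s)
{
    int id;
    assert(s != NULL);
    id = PopFront(&s->priority);
    if (id == kNoThread)
        id = PopFront(&s->eligible);
    if (id != kNoThread)
        Push(&s->eligible, id);
    return id;
}
```

With threads 1 and 2 eligible and thread 7 marked priority, successive calls return 7, 1, 2 and then 7 again once its ordinary turn comes around.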
I've defined the XThreadElem structure to hold individual thread elements:
    struct XThreadElem {
        XThreadElemPtr  next;
        XThreadQueuePtr queue;
        ThreadID        threadID;
    };
next points to the next element in the queue. queue points to this element's owner while threadID holds (surprise!) the element's thread ID.
We'll store all three queues (ineligible, eligible and priority) in one handle as an array of XThreadElem structures. We'll use the standard Mac OS Queue Utilities to manage them. We'll keep the handle locked because the Thread Queue routine will be called at interrupt time.
Now we need a queue header. A queue header stores important information like the first element in the queue and the last element:
    struct XThreadQueue {
        short          type;
        XThreadElemPtr head;
        XThreadElemPtr tail;
        XThreadElemPtr mark;
    };
The type field is there for Queue Utilities compatibility -- we don't use it. The mark field points to the next thread to schedule.
I can't reprint all the Thread Queue code here -- look at the included code if you're interested.
In all, I define three extended Thread Manager calls:
    OSErr InitXThreads();
    XThreadState GetXThreadState( ThreadID threadID );
    OSErr SetXThreadState( ThreadID threadID, XThreadState state );
Call InitXThreads() once before calling any of the other extended Thread Manager calls. It allocates the XThreadElem array and installs the Thread Manager debugging callbacks.
GetXThreadState() works like the Thread Manager's GetThreadState() except it returns one of three constants:
    enum {
        kXThreadIneligible = 0,
        kXThreadEligible,
        kXThreadPriority
    };
SetXThreadState() works like the Thread Manager's SetThreadState() except it takes the extended Thread Manager constants and doesn't reschedule.
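Here's one plausible reading of how a state change that doesn't reschedule closes the race, again as a toy model in portable C rather than the real calls. SimXThread, SimSetXState(), SimXCompletion() and RunXScenario() are hypothetical names. The assumption is that the thread marks itself ineligible before issuing the IO job and that the completion routine marks it priority.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { kSimIneligible = 0, kSimEligible, kSimPriority } SimXState;

/* Hypothetical model of one thread's extended state. */
typedef struct SimXThread {
    SimXState state;
} SimXThread;

/* Change the state and nothing else: no rescheduling happens here. */
void SimSetXState(SimXThread *t, SimXState state)
{
    assert(t != NULL);
    t->state = state;
}

/* Completion routine: put the waiting thread first in line. */
void SimXCompletion(SimXThread *t)
{
    SimSetXState(t, kSimPriority);
}

/* Play out one ordering of "IO completes" versus "thread yields"
   and report where the thread ends up. */
SimXState RunXScenario(int completionBeforeYield)
{
    SimXThread t = { kSimEligible };

    SimSetXState(&t, kSimIneligible);   /* known state before the IO job */
    /* ...the asynchronous IO job would be issued here... */
    if (completionBeforeYield)
        SimXCompletion(&t);             /* job finishes immediately */
    /* ...the thread yields here; yielding leaves its state alone... */
    if (!completionBeforeYield)
        SimXCompletion(&t);             /* job finishes while others run */
    return t.state;
}
```

In this model both orderings end with the thread marked priority, because the thread's state is settled before the completion routine can possibly fire.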
Now is when the rubber meets the road. We've extended the Thread Manager cleanly. Now we want to merge synchronous IO with the extended Thread Manager to give us easy-to-code high-performance IO.
Witness two new functions:
    OSErr ThreadedRead( short refNum, void *buffer, long *size, long offset, long patience );
    OSErr ThreadedWrite( short refNum, void *buffer, long *size, long offset, long patience );
ThreadedRead() and ThreadedWrite() are descendants of FSRead() and FSWrite(). They're more powerful, so follow along.
refNum is the standard reference number, and buffer points to where to get or put the data. Set size to the size of the IO job -- after the IO job is done size will be set to the actual number of bytes transferred. You specify where you want to read from or write to in offset. Finally, specify how long you're willing to wait, in milliseconds, in patience. One thousand milliseconds is equal to one second.
All the code is included with this paper; hunt around and enjoy. I'm storing this paper at my web site and will continue to update it and the code. You can find it at: <http://www.u-s-x.com/wolfie/rants/andthebandplayedon.html>.
Apple Computer. Inside Macintosh:Devices. Addison Wesley, Reading, Massachusetts. 1994.