Technote 1059 | JULY 1996
NOTE: The contents of this technote are now covered in Inside Macintosh: Networking: Open Transport. Although this technote is still useful as a summary of OT server performance issues, we recommend that developers refer to Inside Macintosh for details. |
As higher network datalink speeds become more commonplace, Mac OS server performance has come under closer scrutiny. Developers who write network server applications on the Mac continue to be concerned about performance issues. In many cases, server throughput and connection latency problems may stem from poor application design rather than any deficiencies in Open Transport or the Mac OS. This Technote is intended for Macintosh developers writing network server applications that use the Open Transport API, and discusses some techniques you can employ in your network server application design to achieve higher performance.
About Open Transport on Mac OS

The Mac OS Open Transport API is based on the industry-standard X/Open Transport Interface (XTI) specification. Since XTI originated in the UNIX world, it was slightly altered to operate in an asynchronous environment such as the Mac OS. When using XTI under UNIX, it is acceptable for a task to issue a blocking I/O request, causing the process to sleep until the request is completed. But since the current Mac OS is based on cooperative rather than preemptive multitasking, blocking I/O is unacceptable. As a consequence, applications must invoke their networking and I/O functions asynchronously; otherwise, blocking on I/O directly affects the user experience. Apple's Open Transport addresses this mismatch in software architecture by extending XTI so that an application can install a notifier to receive completion events. The key to building an application that takes full advantage of Open Transport on the Macintosh, such as a high-performance network server, is the proper use of these XTI extensions. Understanding these extensions is important both to developers who are writing new Macintosh network server applications and to those who are porting existing code based on UNIX sockets.

Getting the Most Out of Open Transport By Using Notifiers

The Mac OS implementation of XTI allows your application to specify a callback routine, or notifier, when opening an endpoint. Since the notifier mechanism is the most immediate way for an application to discover what endpoint events are occurring, your code should attempt to respond to most events directly in the notifier. To get the maximum performance out of Open Transport, you should take advantage of notifiers. The following sections explain how to work with them.

Don't put off to event time what you can handle in your notifier

In the Mac OS, you have three execution contexts: System Task time, Deferred Task (secondary interrupt) time, and primary (hardware) interrupt time.
In general, network server code should avoid deferring incoming packet processing to System Task time. The reason is that calling WaitNextEvent can result in an unpredictable packet-processing latency. As a result, the round-trip time from when your server receives a packet to when it responds depends on factors external to your application. This is a consequence of the cooperative multitasking used by the Mac OS: you have no control over how long it takes for another task to call WaitNextEvent. Open Transport 1.1.1 introduced the SyncIdleEvents feature, which was intended to facilitate notifier/Thread Manager interaction. This feature calls your notifier at a time when it is safe to call YieldToThread. Since YieldToThread will eventually cause the Thread Manager to switch to a thread that calls WaitNextEvent, it presents the same unpredictable latency. For this reason, I suggest that you not use this strategy in a high-performance server.

A better strategy for processing incoming packets is to receive and initiate all I/O from the notifier. As a rule, if you can start an OTSnd or an asynchronous File Manager operation from a notifier, you should do it. This is the most important piece of advice you can follow if you want to extract the best performance from Open Transport. Network application developers, especially those designing HTTP servers, should heed this recommendation: packet-response time is a true measure of Web server performance. Specifically, a notifier should be able to perform tasks such as receiving data, sending a response, accepting an inbound connection, and starting an asynchronous File Manager request, as illustrated in the sketch below:
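Here is a minimal sketch of such a notifier, assuming it was installed with the endpoint itself as the contextPtr (for example, via OTInstallNotifier); MyNotifier and the commented-out cases are illustrative, and HandleDataAvail is the no-copy receive routine shown later in this Note.

#include <OpenTransport.h>

extern void HandleDataAvail(EndpointRef ep);    // sketched later in this Note

// A notifier that responds to endpoint events immediately, in the notifier
// itself, rather than deferring the work to WaitNextEvent time.
static pascal void MyNotifier(void* contextPtr, OTEventCode code,
                              OTResult result, void* cookie)
{
    EndpointRef ep = (EndpointRef) contextPtr;  // installed with ep as context

    switch (code) {
        case T_DATA:
            // Incoming data: receive it and start the reply right here.
            HandleDataAvail(ep);
            break;

        case T_LISTEN:
            // Inbound connection request: accept it now
            // (see "Managing the Connection Queue" below).
            break;

        case T_DISCONNECT:
            // Remote disconnect: clear it with OTRcvDisconnect and
            // recycle the endpoint (see "Recycling Endpoints" below).
            break;

        default:
            // Completion events (T_OPENCOMPLETE, T_ACCEPTCOMPLETE, ...) and
            // flow-control events (T_GODATA) are handled here as well.
            break;
    }
}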
Some Notifier Hints and Guidelines

Because proper use of Open Transport notifiers is the key to improved server performance, here are some hints and guidelines to help you use notifiers more effectively:
Interrupt-Safe Functions

One of the major reasons that developers have shied away from processing packets in a notifier is that you can't call the Mac Toolbox functions that move memory at interrupt time. A number of fast, interrupt-safe functions, however, are available from Open Transport. These functions are the same for both Mac OS 7 and Mac OS 8. Many developers may overlook the functions available in the OT library; the Open Transport Client Developer Note and the <OpenTransport.h> include file should provide you with the information you need. Some of the interrupt-safe functions available under Open Transport are explained in the next sections.

Memory Management

OTAllocMem and OTFreeMem can be safely called from a notifier. Keep in mind that the memory pool used by OTAllocMem is allocated from the application memory pool, which, due to the Memory Manager's constraints, can only be refilled at task time. Therefore, if you allocate memory from the Open Transport memory pool from an interrupt or deferred task, you should be prepared to handle a failure resulting from a temporarily depleted memory pool, which can only be replenished at System Task time.

List Management

The OTLIFO functions can be used to implement an interrupt-safe LIFO list; combined with OTLIFOStealList and OTReverseList, they can also provide FIFO ordering. Here's how:
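A minimal sketch, assuming a work queue of requests produced in a notifier and consumed later (for example, at Deferred Task time); the OTLIFO, OTLIFOEnqueue, OTLIFOStealList, and OTReverseList calls are Open Transport utilities, while the WorkItem type and the consumer routine are illustrative.

#include <OpenTransport.h>

// An illustrative work item; the OTLink field lets Open Transport chain
// items together, and must be the first field so the cast below is valid.
typedef struct WorkItem {
    OTLink      fLink;
    EndpointRef fEndpoint;
    // ... other per-request state ...
} WorkItem;

static OTLIFO gWorkQueue = { NULL };

// Producer -- safe to call from a notifier or other interrupt context.
static void EnqueueWork(WorkItem* item)
{
    OTLIFOEnqueue(&gWorkQueue, &item->fLink);
}

// Consumer -- atomically steal the whole list, then reverse it so the
// items come out in FIFO (arrival) order.
static void DrainWorkQueue(void)
{
    OTLink* list = OTLIFOStealList(&gWorkQueue);

    list = OTReverseList(list);
    while (list != NULL) {
        WorkItem* item = (WorkItem*) list;   // fLink is the first field
        list = list->fNext;
        // ... process item->fEndpoint here ...
    }
}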
Semaphores

There are also a number of atomic functions and macros, such as OTAtomicSet/Clear/TestBit, OTCompareAndSwap, OTAtomicAdd, and OTClearLock/OTAcquireLock, that can be used to administer interrupt-safe semaphores.

TimeStamp

There are a number of timestamp functions that can be used to accumulate profiling information -- for example, OTGetTimeStamp. These functions are fast and safe to call at primary interrupt time. You can use this timestamp information later to identify bottlenecks or report on application performance.

Open Transport Deferred Tasks: A Closer Look

Open Transport Deferred Tasks provide a way to simplify working with primary interrupts, such as I/O completions, VBL tasks, or Time Manager tasks. Remember that a Deferred Task, also known as a Secondary Interrupt, runs on the way out of processing a primary interrupt, after the interrupt mask has been lowered to zero. Deferred Tasks are in effect a priority above System Task time, but can still be interrupted by a primary interrupt such as packet reception.

OTCreateDeferredTask can be used to set up a block of code that can be scheduled from a primary interrupt to run at the next Deferred Task time. Just pass OTCreateDeferredTask a pointer to the function you wish to schedule and a contextInfo argument, and it returns a reference that can be used later to schedule the function.

dtRef = OTCreateDeferredTask(taskproc, contextInfo);

You can then use OTScheduleDeferredTask to schedule the function associated with the reference to run at the next Deferred Task time. Once scheduled, the function pointed to by taskproc will be called back at the appropriate time and passed contextInfo as a parameter.

if( OTScheduleDeferredTask(dtRef) ) ... ;

OTScheduleDeferredTask returns true if the function was scheduled, false if not. If the function was not scheduled and the dtRef parameter is valid, the function is already scheduled to run.
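Putting these pieces together, here is a minimal sketch, assuming a hypothetical ProcessPendingPackets routine that does the real work; in practice the deferred task is created once at startup and destroyed with OTDestroyDeferredTask when no longer needed.

#include <OpenTransport.h>

static long gDTRef = 0;   // reference returned by OTCreateDeferredTask

// Runs at Deferred Task time; contextInfo is whatever was passed to
// OTCreateDeferredTask.
static pascal void MyDeferredTask(void* contextInfo)
{
    /* ProcessPendingPackets(contextInfo); */
}

// Called once at startup (System Task time).
static OSStatus SetUpDeferredTask(void* contextInfo)
{
    gDTRef = OTCreateDeferredTask(MyDeferredTask, contextInfo);
    return (gDTRef != 0) ? kOTNoError : kENOMEMErr;
}

// Called from a primary interrupt (for example, an I/O completion routine)
// to have MyDeferredTask run as soon as the interrupt unwinds.
static void ScheduleWork(void)
{
    if ( !OTScheduleDeferredTask(gDTRef) ) {
        // false: the reference is invalid or the task is already scheduled
    }
}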
Avoiding Synchronization Problems

If you mix the processing of OTRcv in different interrupt contexts, such as notifier and Deferred Task or System Task time, you should be aware that certain synchronization problems can occur. For example:
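Whatever the specific interleaving, one way to keep foreground code and a notifier from tripping over each other is to bracket the foreground work with OTEnterNotifier and OTLeaveNotifier, which hold off delivery of the endpoint's notifications until the foreground call has finished. The sketch below assumes the foreground and the notifier share the same receive path; the function name and buffer handling are illustrative.

#include <OpenTransport.h>

// Drain leftover data from System Task (foreground) code without racing
// the notifier: while OTEnterNotifier is in effect, the endpoint's
// notifications are deferred and delivered after OTLeaveNotifier.
static void ForegroundDrain(EndpointRef ep, void* buf, size_t len)
{
    OTFlags  flags;
    OTResult result;

    if (OTEnterNotifier(ep)) {
        do {
            result = OTRcv(ep, buf, len, &flags);
            // ... hand the data to the same code the notifier uses ...
        } while (result > 0);
        OTLeaveNotifier(ep);
    }
    // If OTEnterNotifier returns false, the notifier is already running;
    // leave the work to it.
}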
Using Open Transport with the File Manager

Since the design of many network server applications requires interaction with the File Manager, it's important to understand how to get the most out of the File Manager.

Not Blaming the File Manager for Poor Application Design

Improper use of the File Manager will adversely affect network server performance. If the architecture of your server requires even moderate amounts of file access, you should review Technote FL16 - File Manager Performance and Caching. That Note details tactics you can apply in order to get the best performance from the File Manager. Pay close attention to the following issues:
Caching Open Files, Not Just Data

An excellent way to improve performance is to avoid opening frequently used files every time they are accessed. You can accomplish this by maintaining the most recently or commonly used files in an open state, tracked by a list or cache, and closing files only when the list is full or an extended period of inactivity has elapsed. For example, a Web server would benefit from keeping the site's main index.html file open, because it is hit every time a new user accesses the server. There are some tradeoffs, however, to keeping a large number of files open:
An Example of Processing a Packet

So far, I've provided you with a collection of hints and guidelines. Now I want to show you how you might use notifiers effectively to handle packet reception and processing. We're going to set up an example scenario of processing a packet in the context of an HTTP server. Let's make the following assumptions:
An HTTP GET-method packet is sent to your port.
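Here is a sketch of one way the rest of this scenario might be wired together, assuming the GET request has already been parsed in the notifier and the target file already opened; HTTPRequest, StartAsyncRead, SendReply, and ReadCompletion are illustrative names rather than a prescribed design. Because a File Manager completion routine runs at primary interrupt time, it only schedules an Open Transport deferred task, and the OTSnd happens at Deferred Task time.

#include <Files.h>
#include <OpenTransport.h>

// Per-request state. The ParamBlockRec must be the first field so the
// completion routine can recover the whole record from the block pointer.
typedef struct {
    ParamBlockRec fPB;            // File Manager parameter block
    EndpointRef   fEndpoint;      // connection to reply on
    long          fSendTask;      // deferred task that sends the reply
    char          fBuffer[4096];  // file data / reply staging area
} HTTPRequest;

// Runs at Deferred Task time, where it is safe to call OTSnd.
static pascal void SendReply(void* contextInfo)
{
    HTTPRequest* req = (HTTPRequest*) contextInfo;

    if (req->fPB.ioParam.ioResult == noErr)
        (void) OTSnd(req->fEndpoint, req->fBuffer,
                     req->fPB.ioParam.ioActCount, 0);
    // A real server would also send HTTP headers, loop for large files,
    // and eventually close or recycle the endpoint.
}

// File Manager completion routine: runs at primary interrupt time, so it
// only schedules the deferred task set up below.
static pascal void ReadCompletion(ParmBlkPtr pb)
{
    HTTPRequest* req = (HTTPRequest*) pb;

    (void) OTScheduleDeferredTask(req->fSendTask);
}

// One-time setup for a request record, done at System Task time.
static void SetUpRequest(HTTPRequest* req, EndpointRef ep)
{
    req->fEndpoint                = ep;
    req->fSendTask                = OTCreateDeferredTask(SendReply, req);
    req->fPB.ioParam.ioCompletion = NewIOCompletionUPP(ReadCompletion);
}

// Called from the notifier once the GET request has been parsed and the
// file opened (refNum). The read is started asynchronously; nothing blocks.
static OSErr StartAsyncRead(HTTPRequest* req, short refNum)
{
    req->fPB.ioParam.ioRefNum    = refNum;
    req->fPB.ioParam.ioBuffer    = req->fBuffer;
    req->fPB.ioParam.ioReqCount  = sizeof(req->fBuffer);
    req->fPB.ioParam.ioPosMode   = fsFromStart;
    req->fPB.ioParam.ioPosOffset = 0;

    return PBReadAsync(&req->fPB);
}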
Streamlining Endpoint Creation

The time required to create and open an endpoint can directly impact connection set-up time. This is of prime concern for servers, especially HTTP servers, since they must manage high connection turnover rates. Here are three tips that will speed up endpoint creation.

Preallocating Endpoints

One cost of the transport independence provided by Open Transport is that the process of setting up the STREAMS plumbing for each endpoint can be time-consuming. Rather than incur this delay each time a connection is established, a server designed to handle multiple outstanding connections can preallocate a pool of open, unbound endpoints into an endpoint cache. When a connection is requested, you can quickly dequeue a ready-to-use endpoint from this cache, resulting in a decreased connection turnaround delay (e.g., 10 to 30 times faster). For example:
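A minimal sketch of such an endpoint cache, assuming a TCP listener; the cache is simply the interrupt-safe OTLIFO list described earlier, and the CachedEndpoint wrapper and function names are illustrative. Endpoints are opened synchronously at startup for brevity; a real server might use OTAsyncOpenEndpoint and add each endpoint to the cache as its open completes.

#include <OpenTransport.h>
#include <OpenTptInternet.h>   // for kTCPName

#define kCachedEndpoints 16

// Wrapper so an endpoint can sit on an OTLIFO list; fLink must come first.
typedef struct {
    OTLink      fLink;
    EndpointRef fEndpoint;
} CachedEndpoint;

static OTLIFO gEndpointCache = { NULL };   // open, unbound endpoints

// Fill the cache at startup (System Task time).
static OSStatus PreallocateEndpoints(void)
{
    OSStatus err = kOTNoError;
    long     i;

    for (i = 0; i < kCachedEndpoints; i++) {
        CachedEndpoint* ce = (CachedEndpoint*) OTAllocMem(sizeof(CachedEndpoint));

        if (ce == NULL)
            return kENOMEMErr;

        ce->fEndpoint = OTOpenEndpoint(OTCreateConfiguration(kTCPName),
                                       0, NULL, &err);
        if (err != kOTNoError)
            return err;

        OTLIFOEnqueue(&gEndpointCache, &ce->fLink);
    }
    return kOTNoError;
}

// Called from the notifier when a connection request arrives: returns an
// already-open, unbound endpoint instead of paying the open cost now.
// (Before use, install a notifier on it and put it in asynchronous mode.)
static EndpointRef GrabCachedEndpoint(void)
{
    CachedEndpoint* ce = (CachedEndpoint*) OTLIFODequeue(&gEndpointCache);
    EndpointRef     ep = kOTInvalidEndpointRef;

    if (ce != NULL) {
        ep = ce->fEndpoint;
        OTFreeMem(ce);
    }
    return ep;
}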
Recycling Endpoints

You can use the endpoint cache to recycle endpoints when a connection is closed. Rather than call OTCloseProvider each time a connection terminates, cache the unbound endpoint; this recycles it for a later open request.
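Continuing the sketch above (and reusing its gEndpointCache and CachedEndpoint assumptions), a disconnect handler might unbind the endpoint and push it back onto the cache instead of closing it:

// Called from the notifier when T_DISCONNECT arrives. Assumes each
// connection keeps its CachedEndpoint record as per-connection state, so
// no allocation is needed here.
static void RecycleEndpoint(CachedEndpoint* ce)
{
    (void) OTRcvDisconnect(ce->fEndpoint, NULL);   // clear the disconnect

    // Unbinding returns the endpoint to the open, unbound state the cache
    // expects. In asynchronous mode the unbind finishes with
    // T_UNBINDCOMPLETE; strictly, the endpoint should not be handed out
    // again until that event has arrived.
    (void) OTUnbind(ce->fEndpoint);

    OTLIFOEnqueue(&gEndpointCache, &ce->fLink);
}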
Optionally, to save memory, you can deallocate the endpoint when the endpoint cache reaches some predetermined limit.

Cloning Configurations

Another tactic to consider is to create a single OTConfiguration with OTCreateConfiguration, then use OTCloneConfiguration to pass a copy of the OTConfiguration to OTOpenEndpoint. You'll find that OTCloneConfiguration is approximately 5 times faster than OTCreateConfiguration. Don't forget to free the initial OTConfiguration (with OTDestroyConfiguration) before you quit your application.

Managing the Connection Queue

A high-performance server must be able to handle multiple connections simultaneously -- and efficiently. How you manage the connection queue is a key component of this. The connection queue determines the number of outstanding connect indications or calls for a server's listening endpoints -- i.e., connections that have neither been accepted via OTAccept nor rejected via OTSndDisconnect. Under the XTI framework, managing the connection queue can be complicated. The following section discusses the problems that you might encounter -- as well as the solutions.

Handling kOTLookErr

One consequence of setting up a connection-oriented, listening endpoint to handle multiple outstanding calls is having to handle the (dreaded) Look Error, or kOTLookErr. A kOTLookErr occurs when another concurrent event has arrived on the same endpoint and cannot be acted on until the blocking event is consumed.

The Problem

To help illustrate why kOTLookErr occurs, you may want to think of the listening endpoint's stream head as an event FIFO. When you bind the listening endpoint, you specify the queue length of this FIFO. If you specify a queue length greater than one, then multiple T_LISTEN or T_DISCONNECT events will queue up. Certain rules apply when processing this queue; the rules that regulate your interaction with it are codified by the X/Open XTI specification. The following are the relevant rules you need to know in order to write your connection management code (in no particular order):
An Example

Properly handling these connect requests requires you to be able to simultaneously process the number of calls that you specified in the queue length. For example:
This can continue until you have filled the queue with 5 inbound connection requests, at which point Open Transport will automatically refuse any further connection requests on that endpoint.

The Solution

You can apply the following strategy to handle processing of inbound connection requests. We're going to set up an example scenario of accepting connections from our listening endpoint. Let's make the following assumptions:
Here's the scenario: The first thing you want to do is create a list (LIST) or array of call (CALL) instances large enough to handle your queue.
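A sketch of how the notifier's T_LISTEN handling might look, assuming a TCP listener, the endpoint cache from the previous section, and (for simplicity) a single CALL structure rather than the full LIST; the function names are illustrative. The point is that OTListen and OTAccept can return kOTLookErr, which simply means another event -- typically a T_DISCONNECT for one of the queued calls -- must be consumed first.

#include <OpenTransport.h>
#include <OpenTptInternet.h>   // for InetAddress (assumes a TCP listener)

// Assumes gListener is the bound listening endpoint and GrabCachedEndpoint()
// returns an open, unbound endpoint (see the endpoint cache sketch above).
extern EndpointRef gListener;
extern EndpointRef GrabCachedEndpoint(void);

static InetAddress gPeerAddr;   // storage for the caller's address
static TCall       gCall;       // call structure filled in by OTListen

// One-time setup: point the call structure at the address storage.
static void SetUpCall(void)
{
    gCall.addr.buf    = (UInt8*) &gPeerAddr;
    gCall.addr.maxlen = sizeof(gPeerAddr);
}

// Called from the notifier on T_LISTEN.
static void HandleListen(void)
{
    OSStatus err;

    for (;;) {
        err = OTListen(gListener, &gCall);
        if (err == kOTNoDataErr)
            return;                    // no more queued connect indications

        if (err == kOTLookErr) {
            // Another event is blocking the head of the queue. A T_DISCONNECT
            // here means one of the queued callers gave up; clear it (its
            // sequence number identifies the call) and try again.
            if (OTLook(gListener) == T_DISCONNECT) {
                TDiscon discon;

                discon.udata.maxlen = 0;   // not interested in disconnect data
                (void) OTRcvDisconnect(gListener, &discon);
            }
            continue;
        }

        if (err != kOTNoError)
            return;                    // some other error; handle elsewhere

        // Hand the call off to a preallocated endpoint. A complete server
        // would check for an empty cache and also handle kOTLookErr
        // returned by OTAccept itself.
        (void) OTAccept(gListener, GrabCachedEndpoint(), &gCall);
    }
}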
Alternatively, you could make step #3 a loop and handle all of the CALLs in the LIST simultaneously, but that makes handling the T_ACCEPTCOMPLETE / T_PASSCON events and LIST processing more complicated. In general, the added complexity may not be worth it.

If you don't handle everything inside your notifier, there's one gotcha: if you set a flag in your notifier and process the event back in your main thread, you must deal with the following type of synchronization issue. Suppose there is a T_LISTEN on top of the queue hiding a T_DISCONNECT. Your notifier gets the T_LISTEN and you set a flag to handle it back in the main thread. The main thread does an OTListen, which clears the T_LISTEN and brings the T_DISCONNECT up to the top. Before the OTListen completes, your notifier will be interrupted with the T_DISCONNECT notification. Once you understand how things work, handling kOTLookErr is not as perplexing as it might seem.

Negotiate qlen on Bind

As mentioned at the beginning of this section, you specify the length of a listening endpoint's connection queue when you bind a connection-oriented service, such as TCP or ADSP. During the bind process, however, it is possible for the value of the queue length to be negotiated by the endpoint. The length may be changed if the endpoint cannot support the requested number of outstanding connection indications. Therefore, it is important for your application not to assume that the value of qlen returned by OTBind is the same as requested. You should always check it.

Increasing Throughput

This section discusses methods for increasing network data throughput to your server to enhance performance.

No-Copy Receive

When properly used, no-copy receives may provide a significant performance enhancement by letting you directly access the network interface's actual DMA buffers. This process lets you determine where and how to copy data; typically, this results in decreasing memory copies by a factor of 2. To request that Open Transport initiate a no-copy receive, you must pass the constant kOTNetbufDataIsOTBufferStar in the length parameter of OTRcv. This returns you a pointer to the OTBuffer structure, as illustrated here:

OTBuffer* myBuffer;
OTFlags   flags;
OTResult  result = OTRcv(myEndpoint, &myBuffer, kOTNetbufDataIsOTBufferStar, &flags);

The OTBuffer data structure is based on the STREAMS mblk_t data structure, as illustrated below. By tracing the chain of fNext pointers, all of the data associated with the message can be accessed.

struct OTBuffer
{
    void*       fLink;      // b_next & b_prev
    void*       fLink2;
    OTBuffer*   fNext;      // b_cont
    UInt8*      fData;      // b_rptr
    size_t      fLen;       // b_wptr
    void*       fSave;      // b_datap
    UInt8       fBand;      // b_band
    UInt8       fType;      // b_pad1
    UInt8       fPad1;
    UInt8       fFlags;     // b_flag
};

A few utilities are available to the Open Transport programmer to simplify access to the OTBuffer structure. You can use OTReadBuffer to read data from the buffer, OTBufferDataSize to calculate its total length, and OTReleaseBuffer to dispose of the OTBuffer when you are done. When processing a no-copy receive, it is very important that you minimize the time that you hold onto the buffer, and be sure to call OTReleaseBuffer to return it to Open Transport. Otherwise, you run the risk of starving the network driver for DMA buffers and adversely affecting the performance of the operating system.
A Code Snippet Illustrating a No-Copy Receive

This snippet demonstrates how you might process a no-copy receive in your notifier:

#include <OpenTransport.h>
#include <OpenTptClient.h>

// QUEUE_NET_EVENT(T_DATA) defers a particular event until later
// MyProcessBuffer(buffer, actCount) consumes network data

void HandleDataAvail(EndpointRef ep)
{
    OTFlags      flags;
    OTResult     status;
    OTBuffer*    buffP;
    OTBufferInfo buffInfo;
    size_t       actCount;
    char*        buffer;

    // Loop while data is left in OT
    while ((status = ::OTRcv(ep, &buffP, kOTNetbufDataIsOTBufferStar, &flags)) > 0)
    {
        // Get the count of bytes in the buffer
        OTInitBufferInfo(&buffInfo, buffP);
        actCount = status;

        // Reserve buffer space
        // You should handle OTAllocMem failures, perhaps by using a deferred task.
        ThrowIfNil(buffer = (char*) OTAllocMem(actCount));

        // Read data into the buffer
        ::OTReadBuffer(&buffInfo, buffer, &actCount);

        // Call code to consume the network data
        MyProcessBuffer(buffer, actCount);

        // Return the OTBuffer to the system
        ::OTReleaseBuffer(buffP);
    }

    // Did we read all the data?
    if (status == kOTNoDataErr)
        return;
    // Was the endpoint not ready yet?
    else if (status == kOTStateChangeErr)
        QUEUE_NET_EVENT(T_DATA);
    // Check for a receive error
    else
        ThrowIfErr(status);
}

Using AckSend

By enabling the AckSend option on an endpoint, you can eliminate the need for Open Transport to perform a memory copy of contiguous data to be transmitted by OTSnd. Instead, a pointer to the data is passed downstream. Once the memory is no longer being used, the notifier receives a T_MEMORYRELEASED event.
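A minimal sketch of how this might look, assuming the outgoing buffers are allocated with OTAllocMem; OTAckSends enables the mode, and the T_MEMORYRELEASED case in the notifier is where the buffer finally comes back (the cookie parameter identifies the buffer, and the result parameter gives its size).

#include <OpenTransport.h>

// Enable AckSend once, after the endpoint has been opened. From then on,
// OTSnd keeps a reference to the caller's buffer instead of copying it.
static OSStatus EnableAckSend(EndpointRef ep)
{
    return OTAckSends(ep);
}

// Send a buffer allocated with OTAllocMem. The buffer must not be touched
// or freed until its T_MEMORYRELEASED event arrives.
static OTResult SendWithAckSend(EndpointRef ep, void* buf, size_t len)
{
    return OTSnd(ep, buf, len, 0);
}

// In the notifier:
//     case T_MEMORYRELEASED:
//         OTFreeMem(cookie);   // cookie is the buffer passed to OTSnd
//         break;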
For example, the following snippet flushes both the read and write sides of the endpoint's stream before unbinding it; this can be useful before recycling an endpoint, so that stale data does not linger in the stream:

#include <stropts.h>

error = OTIoctl(ep, I_FLUSH, (void*) FLUSHRW);
if (error == kOTNoError)
    error = OTUnbind(ep);
Special thanks to Don Coolidge, Jim Luther, Peter N. Lewis, Paul Lodridge, Tom Maremaa, Eric Okholm, and Mike Quinn.