Windows NT has been designed from the ground up to be a highly responsive, general-purpose operating system. To the real-time developer, this implies that there are some areas where Windows NT will not be suitable for real-time applications as a result of basic design choices made in its architecture. Topics of interest in real-time systems include:
Responding To External Events
Real-time applications are designed to respond to external events within a specified time interval. Windows NT offers strong capabilities in the areas of both interrupt management and I/O management.
Interrupts
Real-time applications use interrupts as a way of ensuring that
external events are noticed by the operating system. It is critical that
interrupts be handled promptly, according to their relative priority.
Within Windows NT, the kernel and the Hardware Abstraction Layer (HAL) are tuned to optimize interrupt delivery and event dispatching. The kernel provides interrupt dispatching to the rest of the system. The kernel can operate at one of thirty-two possible interrupt levels as shown in the following table; these levels help to prioritize the tasks that must be accomplished before other, less time-critical work. The kernel reserves eight interrupt levels for its own use. The remaining twenty-four interrupt levels are mapped onto hardware interrupts using the HAL.
Interrupt       Definition
Level 31        Hardware error interrupt
Level 30        Powerfail interrupt
Level 29        Inter-processor interrupt
Level 28        Clock interrupt
Levels 12-27    These levels map to the traditional interrupt levels 0-15 used in PCs
Levels 4-11     These levels are not generally used
Level 3         Software debugger interrupt
Levels 0-2      Reserved for software-only interrupts to prioritize work within device drivers and executive components

Windows NT handles interrupts on a preemptive basis; when an interrupt occurs, all execution at lower interrupt levels is suspended and execution begins immediately on the highest-level request. Processing continues until the highest-level process has been completed. This places a responsibility on device drivers in that system responsiveness is directly related to how quickly a device driver exits its interrupt routine.
Another way to state this is that Windows NT offers applications a multilevel interrupt mask. An interrupt is delivered only when its level is higher than the current mask. Servicing an interrupt raises the mask to that interrupt's level, so lower-level interrupts cannot use system resources until the handling routine for the higher-level interrupt has completed.
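The masking behavior described above can be illustrated with a small simulation. This is only a toy model in Python, not Windows NT code: the class name InterruptController, its methods, and the device-to-level assignments (disk at 20, keyboard at 13) are assumptions made for illustration; only the clock level of 28 comes from the table.

```python
class InterruptController:
    """Toy model of a multilevel interrupt mask (levels 0-31)."""

    def __init__(self):
        self.current_level = 0   # level the processor is currently running at
        self.pending = []        # interrupts held back by the mask
        self.log = []            # order in which handlers were entered and left

    def request(self, level, name, handler=None):
        """Deliver an interrupt now if the mask allows it, otherwise hold it."""
        if level <= self.current_level:
            self.pending.append((level, name, handler))
            return
        previous = self.current_level
        self.current_level = level          # raise the mask while servicing
        self.log.append("enter " + name)
        if handler is not None:
            handler(self)                   # handler may trigger more interrupts
        self.log.append("exit " + name)
        self.current_level = previous       # restore the mask
        self._drain()

    def _drain(self):
        """Service held interrupts the restored mask now permits, highest first."""
        while True:
            ready = [p for p in self.pending if p[0] > self.current_level]
            if not ready:
                return
            nxt = max(ready, key=lambda p: p[0])
            self.pending.remove(nxt)
            self.request(*nxt)


def demo():
    ctl = InterruptController()

    def disk_handler(c):
        c.request(28, "clock")      # higher level: preempts the disk handler
        c.request(13, "keyboard")   # lower level: held until the disk handler exits

    ctl.request(20, "disk", disk_handler)
    return ctl.log
```

In this sketch the clock interrupt nests inside the disk handler, while the keyboard interrupt waits until the disk handler has returned and the mask has dropped.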
Multiprocessor systems
Windows NT is designed for multi-processor systems. When an
interrupt is dispatched, the kernel dispatches the interrupt to just
one of the processors in the system. All other processors continue
executing uninterrupted. Interrupts can be handled on any of the
processors in a machine; this allows interrupts to be handled by idle
processors, rather than concentrating the load on a single processor.
Use of multiprocessor systems can offer significant benefits for
real-time applications.
Asynchronous I/O
Asynchronous I/O is a very powerful mechanism for user-level real-
time applications; the application can queue I/O and continue
processing without having to either wait or respond immediately to
some end-of-I/O event. Additionally, there are completion
mechanisms in the I/O system (completion port I/O) that efficiently
use the kernel synchronization and executive scheduling capabilities
to distribute I/O completion processing to the most recently busy
thread. This assures that cache is not invalidated and that the system
makes efficient use of the processing power available to it. This can
pay enormous dividends on multi-processor systems and have no
appreciable overhead on single-processor systems.
In many cases (such as a Win32 application), asynchronous I/O may not be important and the application will wait for the I/O to complete before returning. However, in the case where the user (or kernel component) wishes to do work while the asynchronous I/O is completing, they can specify that they do not wish to wait for the request to complete and can continue working in the rest of the application. When the asynchronous I/O eventually completes, an event or some other notification mechanism will fire. The application can check for this completion event at some future time when it is convenient to do so within the application.
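The queue-then-check pattern can be sketched as follows. This is an analogy written in portable Python, using a worker thread to stand in for the I/O system; it is not Win32 overlapped I/O or completion port code, and the function names read_file and overlapped_read_demo are hypothetical.

```python
import concurrent.futures
import os
import tempfile


def read_file(path):
    """The I/O request that runs in the background."""
    with open(path, "rb") as f:
        return f.read()


def overlapped_read_demo():
    # Create a small file to read back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"sensor data")
        path = tmp.name
    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            pending = pool.submit(read_file, path)   # queue the I/O; returns at once
            work_done = sum(range(1000))             # keep working in the meantime
            data = pending.result(timeout=5)         # check for completion when convenient
    finally:
        os.remove(path)
    return work_done, data
```

The application decides when to collect the result, so the time spent waiting on the device can overlap with useful computation.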
Device Drivers
Device drivers are very important to real-time users of Windows NT.
In particular, processing in a device driver will proceed to
completion without any interruptions, which is something that many
real-time applications want. In order to get this kind of performance,
however, the device driver code must be extremely solid. Windows
NT device drivers run entirely within the system process and have
access to all hardware through the HAL. A typical device driver will
have several components as described in the following table.
Component                         Description
Initialization Routine            This routine initializes hardware and sets up data structures used by the driver at startup time
Interrupt Service Routine (ISR)   This routine handles an interrupt on the device that the device driver controls
Deferred Procedure Call (DPC)     One or more DPCs handle non-time-critical processing for the driver
System Thread                     Some, but not all, drivers will have a system thread, which is for very low priority work

When a device driver starts, the initialization routine will typically make the driver known to the system, register some entry points, and register an ISR. The device driver will wait, consuming only memory resources, until an interrupt occurs that meets the criteria of the driver's ISR; the driver's ISR is then entered. The driver will not be interrupted until the end of its interrupt service routine unless a higher-level interrupt occurs. Unlike other operating systems, an ISR on Windows NT can be interrupted by another ISR with higher priority; this is one reason that interrupt latency is hard to define for Windows NT.
When a driver is in its interrupt service routine, it should perform the minimum processing necessary to handle the interrupt, save the state necessary for processing the interrupt, queue a DPC routine for later processing that is not time-critical, and return. The DPC will occur at some later time, although it may occur immediately after leaving the interrupt service routine if the system is not very busy. DPCs will run to the exclusion of all other processing (other than ISRs) until the DPC exits. Most device driver processing is done in this deferred processing routine or at even lower priority routines queued by this DPC. A number of important rules apply to DPCs. The most important rule is that a DPC cannot wait or lock up the system. Also important is that the DPC must have all memory it accesses locked down in physical memory so that it cannot incur page faults. It should be possible, using the support routines and driver model provided by Windows NT, to write device drivers that handle even the most complex and high speed data acquisition hardware.
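The division of labor between the ISR and the DPC can be sketched in a few lines. This is a conceptual model in Python, not driver code: the ToyDriver class and its methods are hypothetical, and doubling the value merely stands in for the slower, non-time-critical work.

```python
from collections import deque


class ToyDriver:
    """Conceptual split between time-critical and deferred work."""

    def __init__(self):
        self.dpc_queue = deque()   # state saved by the ISR for later processing
        self.processed = []        # results produced by deferred processing

    def isr(self, raw_value):
        """Interrupt service routine: save the minimum state, queue it, return."""
        self.dpc_queue.append(raw_value)

    def run_dpcs(self):
        """Deferred processing: drain the queue in arrival order, never waiting."""
        while self.dpc_queue:
            raw_value = self.dpc_queue.popleft()
            self.processed.append(raw_value * 2)   # stand-in for the slower work


def driver_demo():
    driver = ToyDriver()
    for value in (1, 2, 3):        # three interrupts arrive back to back
        driver.isr(value)
    before = list(driver.processed)
    driver.run_dpcs()              # the system later runs the queued DPCs
    return before, driver.processed
```

Keeping isr() this short is what lets the system return to a lower interrupt level quickly; everything else waits in the queue.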
Priorities And Scheduling
Real-time applications, by definition, have a time component associated with their behavior. In this context, it is important to understand how Windows NT assigns priorities to applications and schedules their execution. This section also discusses several other elements of the operating system and how their use can affect real-time applications.
Process priority
Within Windows NT, user applications are defined as processes.
Windows NT is a pre-emptive, multi-tasking operating system that
allows multiple processes (i.e., applications) to run within the system
at the same time. A process has a number of properties that are
associated with it. For real-time applications, one of the most
important properties is the priority class (such as real_time) that
defines the basic priority at which the application will run. The
priority model within Windows NT includes 32 priority levels of
which 16 are reserved for the operating system and real-time
processes. Note that priority levels are different from the dispatch
interrupt levels discussed in the kernel section. User applications
almost always run at interrupt level 0, regardless of the priority
level they are set to.
Each process maintains a private address space to ensure that it will not interfere with other processes. Each process has a base priority class. As shown at left, real-time applications can run with a base priority class of 31 (highest priority), 24, and 16. Typically, real-time processes will run at priority 24. Other applications (dynamic classes) have base priority class of 15, 13, 9 (normal foreground process), 7, 4, 1, and 0.
Each process also has associated with it, within the same address space, one or more threads where each thread represents an independent portion of that process. The number of threads is limited only by available memory and resources. The properties associated with the process, including the priority level, are inherited by these threads.
Each thread has a current priority that is derived from the process' priority class; an API call can raise or lower it within defined limits relative to the process' base priority. For example, a process running at real_time class 24 can have threads that run anywhere between priorities 22 and 26, depending on their own independent priority. These threads will always stay within the real_time priority class.
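The arithmetic behind that example can be written out as a short sketch. It assumes an offset of at most two levels either side of the base, as in the example above, and it assumes that a result is clamped to the limits of its class (16-31 for real-time, 0-15 for dynamic); the function name thread_priority is hypothetical and is not a Win32 API.

```python
REALTIME_RANGE = (16, 31)   # levels reserved for the system and real-time processes
DYNAMIC_RANGE = (0, 15)     # levels used by the dynamic classes


def thread_priority(base, offset):
    """Return the thread priority for a process base priority and an offset.

    The offset is limited to -2..+2, and the result is clamped so that a
    thread never leaves the range of its process' priority class.
    """
    if not -2 <= offset <= 2:
        raise ValueError("offset must be between -2 and +2")
    low, high = REALTIME_RANGE if base >= REALTIME_RANGE[0] else DYNAMIC_RANGE
    return max(low, min(high, base + offset))
```

For a base priority of 24, the five possible offsets give priorities 22 through 26, matching the range quoted in the text.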
Threads are independently scheduled by the executive. A process has associated with it a quantum, which is the maximum amount of time one of these threads can execute before the system checks to see if other threads with the same priority in the system want to execute. In general, real-time processes will have priority over almost all other activities or system events. However, for processes in the spectrum of dynamic classes that are running at lower priority levels, a number of events within the system, such as I/O completion, can cause a temporary priority boost for a thread, giving it priority within a process.
Finally, there is a single system process, within which there can be multiple system threads running. This system process runs all device drivers, the kernel, and the executive. All of these components share a single address space, called "system space". A device driver, executive component, or the kernel can create a new system thread at any time; these threads can be used to do work in the context of the system process. This technique of running a thread within the context of the system, where it has direct access through the HAL to device hardware, might be of interest to real-time engineers.
Memory management
Memory management is another area in which many real-time
engineers are interested. Windows NT is built around a virtual
memory system. For real-time applications, Windows NT solves
many of the problems that face real-time developers using more
traditional virtual memory systems. First, paging I/O occurs at a
lower priority level than the real-time priority process levels. Paging
within the real-time process is still free to occur but this really
ensures that background virtual memory management won't interfere with processing at real-time priorities.
Second, Windows NT permits an application to lock itself into memory so that paging within its own process does not affect it. This allows even very large processes (such as raster image processing where some processes are over 100MB in size) to lock all of their memory down into physical memory and avoid the overhead of paging, while allowing the rest of the system to function normally.
Finally, Windows NT memory management allows memory mapping which permits multiple processes, even device drivers and user applications, to share the same physical memory. This results in very fast data transfers between cooperating processes or between a driver and an application. Memory mapping can be used to dramatically enhance real-time performance.
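The idea of two views onto the same memory can be demonstrated with a short sketch. It uses Python's portable mmap module and a temporary backing file rather than the Windows NT mapping APIs, and both views live in one process for simplicity; the function name shared_mapping_demo is hypothetical.

```python
import mmap
import tempfile


def shared_mapping_demo():
    """Map one backing file twice and pass data between the two views."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(mmap.PAGESIZE)                      # reserve one page
        writer = mmap.mmap(backing.fileno(), mmap.PAGESIZE)  # first view
        reader = mmap.mmap(backing.fileno(), mmap.PAGESIZE)  # second view
        try:
            writer[0:5] = b"hello"       # store through one view
            return bytes(reader[0:5])    # read through the other, with no I/O call
        finally:
            writer.close()
            reader.close()
```

No read or write call moves the data between the views; both refer to the same pages, which is why the technique is fast.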
Cache management
Cache management is one of the drawbacks of using a general
purpose operating system such as Windows NT for real-time
applications. Memory caching is a technique that uses a small
amount of high-speed memory to hold the most recently used code
or data. If the next instruction or piece of data is not in the cache, the
CPU retrieves it from the slower main memory. Using a cache results
in the best average system performance for an operating
system, but it does introduce an element of timing unpredictability
in real-time environments.
Synchronization Requirements
One of the most difficult tasks of real-time systems is ensuring that different threads and processes stay synchronized. That is, within a real-time application, the timing at which different activities occur is important. For example, if one part of the application completes before a second part gets the most current data, then the process that the application is monitoring may become unstable. Synchronization results from ensuring that application components are prioritized properly.
Kernel Synchronization
Most of the work in the kernel is performed at the highest software
interrupt level (known as dispatch_level) or above. The kernel's job
consists primarily of synchronization of execution on multiple
processors, dispatching, and system database maintenance; it does
very little work that is not a direct consequence of a request by a
user or subsystem.
The kernel also has a rich set of dispatch objects; these objects synchronize execution within device drivers and Windows NT executive components. Included in this set of dispatch objects are various timers, events, mutexes and semaphores. These objects can all be used in a number of ways to synchronize execution as necessary within the Windows NT executive and kernel. These objects are also used by subsystems to implement the synchronization primitives exported to user applications.
Timers
With general purpose operating systems that use virtual memory
and caching algorithms, it is often difficult to ensure that events can
take place within specified periods of time.
Windows NT offers several timers that can be used to obtain more deterministic time intervals for managing events in real-time environments. These timers generate software interrupts from the kernel. With Windows NT Workstation 3.5, applications can use the basic system timer with the GetTickCount() API. The resolution of this timer is 10 milliseconds. Several CPUs support a high-resolution counter that can be used to get very granular resolution. The Win32 API called QueryPerformanceCounter() returns the current value of the high-resolution performance counter. For Intel®-based CPUs, the resolution is about 0.8 microseconds. For MIPS-based CPUs, the resolution is about twice the clock speed of the processor. You need to call QueryPerformanceFrequency() to get the frequency of the high-resolution performance counter; the counter's resolution is the reciprocal of that frequency.
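Converting counter readings into elapsed time is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the frequency of 1,193,182 Hz is an assumed value chosen because it is consistent with the figure of about 0.8 microseconds quoted above; a real application would use whatever value QueryPerformanceFrequency() reports.

```python
def resolution_microseconds(frequency_hz):
    """Length of one counter tick, in microseconds."""
    return 1_000_000 / frequency_hz


def ticks_to_microseconds(start_ticks, end_ticks, frequency_hz):
    """Elapsed time between two counter readings, in microseconds."""
    return (end_ticks - start_ticks) * 1_000_000 / frequency_hz


# Assumed counter frequency, for illustration only.
ASSUMED_FREQUENCY_HZ = 1_193_182
```

With the assumed frequency, one tick lasts roughly 0.838 microseconds, and a difference of 1,193,182 ticks corresponds to one second.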
Spinlocks
Another method that ensures proper synchronization is a spinlock. A
spinlock is a locking mechanism associated with a global data
structure that ensures that only one thread can get access to that
data at any one time. Once the first thread is done, it releases the
spinlock so that other threads can then get access to that data. Within
Windows NT, spinlocks are often used by device drivers in order to
ensure that device registers or other data structures can be
accessed by only one device driver at a time. Real-time applications
can use spinlocks to synchronize timing events during an interrupt
response or other similar activity.
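The acquire-and-release discipline can be sketched in Python. This is a user-level illustration only, not the kernel spinlock that drivers use: the SpinLock class and count_with_spinlock function are hypothetical, a non-blocking threading.Lock stands in for the atomic test-and-set, and the sleep(0) call is a concession to the Python interpreter so that the waiting thread gives the lock holder a chance to run.

```python
import threading
import time


class SpinLock:
    """Busy-waiting lock: a waiter keeps retrying instead of blocking."""

    def __init__(self):
        self._flag = threading.Lock()   # stand-in for an atomic test-and-set flag

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            time.sleep(0)               # yield, then try again

    def release(self):
        self._flag.release()


def count_with_spinlock(threads=4, increments=200):
    """Have several threads update one shared counter under the spinlock."""
    lock = SpinLock()
    total = [0]                         # the shared data structure

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                total[0] += 1           # only one thread at a time gets here
            finally:
                lock.release()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return total[0]
```

Because every update happens while the lock is held, no increments are lost, whichever order the threads run in.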
Deterministic Response Times
With real-time systems, it is important to understand how quickly the operating system can respond to external events. The more deterministic the operating system can be, the more suitable the system will be for real-time applications.
Latency
To process an interrupt, three steps are generally taken. The first is the
hardware interrupt latency. This represents the time that it takes for
the CPU to finish processing the current instruction, flush the
instruction pipeline, read the interrupt vector, locate the address of
the Windows NT trap handler, and jump to that address.
Second, the trap handler records the current machine state and
creates a trap frame that records the execution state of the thread
that was interrupted including program counters, registers, and
other information. At this point, the trap handler starts an interrupt
dispatcher which determines the source of the interrupt and then
transfers control to an external routine, called an Interrupt Service
Routine (ISR), or to an internal kernel routine. The ISR is provided by
the device driver for the particular device that caused the interrupt.
Finally, the ISR starts an I/O transfer to or from the device, and the
system executes other threads while the device completes the
transfer. When the transfer is complete, the device again
interrupts the CPU for service. Frequently, in real-time
environments, latency refers to the total time that it takes for these
steps to occur; that is, the amount of time that it takes for the CPU to
acknowledge and handle an interrupt.
Sample measurements
In a recent paper (see note 2) delivered at the 1995 Digital
Communications Design Conference, the ability of Windows NT to
handle real-time activities was measured. These measurements were
designed to understand the appropriateness of using Windows NT as
a platform for a TCP/IP router.
Measurement                        Duration
Hardware Interrupt Latency         1.8 - 2.9 microseconds
Interrupt Dispatching              4.6 - 10.5 microseconds
Interrupt Service Routine Length   10.3 - 16.7 microseconds
Total Elapsed Time                 16.7 - 30.1 microseconds

The paper concluded that Windows NT was appropriate for use as a real-time system. Basic measurements reported in the paper (see note 3) are listed in the table above. The primary discrepancy in the overall duration of the event was attributed to effects of virtual memory and, in particular, the cache manager.
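As a check on the reported figures, the three stage measurements can be summed to reproduce the total elapsed time. The sketch below is plain arithmetic in Python; the names STAGES and total_latency are hypothetical, and the values are those reported in the table.

```python
# Reported (minimum, maximum) duration of each stage, in microseconds.
STAGES = {
    "hardware interrupt latency": (1.8, 2.9),
    "interrupt dispatching": (4.6, 10.5),
    "interrupt service routine": (10.3, 16.7),
}


def total_latency(stages):
    """Sum the best-case and worst-case durations across all stages."""
    best = round(sum(low for low, _ in stages.values()), 1)
    worst = round(sum(high for _, high in stages.values()), 1)
    return best, worst
```

The sums come to 16.7 and 30.1 microseconds, which agree with the total elapsed time in the table.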
2 Brian Catlin, Design of a TCP/IP Router Using Windows NT. Mr. Catlin is a principal at Catlin & Associates in Redondo Beach, CA. The firm's primary business is systems analysis and programming.
3 The system being measured was a Hewlett-Packard XU 5/90 personal computer with one 90 MHz Pentium CPU, 256 KB synchronous cache, 16 MB memory and 540 MB of disk space. Measurement test equipment included various Hewlett-Packard systems.