- NetWare 386 Speed Rating
-
- Glenn Westin
- Consultant
- Systems Engineering Division
-
- Abstract: This AppNote provides an explanation of the NetWare 386 speed
- rating test and the system factors involved in attaining a high rating.
- These factors include the CPU chip type, the clock speed, and the main
- memory and memory cache architecture.
-
- Disclaimer
-
- Novell, Inc. makes no representations or warranties with respect to the
- contents or use of these Application Notes, or any of the third party
- products discussed in the AppNotes. Novell reserves the right to revise
- these Application Notes and to make changes in their contents at any
- time, without obligation to notify any person or entity of such revisions
- or changes. These AppNotes do not constitute an endorsement of the third
- party product or products that were tested. The configuration or
- configurations tested or described may or may not be the only available
- solution. Any test is not a determination of product quality or
- correctness, nor does it ensure compliance with any federal, state, or
- local requirements. Novell does not warrant products except as stated in
- applicable Novell product warranties or license agreements.
-
- Copyright © 1990 by Novell, Inc., Provo, Utah. All rights reserved.
-
- As a means of promoting NetWare Application Notes, Novell grants you
- without charge the right to reproduce, distribute and use copies of the
- AppNotes provided you do not receive any payment, commercial benefit or
- other consideration for the reproduction or distribution, or change any
- copyright notices appearing on or in the document.
-
- Introduction
-
- NetWare 386 is a high performance network operating system written
- specifically for the 80386 microprocessor. The operating system takes
- advantage of the 80386's 32-bit architecture, advanced instruction set
- and memory management features. NetWare 386 runs only on an 80386 or
- 80486 CPU, unlike its less advanced predecessor, NetWare 2.1x, which
- runs on an 80286 microprocessor as well as on a 386 processor.
-
- Speed Rating
-
- While initializing, the OS performs a system speed test. The speed rating
- produced by this test serves two purposes. First, the test informs the
- system administrator of the file server's current operating speed. (Some
- systems possess an AUTO CPU mode or have selectable CPU speeds that start
- in low speed. In low speed, some computers run as slow as 8 or even 6
- MHz.) Second, the test provides a way to rank file server performance
- with respect to CPU types, clock speeds, memory and cache.
-
- On completion of the test, a rating is displayed at the server console. A
- higher rating indicates a faster system. For example, an 80386SX CPU
- running at 16 MHz should get a rating of about 95, while an 80386 CPU
- running at 16 MHz should get a rating of about 120 (see Table I).
- Ratings over 600 can be obtained with a properly configured 80486 server.
-
- Table I: Microprocessor Chip Ratings
-
- What Makes it Tick
-
- When reduced to an elementary level, the file server speed test is a
- simple loop that runs for approximately 0.16 seconds. The main function
- of the test is to count the number of iterations that can be completed
- within that interval, which is less than 2/10ths of a second. A larger
- number of iterations indicates a faster machine. Because computer
- instructions are timed in nanoseconds (there are one billion nanoseconds
- to a second), any interference with the loop's operation can greatly
- alter the final results.
-
- Before the speed test can begin, the floppy drive is reset and then shut
- down. This is done because some computers automatically switch to slow
- speed when the floppy drive is accessed, which allows copy protection
- schemes to be properly read. Unlike DOS, NetWare 386 does not reset the
- speed through the use of a real mode timer. Therefore, the system would
- be permanently set in slow mode, which would have a profoundly negative
- effect on performance. Once the floppy is shut down,
- the speed test loop can begin. At the end of each iteration, the speed
- counter is incremented and the system time is checked.
-
- The process of checking the time and incrementing the speed counter
- requires several instructions and moves from CPU registers to memory.
- After three ticks (approximately 0.16 seconds) the loop is exited. The
- result is then divided by 1000, stored in memory (for later retrieval
- by typing "speed" at the server console) and displayed at the file server
- console. The result is divided by 1000 because the number in the counter
- is usually in the five to six digit range, which tends to be quite
- unwieldy. The use of these CPU instructions, registers and memory
- locations allows the routine to test various computers while maintaining
- a consistent testing component throughout. (See Fig. 1.)
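-
- The loop described above can be sketched in a few lines. This is only an
- illustration, not Novell's routine: the function name speed_rating, the
- injectable clock parameter, and the 1/18.2-second tick length (inferred
- from three ticks lasting about 0.16 seconds) are assumptions, and an
- interpreted Python loop will not reproduce the ratings in Table I.
-
```python
import time

# Assumed tick length: three ticks of about 0.16 s imply roughly 55 ms per tick.
TICK_SECONDS = 1 / 18.2
TICKS_TO_RUN = 3


def speed_rating(clock=time.monotonic, ticks=TICKS_TO_RUN, tick_seconds=TICK_SECONDS):
    """Count loop iterations completed within `ticks` timer ticks, scaled by 1/1000."""
    deadline = clock() + ticks * tick_seconds
    counter = 0
    # Each pass checks the time and increments the counter, as in the text.
    while clock() < deadline:
        counter += 1
    # Dividing by 1000 keeps the displayed number short.
    return counter // 1000


if __name__ == "__main__":
    print("Speed rating (illustrative only):", speed_rating())
```
-
- Passing a substitute clock function makes the loop repeatable, which is
- useful because anything that interferes with the loop changes the count.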
-
- Chip Type
-
- Through the use of various CPU instructions, the system's chip type,
- clock cycles, memory wait states and cache are exercised. The chip type
- is important because of its inherent capabilities for data movement and
- the speed at which it can process instructions. The issue of data
- movement involves comparison of the SX to the 386 and 486 chips. The SX
- is a 32-bit chip, but it performs its data movements to and from memory
- in 16-bit chunks, while the 80386 and 80486 chips move their data 32 bits
- at a time. Because the SX chip must talk to its memory 16 bits at a time,
- it works harder and takes longer to perform the same tasks than the
- 80386 and 80486 chips.
-
- Another main factor is the chip's ability to quickly process instructions,
- which involves the chip's architecture. As micro technology has improved,
- manufacturers have been able to fit more transistors, and consequently
- more functionality, onto microprocessor chips. This is evident in the
- chip evolution listed in Table II.
-
- Table II: Chip Evolution
-
- Fig. 1: Speed test flow chart
-
- The Intel 80486 combines four formerly separate components into one
- chip: the 80386 architecture, an 8K data cache, a cache controller, and an
- 80387DX-compatible math coprocessor. Intel also added Burst Mode
- high-speed data transfer and a five-stage pipeline feature, which
- processes up to five program instructions at once.
-
- The internal data cache of the 486 is far superior to the external cache
- used with the 386. When coupled with burst-mode data transfer, 128 bits of
- data are transferred into the internal 8K cache from main memory (or an
- external cache) with each CPU request. As a result, the 80486 uses fewer
- clock cycles to move data than does the 80386.
-
- The 486 can receive four 32-bit data blocks in only five to six clock
- cycles. The most efficient 80386 chip transfers one 32-bit block every
- two cycles. The 486 can handle multiple instructions at different stages
- of completion, which further adds to its ability to complete operations
- at its maximum rate of one transfer per clock cycle. These features allow
- the 486 to leap far beyond the processing capabilities of its ancestors,
- a feat that can be benchmarked through the performance speed rating in
- NetWare 386.
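-
- The cycle counts quoted above translate into a rough comparison of peak
- bus throughput. The sketch below is simple arithmetic on those figures.
- The function name and the 33 MHz clock used in the example are
- assumptions chosen for illustration, and real throughput also depends on
- wait states and cache behavior.
-
```python
def transfer_rate_mb_per_s(clock_mhz, bits_per_transfer, cycles_per_transfer):
    """Peak bus throughput in millions of bytes per second."""
    bytes_per_transfer = bits_per_transfer / 8
    transfers_per_second = clock_mhz * 1_000_000 / cycles_per_transfer
    return bytes_per_transfer * transfers_per_second / 1_000_000


CLOCK_MHZ = 33  # illustrative clock speed, not taken from the AppNote

# 80386: one 32-bit block every two clock cycles.
rate_386 = transfer_rate_mb_per_s(CLOCK_MHZ, 32, 2)
# 80486 burst: four 32-bit blocks (128 bits) in five to six clock cycles.
rate_486_best = transfer_rate_mb_per_s(CLOCK_MHZ, 128, 5)
rate_486_worst = transfer_rate_mb_per_s(CLOCK_MHZ, 128, 6)

print(f"80386: {rate_386:.1f} MB/s")
print(f"80486 burst: {rate_486_worst:.1f} to {rate_486_best:.1f} MB/s")
```
-
- At the same clock speed, moving 128 bits takes eight cycles on the 80386
- (four transfers of two cycles each) against five to six on the 80486.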
-
- Clock Speed
-
- The clock speed of most personal computers is determined by very precise
- vibrations of a thin slice of quartz crystal. This crystal may be in a
- metal package by itself on the CPU board, or it may be combined with
- other circuits into an oscillator module. In either case, the crystal and
- oscillator frequency is twice the speed at which the microprocessor
- operates. The chip cuts the clock speed in half internally before using
- it. In other words, an 80386 that operates at 16 MHz requires a system
- clock that operates at 32 MHz.
-
- Clock speed is measured in MHz: millions of cycles (or pulses) per
- second. At these rates, each clock cycle lasts only tens of nanoseconds
- (billionths of a second); a 16 MHz clock, for example, has a cycle time
- of 62.5 nanoseconds. The throughput of a computer (how much
- information it can actually process) is directly related to its clock
- speed. Hence, a higher clock speed, coupled with the superior
- architecture found in the 386 and 486 chips, increases the number of
- instructions that may be performed in a given amount of time.
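-
- The relationship between CPU speed, oscillator frequency and cycle time
- can be worked out directly. The following is a minimal sketch. The
- function names are hypothetical, and the list of clock speeds is only a
- sample of common values, not data from the speed test.
-
```python
def cycle_time_ns(cpu_mhz):
    """Length of one CPU clock cycle in nanoseconds."""
    return 1000 / cpu_mhz


def oscillator_mhz(cpu_mhz):
    """Oscillator frequency for a CPU that halves its input clock internally."""
    return 2 * cpu_mhz


for mhz in (16, 20, 25, 33):
    print(f"{mhz} MHz CPU: {oscillator_mhz(mhz)} MHz oscillator, "
          f"{cycle_time_ns(mhz):.1f} ns per cycle")
```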
-
- Memory
-
- One of the primary functions of a microprocessor is the movement of data
- to and from memory. The speed at which the system's memory chips operate
- can affect the time that these transfers require. By manipulating data
- between memory locations and registers (as the speed rating test does),
- the system's memory architecture becomes a factor in determining system
- speed.
-
- Ideal Memory System
-
- The ideal memory system is one in which the rate at which memory can
- supply information to the processor matches the rate at which the
- processor can execute code. If memory is slower than the processor, the
- system is said to be bus bound. If the processor is slower than memory,
- the system is processor bound. Making one (processor or memory) faster
- than the other will probably cost more, but will not improve performance.
-
- Clock Cycles
-
- If memory cannot respond fast enough to meet the processor's demand for
- data, the processor must wait one or more clock cycles. The processor is
- not necessarily inactive during these clock cycles, because it has
- separate bus and execution units. While the bus unit is retrieving
- information from memory, the execution unit can perform other operations,
- such as register manipulation.
-
- Each clock cycle that the processor has to wait is called a wait state.
- Memory fast enough to respond to the CPU in two clock cycles is said to
- operate at zero wait states. (A memory access normally requires a minimum
- of two clock cycles, one for the microprocessor to send out an
- instruction informing the memory system which bytes it wants to read, and
- a second cycle for it to provide the contents of the location.)
-
- Each additional wait state increases the memory access by one clock
- cycle. Therefore a three cycle memory access would be synonymous with one
- wait state (i.e., two cycles equal zero, three cycles equal one wait
- state, and so on). Some 80386-based computers may, at times, have to
- endure 2 or 3 wait states per memory cycle when accessing system board
- memory, and sometimes 16 or more wait states when they read from memory
- expansion boards. Wait states are necessary because most PC designs use
- Dynamic Random Access Memory (DRAM) chips, which are significantly slower
- and less expensive than static RAM (SRAM). For this reason, PC designers
- also implemented memory cache.
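-
- The wait state arithmetic above can be expressed as a short calculation.
- This is a sketch under a simplifying assumption: the whole access window
- (two cycles plus any wait states) is treated as time available to the
- memory chips, which ignores address setup and other bus overhead. The
- function names and sample values are hypothetical.
-
```python
import math


def memory_access_ns(cpu_mhz, wait_states):
    """Time for one memory access: two base cycles plus one per wait state."""
    cycle_ns = 1000 / cpu_mhz
    return (2 + wait_states) * cycle_ns


def wait_states_needed(cpu_mhz, memory_ns):
    """Fewest wait states that give the memory at least `memory_ns` to respond.

    Simplifying assumption: the entire access window is usable by the memory.
    """
    cycle_ns = 1000 / cpu_mhz
    cycles = math.ceil(memory_ns / cycle_ns)
    return max(0, cycles - 2)


for mhz in (16, 25, 33):
    for dram_ns in (80, 120):
        print(f"{mhz} MHz CPU, {dram_ns} ns memory: "
              f"{wait_states_needed(mhz, dram_ns)} wait state(s)")
```
-
- Under this model, a faster CPU with the same memory needs more wait
- states, because each clock cycle covers less of the memory's access time.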
-
- Memory Cache
-
- There are two forms of cache used to improve system performance: memory
- cache and disk cache. Memory cache increases RAM performance, while disk
- cache improves disk efficiency.
-
- The NetWare speed rating test exercises the memory cache of the file
- server. (While disk caching can play an important role in a LAN
- environment, it deviates from the main purpose of this AppNote, and will
- be discussed in detail in future AppNotes.)
-
- Memory cache decreases the time required to fetch the next instruction in
- the program code. A memory cache architecture combines fast static random
- access memory (SRAM) speed with cost effective DRAMs. It provides a small
- amount (usually 32K but this figure can be larger) of fast SRAM (the
- cache) that is logically located between the processor and main memory
- (which is usually simple DRAM).
-
- SRAM in the cache usually has an access time of 35 nsec to as low as 15
- nsec, compared to the 80 to 120 nsec access times of the DRAM used in
- main memory. This increased access speed allows swift CPUs to access data
- in the cache at zero wait states.
-
- Cache circuitry ensures that the portions of main memory that are most
- often used are copied into the cache, making the majority of the memory
- accesses to fast memory in the cache, and not to the slower main memory.
-
- Whenever the processor attempts to read a memory location in a system
- that uses a memory cache, the memory subsystem checks to see if the
- contents of that location are stored in cache. If so, the data is
- transferred from the cache at fast SRAM speed; this is referred to as a
- cache hit. If the data is not in cache, the processor must wait until the
- data can be transferred from slower main memory. This is called a cache
- miss and, depending on the speed of the main memory DRAM and the speed of
- the CPU, may inject many wait states into the system. During the wait
- states, the contents of the location are also copied into cache, where
- they can be accessed more quickly the next time they are needed.
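-
- The hit and miss behavior described above can be modeled in software.
- The class below is a deliberately simplified sketch: the name
- MemoryCache is hypothetical, the cache is a plain dictionary with no
- size limit or replacement policy, and real cache hardware works on
- fixed-size lines rather than single locations.
-
```python
class MemoryCache:
    """Toy model of a read cache sitting between the CPU and main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # slow backing store (address -> value)
        self.lines = {}                 # fast cache contents (address -> value)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Cache hit: the location is already held in fast memory.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Cache miss: fetch from main memory and keep a copy for next time.
        self.misses += 1
        value = self.main_memory[address]
        self.lines[address] = value
        return value


main_memory = {address: address * 2 for address in range(8)}
cache = MemoryCache(main_memory)
for address in (0, 1, 0, 1, 0):
    cache.read(address)
print(f"hits={cache.hits} misses={cache.misses}")
```
-
- Only the first read of each location goes to main memory. Every repeat
- read of the same location is served from the cache.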
-
- A cache is made effective by the tendencies of most computer programs to
- access the same few memory locations over and over, and to access
- neighboring locations of those accessed recently. Once those few
- locations have been loaded into cache, most accesses are made from the
- cache, not from slower main memory. This increases system performance.
- NetWare 386, like other well-developed programs, is designed with these
- features, and benefits from this technology.
-
- While more cache RAM always helps, it can reach a point of diminishing
- returns. Tests prove that a 32K cache will achieve a 96 percent hit rate,
- and that doubling or tripling the cache size only improves the hit rate
- by one or two percentage points. In most cases the increased cost for the
- extra cache memory is not worth such a marginal performance increase.
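-
- A simple weighted average shows why a few extra points of hit rate buy
- little. This sketch assumes illustrative access times of 25 nsec for
- cache and 100 nsec for main memory, chosen from within the ranges given
- earlier. The function name is hypothetical, and the model ignores any
- extra penalty a miss may add beyond the main memory access itself.
-
```python
def average_access_ns(hit_rate, cache_ns, main_ns):
    """Average access time as a hit-rate-weighted mix of cache and main memory."""
    return hit_rate * cache_ns + (1 - hit_rate) * main_ns


CACHE_NS = 25   # illustrative SRAM access time
MAIN_NS = 100   # illustrative DRAM access time

for hit_rate in (0.96, 0.97, 0.98):
    avg = average_access_ns(hit_rate, CACHE_NS, MAIN_NS)
    print(f"{hit_rate:.0%} hit rate: {avg:.2f} ns average access")
```
-
- With these assumed timings, raising the hit rate from 96 to 98 percent
- lowers the average access time from about 28 nsec to about 26.5 nsec.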
-
- Conclusion
-
- There are many interrelated factors that go into making a high-speed file
- server. The CPU chip type, the clock speed, and the main memory and memory
- cache architecture are the most vital of these factors. The NetWare 386
- Processor Speed test exercises all these components to formulate its
- results. The information gleaned from such a test can be of use for
- selecting the appropriate system and for evaluating a system's daily
- operation. But this information is only one element in the overall
- performance or throughput of a file server. There are other major
- components that must be examined with the same scrutiny. For example, the
- disk and communications channels should each be tested because, in either
- case, an improper selection can cause even the fastest speed-rated
- computers to function with less than optimum performance.
-