Hardware Performance



Although computers may have basic similarities (they all look the same on a shelf), performance will differ markedly between them, just the same as it does with cars. The PC contains several processes running at the same time, often at different speeds, so a fair amount of coordination is required to ensure that they don't work against each other.
Most performance problems arise from bottlenecks between components that are not necessarily the best for a particular job, but are the result of a compromise between price and performance. Usually, price wins out and you have to work around the problems this creates.
The trick to getting the most out of any machine is to make sure that each component is giving of its best, then eliminate potential bottlenecks between them. You can get a bottleneck simply by having an old piece of equipment that is not designed to work at modern high speeds; a computer is only as fast as its slowest component. Bottlenecks can also be due to badly written software.

System Timing

The clock is responsible for the speed at which numbers are crunched and instructions executed. It results in an electrical signal that switches constantly between high and low voltage several million times a second.
The System Clock, or CLKIN, is the frequency used by the processor; on 286s and 386s, this will be half the speed of the main crystal on the motherboard (the CPU divides it by two), which is often called CLK2IN. 486 processors run at the same speed as the crystal, because they use both edges of the timing signal. A clock generator chip (82284 or similar) is used to synchronize timing signals around the computer, and the data bus is run at a slower speed synchronously with the CPU, e.g. CLKIN/4 for an ISA bus with a 33 MHz CPU.
ATCLK is a separate clock for the bus, used when it is run asynchronously, that is, not derived from CLK2IN. There is also a 14.318 MHz crystal, which was used for all system timing on XTs. Now it's only used for the colour frequency of the video controller (6845).
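As a rough illustration of the clock relationships described above, the following Python sketch derives CLKIN and a synchronous ISA bus clock from the crystal frequency. The function names and the 66 MHz crystal value are assumptions made for the example; only the divide-by-two and CLKIN/4 ratios come from the text.

```python
def clkin_from_crystal(clk2in_mhz):
    """CPU clock on a 286 or 386: the CPU divides the crystal (CLK2IN) by two."""
    return clk2in_mhz / 2


def sync_bus_clock(clkin_mhz, divisor=4):
    """Bus clock when it is derived synchronously from the CPU clock (e.g. CLKIN/4)."""
    return clkin_mhz / divisor


clk2in = 66.0  # assumed crystal frequency for a 33 MHz 386
clkin = clkin_from_crystal(clk2in)
isa = sync_bus_clock(clkin)
print(f"CLK2IN {clk2in} MHz -> CLKIN {clkin} MHz -> ISA bus {isa} MHz")
```

With these figures the ISA bus ends up at 8.25 MHz, which is close to the 8 MHz usually quoted for it.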

The Motherboard

This is a large circuit board to which are fixed the Central Processor (CPU), the data bus, memory and various support chips, such as those that control speed and timing, the keyboard, etc. The CPU does all the thinking, and is told what to do by instructions contained in memory, so there will be a direct two-way connection between them. The data bus is actually a part of the CPU, although it's treated separately.
Extra circuitry in the form of expansion cards is placed into expansion slots on the data bus, so the basic setup of the computer can be changed easily (for example, you can connect more disk drives or a modem there).
A math co-processor, which is specially built to cope with non-integer (floating point) arithmetic, is frequently fitted alongside the main processor. Without one, the main processor has to convert decimals and fractions to whole numbers before calculating with them, and then has to convert them back again.

The Central Processor

The chip that was the brains of the original IBM PC was called the 8088, manufactured by Intel. At the time Intel developed the 8088, microprocessors were classified by their external data bus, so Intel officially classified the 8088 as an 8-bit HMOS microprocessor. Although it was 16-bit internally, it spoke to the data bus and memory with 8 bits in order to keep costs down and stay in line with the capabilities of the support chips. As a result, when it wanted to send two characters to the screen over the data bus, it had to send them one at a time, rather than both together, so there was an idle state where nothing was done every time data was sent.
In addition, it could only talk to 1 MB of memory at any time, because there were only 20 physical address connections between the CPU and memory. In a binary system this represents 2^20, or 1,048,576, addresses.
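The arithmetic behind that limit can be checked with a short Python sketch. The function name is an illustrative assumption; the line counts are the ones this chapter gives for each processor (20 for the 8088, 24 for the 80286, 32 for the 80386DX).

```python
def addressable_bytes(address_lines):
    """Each address line doubles the number of locations that can be selected."""
    return 2 ** address_lines


for cpu, lines in [("8088", 20), ("80286", 24), ("80386DX", 32)]:
    size = addressable_bytes(lines)
    print(f"{cpu}: {lines} address lines -> {size:,} bytes ({size // 2**20} MB)")
```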

The 80286

The 80286 was introduced in response to competition from manufacturers who were cloning the IBM PC. The connections between the various parts of the motherboard became 16-bit throughout, thus increasing efficiency 4 times. It also has 24 memory address lines, so it could talk to 16 MB of physical memory. However, DOS could not use the extra memory, since it had to be addressed in protected mode, and DOS can only run in real mode, which is restricted to the 1 MB that can be seen by the 8088. Therefore, a Pentium running DOS is just a fast XT.
Just like the 8088, the 80286 is limited to 1 MB (+ 64 KB) when running in real mode, and has to run in protected mode to access extended memory. On an 80286 system, the DOS extended memory manager (himem.sys) uses BIOS service INT 15h/AH=87h to move data to and from extended memory; INT 15h enters and leaves protected mode to do so.
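The "1 MB (+ 64 KB)" figure follows from the way real mode forms addresses, which is not spelled out above: a 16-bit segment is multiplied by 16 and added to a 16-bit offset. The Python sketch below works through that calculation; the function names are illustrative assumptions, and the 20-bit wrap-around models a CPU with only 20 address lines, such as the 8088.

```python
def real_mode_address(segment, offset):
    """Linear address in real mode: segment * 16 + offset (both 16-bit values)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


def wrap_to_20_bits(address):
    """With only 20 address lines, the 21st bit is lost and the address wraps."""
    return address & 0xFFFFF


top = real_mode_address(0xFFFF, 0xFFFF)
print(hex(top))                   # highest address real mode can form
print(top - 0xFFFFF)              # bytes reachable above the 1 MB boundary
print(hex(wrap_to_20_bits(top)))  # where the same address lands on an 8088
```

The highest address is 10FFEFh, so 65,520 bytes (64 KB less 16 bytes) lie above the 1 MB line, and they are only reachable on a processor with more than 20 address lines.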

The 80386

Compaq was the first company to use the 80386 (the DX version, as opposed to the SX; see below), which uses 32 bits between itself and memory, but 16 towards the data bus, which has not really been developed in tandem with the rest of the machine. This is partly to ensure backwards compatibility and partly due to the plumbing arrangements of running a fast CPU with faster memory and a slow bus (8 MHz).
The 386 can run multiple copies of real mode (that is, it can create several virtual 8088s). It uses paging to remap memory so that these machines are brought to the attention of the CPU when the programs in them require it; this is done on a timeslice basis, around 60 times a second, which is how we get multitasking in Windows or OS/2 (in Windows 95 the slice is every 20 ms).
The 386 can also switch out of protected mode on the fly, or at least in a more elegant way than the 286. In order to get at the hard disk and other parts of the computer, protected-mode software has to get DOS to perform real mode services, so the CPU has to switch in and out of protected mode continually. The goal is therefore to use real mode as little as possible and to run in protected mode. Windows does this by using 32-bit instructions.
Because of 32-bit addressing, the 80386 and later CPUs are not limited to 1 MB when running in real mode. Because the address bus is also 32-bit, any address can be reached using 0000:<32-bit offset>. Also, DOS segments larger than 64 KB can be handled as a whole instead of using chunks of 64 KB or normalized (huge) pointers. On 80386+ systems, the DOS extended memory manager simply uses a 32-bit block move instruction to move data to and from extended memory.
The 386 uses pipelining to help streamline memory accesses; the idea is that they are done independently of each other (at the same time) while other units get on with their jobs, a form of primitive parallel processing. The 386 also has a pre-fetch unit for instructions, which tries to speed things up by guessing which ones the processor will use next.
Although the 386 is 32-bit and has certain benefits, like the ability to manipulate memory and switch in and out of protected mode more readily, replacing a 286 with a 386 does not automatically give you performance benefits if you are running 16-bit 80286 code (most DOS programs).

The 80386SX

The 80386SX is a 32-bit chip internally, but 16-bit externally to both memory and the data bus, so you get bottlenecking. It is a cut-down version of the 80386DX, created both to cut costs and to give the impression the 286 was obsolete (true), because at the time other manufacturers could make the 286 under license. Although it can run 386-specific software, it looks like a 286 to the machine it is in, so existing 286 motherboards could be used to plug in 386SX CPUs. At the same clock speed, a 386SX is around 25% slower than a 386DX.

The 80486

To non-technical people, the 80486 is a fast 80386DX with an on-board math co-processor and 8K of cache memory. It is not really a newer technology as such (only second generation), but better use is made of its facilities. For example, it takes fewer instruction cycles to do the same job, and it is optimized to keep as many operations inside the chip as possible.
The 386 pre-fetch unit was replaced by 8K of SRAM cache, and pipelining was replaced by burst mode, which works on the theory that most of the time spent getting data concerns its address. Burst mode allows a device to send large amounts of data in a short time without interruption. Pipelining on the 386 requires 2 clocks per transfer; only one is needed with 486 burst mode. Memory parity checks also take their own path at the same time as the data they relate to.
The 486 has an on-board clock, and both edges of the square wave signal are used to calculate the clock signal, so the motherboard runs at the same speed as the CPU. In addition, the bus system uses a single pulse cycle. Generally speaking, at the same clock speed, a 486 will deliver two to three times the performance of a 386.
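To put rough numbers on the burst mode comparison above, the Python sketch below counts the clocks needed to move a block of data with each method. It rests on assumptions that are not in the text: the block is a 16-byte cache line moved 4 bytes at a time, and the first transfer of a burst still costs 2 clocks (to set up the address), with 1 clock for each one after that. The function names are illustrative.

```python
def pipelined_clocks(transfers, clocks_per_transfer=2):
    """386-style pipelining: every transfer costs the same number of clocks."""
    return transfers * clocks_per_transfer


def burst_clocks(transfers, first=2, subsequent=1):
    """486-style burst: the address is paid for once (assumed 2 clocks),
    then each further transfer takes a single clock."""
    if transfers <= 0:
        return 0
    return first + (transfers - 1) * subsequent


line_transfers = 16 // 4  # assumed 16-byte cache line over a 32-bit bus
print("pipelined:", pipelined_clocks(line_transfers), "clocks")
print("burst:    ", burst_clocks(line_transfers), "clocks")
```

Under these assumptions the same line takes 8 clocks with pipelining and 5 with burst mode; the gap widens as the block gets longer.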

The 80486SX

The 486SX is as above, but with the math co-processor facility disabled, so you should find no significant difference between it and a 386; a 386/40 is broadly equivalent to a 486/25.

Clock Doubling

The DX2 chip runs at double the speed of the original, but it is not the same as having a proper high speed motherboard, because the bus will still be running at the normal speed. Unfortunately, high speed motherboards are more expensive because of the need to design out RF emissions and the like.
Actual performance depends on how many accesses are satisfied from the chip's cache, which is how the CPU is kept busy, rather than waiting for the rest of the machine. If the CPU has to go outside the cache, effective speed is the same as the motherboard or, more properly, the relevant bus (memory or data), so best performance is obtained when all the CPU's needs are satisfied from inside itself. However, performance is still good if it has to use cache, as the hit rate is around 90%. The DX4 has a larger cache (16K) to cope with the higher speed.
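The effect of the hit rate can be sketched with a deliberately crude model in Python: a fraction of accesses complete at the chip's internal speed and the rest at motherboard speed, and the two are averaged by time. The model, the function name and the 66/33 MHz figures are assumptions for illustration; it ignores wait states and everything else a real system does.

```python
def effective_mhz(core_mhz, board_mhz, hit_rate):
    """Time-weighted average speed when hit_rate of the accesses run at core
    speed and the remainder run at motherboard speed."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    avg_time = hit_rate / core_mhz + (1.0 - hit_rate) / board_mhz
    return 1.0 / avg_time


for rate in (1.0, 0.9, 0.5, 0.0):
    print(f"hit rate {rate:.0%}: about {effective_mhz(66, 33, rate):.1f} MHz")
```

In this model a clock-doubled 66 MHz chip on a 33 MHz board with a 90% hit rate behaves like a chip running at about 60 MHz, and it falls back to motherboard speed as the hit rate drops towards zero.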

The Pentium

The Pentium is essentially two 486s in parallel (or rather an SX and a DX), so more instructions are processed at the same time; typically two at once. This, however, depends on whether software can take advantage of it and get the timing of the binary code just right. It has separate 8K caches for instructions and data, split into banks which can be accessed alternately. It has a 64-bit bus, to cope with two 32-bit chips.

The Pentium Pro

This is a RISC chip with a 486 hardware emulator on it. Several techniques are used by this chip to produce more performance than its predecessors; speed is achieved by dividing processing into more stages, and more work is done within each clock cycle; three instructions can be decoded in each one, as opposed to two for the Pentium.
In addition, instruction decoding and execution are decoupled, which means that instructions can still be executed if one pipeline stops (such as when one instruction is waiting for data from memory; the Pentium would stop all processing at this point). Instructions are sometimes executed out of order, that is, not necessarily as written down in the program, but rather when information is available, although they won't be much out of sequence; just enough to make things run smoother.
It has an 8K cache for programs and data, but it will be a two-chip set, with the processor and a 256K L2 cache in the same package. It is optimized for 32-bit code, so it will run 16-bit code no faster than a Pentium.

Summing up

In principle, the faster the CPU the better, but only if your applications do a lot of logical operations and calculation (where the work is centered around the chip) rather than writing to disk. For example, with a typical word processing task, replacing a 16 MHz 386 with a 33 MHz one (doubling the speed) will only get you something like a 5-10% increase in practical performance, regardless of what the benchmarks might say. It is often a better idea to spend money on a faster hard disk.
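One way to see why doubling the CPU speed gives so little is the standard Amdahl's law calculation, which is brought in here as an illustration and is not part of the original argument. The Python sketch below assumes that only a fraction of the total task time is spent in the CPU and that only this part gets faster; the function name and the sample fractions are hypothetical.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: only the CPU-bound share of the time benefits."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


for fraction in (0.10, 0.18, 0.50, 1.00):
    gain = (overall_speedup(fraction, 2.0) - 1.0) * 100
    print(f"CPU-bound share {fraction:.0%}: about {gain:.0f}% faster overall")
```

On this model, a 5-10% gain from a doubled clock corresponds to a task that spends only around 10-18% of its time in the processor, with the rest waiting on the disk and other components.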
Also, with only 8 MB RAM in your computer, you won't see much performance increase from a DX2/66 until you get a Pentium 90 (none at all between a DX4/100 and a Pentium 75). With Windows, this is because the hard disk is used a lot for virtual memory (swap files), which means more activity over the data bus. Since motherboards below the 90 run at 33 MHz (only the chips run faster), the bottleneck is the disk I/O, running at much the same speed on them all. This is especially true if you use Programmed I/O (PIO), where the CPU must scrutinize every bit to and from the hard drive (although Multi-sector I/O or EIDE will improve things). As the Pentium 90's motherboard runs faster (60 MHz), the I/O can proceed at a much faster pace, and performance will more than double (a more sophisticated chipset helps).
With 16 MB, on the other hand, performance will be almost double anyway, regardless of the processor, because the need to go to the hard disk is so much reduced, and the processor can make a contribution to performance. The biggest jump is from a DX2/66 to a DX4, with the curve flattening out progressively up to the Pentium 90.
Processor   Motherboard Speed (MHz)   Bus Speed (MHz)
P60         60                        30
P66         66                        33
P75         50                        25
P90         60                        30
P100        66                        33
P120        60                        30
P133        66                        33
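The table implies two ratios that can be worked out with a short Python sketch: the multiplier between the chip and the motherboard, and the fact that the bus runs at half the motherboard speed. Two assumptions are made here that the table does not state: the chip speed is taken from the model number (P90 = 90 MHz), and multipliers are rounded to the nearest half step because 66 MHz is a rounded figure. The names are illustrative.

```python
# name: (chip MHz taken from the model number, motherboard MHz, bus MHz)
PENTIUMS = {
    "P60": (60, 60, 30),
    "P66": (66, 66, 33),
    "P75": (75, 50, 25),
    "P90": (90, 60, 30),
    "P100": (100, 66, 33),
    "P120": (120, 60, 30),
    "P133": (133, 66, 33),
}


def clock_multiplier(cpu_mhz, board_mhz):
    """Ratio of chip speed to motherboard speed, rounded to the nearest 0.5."""
    return round(cpu_mhz / board_mhz * 2) / 2


for name, (cpu, board, bus) in PENTIUMS.items():
    ratio = clock_multiplier(cpu, board)
    print(f"{name}: {board} MHz x {ratio} = {cpu} MHz, bus at {bus} MHz")
```

Every entry in the table has a bus speed of exactly half the motherboard speed, and the chips fall into three groups: x1 (P60, P66), x1.5 (P75, P90, P100) and x2 (P120, P133).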