Hardware Performance
- Although computers may have basic similarities (they all
look the same on a shelf), performance will differ
markedly between them, just the same as it does with
cars. The PC contains several processes running at the
same time, often at different speeds, so a fair amount of
coordination is required to ensure that they don't work
against each other.
- Most performance problems arise from bottlenecks
between components that are not necessarily the best for
a particular job, but are the result of a compromise
between price and performance. Usually, price wins out and
you have to work around the problems this creates.
- The trick to getting the most out of any machine is to
make sure that each component is giving of its best, then
eliminate potential bottlenecks between them. You can get
a bottleneck simply by having an old piece of equipment
that is not designed to work at modern high speeds (a
computer is only as fast as its slowest component), but
bottlenecks can also be due to badly written software.
- The clock is responsible for the speed at which numbers
are crunched and instructions executed. It results in an
electrical signal that switches constantly between high
and low voltage several million times a second.
- The System Clock, or CLKIN, is the frequency used by the
processor; on 286s and 386s, this will be half the speed of
the main crystal on the motherboard, which is often called
CLK2IN (the CPU divides it by two). 486 processors run at
the same speed as the crystal, because they use both edges
of the timing signal. A clock
generator chip (82284 or similar) is used to synchronize
timing signals around the computer, and the data bus
would be run at slower speed synchronously with the CPU,
e.g. CLKIN/4 for an ISA bus with a 33 MHz CPU.
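As a rough illustration of the divisions described above, the sketch below derives the processor clock and a synchronously run bus clock from the crystal frequency. The function names and the 66 MHz crystal are assumptions chosen to match the 33 MHz example; they do not come from any real chipset.

```python
def cpu_clock_mhz(crystal_mhz, uses_both_edges=False):
    """Return CLKIN for a given crystal (CLK2IN) frequency.

    A 386 divides the crystal by two; a 486 uses both edges of the
    timing signal and so runs at the crystal speed.
    """
    return crystal_mhz if uses_both_edges else crystal_mhz / 2


def sync_bus_clock_mhz(clkin_mhz, divider=4):
    """Return the speed of a bus run synchronously at CLKIN/divider."""
    return clkin_mhz / divider


clkin = cpu_clock_mhz(66.0)          # 386 with an assumed 66 MHz crystal
print(clkin)                         # 33.0
print(sync_bus_clock_mhz(clkin))     # 8.25
```

With these assumed figures, CLKIN/4 puts the ISA bus at a little over 8 MHz.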
- ATCLK is a separate clock for the bus, when it's run
asynchronously, or not derived from CLK2IN. There is also
a 14.318 MHz crystal which was used for all system timing
on XTs. Now it's only used for the colour frequency of
the video controller (6845).
- The motherboard is a large circuit board to which are fixed
the Central Processor (CPU), the data bus, memory and
various support chips, such as those that control speed
and timing, the keyboard, etc. The CPU does all the
thinking, and is told what to do by instructions
contained in memory, so there will be a direct two-way
connection between them. The data bus is actually a part
of the CPU, although it's treated separately.
- Extra circuitry in the form of expansion cards
is placed into expansion slots on the
data bus, so the basic setup of the computer can be
changed easily (for example, you can connect more disk
drives or a modem there).
- A math co-processor, specially built to cope with
non-integer arithmetic (e.g. decimal points), is
frequently fitted alongside the main processor. Without
one, the main processor has to convert decimals and
fractions to whole numbers before calculating with them,
and then has to convert them back again.
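One simple way to picture this conversion is fixed-point arithmetic, sketched below: decimals are scaled up to whole numbers, the integer unit does the work, and the result is scaled back. This is only an illustration; the SCALE factor and function names are assumptions, and real software floating-point routines are considerably more involved.

```python
SCALE = 100  # assumed: keep two decimal places


def to_fixed(text):
    """Convert a non-negative decimal string such as '1.25' to a scaled integer."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]          # pad or truncate to two places
    return int(whole) * SCALE + int(frac)


def fixed_mul(a, b):
    """Multiply two scaled integers using integer operations only."""
    return (a * b) // SCALE


def to_text(value):
    """Convert a non-negative scaled integer back to a decimal string."""
    return f"{value // SCALE}.{value % SCALE:02d}"


product = fixed_mul(to_fixed("1.25"), to_fixed("2.50"))
print(to_text(product))               # 3.12 (3.125 truncated)
```

The truncated last digit shows the price of the round trip: precision is limited by the chosen scale.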
- The chip that was the brains of the original IBM PC was
called the 8088, manufactured by Intel. In the days when
Intel developed the 8088, microprocessors were classified
by their external data bus, so Intel officially classified
the 8088 as an 8-bit HMOS microprocessor. Although it was
16-bit internally, it spoke to the data bus and memory with
8 bits in order to keep the costs down and keep in line
with the capabilities of the support chips. Thus, when it
wanted to
send two characters to the screen over the data bus, it
had to send them one at a time, rather than both
together, so there was an idle state where nothing was
done every time the data was sent.
- In addition, it could only talk to 1 MB of memory at any
time, as there were only 20 physical address connections
between the CPU and memory. In a binary system this
represents 2^20, or 1,048,576, addresses.
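The arithmetic behind the 1 MB limit can be checked with a short sketch. The function name is an assumption for illustration; the 24-line case anticipates the 80286 discussed next.

```python
def addressable_bytes(address_lines):
    """Each extra address line doubles the number of addresses."""
    return 2 ** address_lines


print(addressable_bytes(20))                    # 1048576
print(addressable_bytes(20) // 1024)            # 1024 (KB, i.e. 1 MB)
print(addressable_bytes(24) // (1024 * 1024))   # 16 (MB)
```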
- The 80286 was introduced in response to competition from
manufacturers who were cloning the IBM PC. The
connections between the various parts of the motherboard
became 16-bit throughout, thus increasing efficiency 4
times. It also has 24 memory address lines, so it could
talk to 16 MB of physical memory. However, DOS could not
use it, since it had to be addressed in protected mode.
DOS can only run in real mode, which is restricted to the
1 MB that can be seen by the 8088. Therefore, a Pentium
running DOS is just a fast XT. Just like the 8088, the
80286 CPU is limited to 1 MB (+ 64 KB) when running in
real mode, and has to run in protected mode to access
extended memory. On an 80286 system, the DOS extended
memory manager (himem.sys) uses BIOS service INT
15h/AH=87h to move data to and from extended memory; this
service enters and leaves protected mode as it does so.
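The "1 MB (+ 64 KB)" figure follows from the way real mode forms addresses: the segment is multiplied by 16 and added to the offset. The sketch below is an illustration only; the function name and the address_lines parameter are assumptions, and the 24-line case assumes the 21st address line (A20) is enabled.

```python
def real_mode_address(segment, offset, address_lines=20):
    """Physical address for segment:offset in real mode.

    The segment is shifted left four bits (multiplied by 16) and added
    to the offset. With only 20 address lines the result wraps round
    at 1 MB; with more lines (80286 and later) it does not.
    """
    linear = (segment << 4) + offset
    return linear & ((1 << address_lines) - 1)


top = real_mode_address(0xFFFF, 0xFFFF, address_lines=24)
print(hex(top))                                # 0x10ffef
print(top - 0x100000 + 1)                      # 65520 bytes above 1 MB
print(hex(real_mode_address(0xFFFF, 0xFFFF)))  # 0xffef on an 8088
```

The highest real-mode address on a 286 is therefore just under 64 KB above the 1 MB line, while on an 8088 the same segment:offset pair wraps round to low memory.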
- Compaq was the first company to use the 80386 (the DX
version, as opposed to the SX; see below), which uses 32
bits between itself and memory, but 16 towards the data
bus, which has not really been developed in tandem with
the rest of the machine. This is partly to ensure
backwards compatibility and partly due to the plumbing
arrangements of running a fast CPU with faster memory and
a slow bus (8 MHz).
- The 386 can run multiple copies of real mode (that is, it
can create several virtual 8088s). It uses paging to
remap memory so that these machines are brought to the
attention of the CPU when the programs in them require
it; this is done on a timeslice basis, around 60 times a
second, which is how we get multitasking in Windows or
OS/2 (in Windows 95 the slice is every 20 ms).
- The 386 can also switch out of protected mode on the fly,
or at least in a more elegant way than the 286. In order
to get to the hard disk and other parts of the computer,
protected-mode software has to get DOS to perform real
mode services, so the CPU has to switch in and out of
protected mode continually. The goal is therefore to use
real mode as little as possible and to run in protected
mode. Windows does this by using 32-bit instructions.
Because of 32-bit addressing, the 80386 and later CPUs
are not limited to 1 MB when running in real mode.
Because the address bus is also 32-bit, any address can
be reached using 0000:<32-bit offset>. Also, DOS
segments larger than 64 KB can be handled as a whole
instead of in chunks of 64 KB or through normalized
(huge) pointers. On 80386+ systems, the DOS extended
memory manager simply uses a 32-bit block move instruction
to move data to and from extended memory.
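For readers unfamiliar with the normalized (huge) pointers mentioned above, here is a minimal sketch of the idea. Many 16-bit segment:offset pairs refer to the same physical address; normalizing picks the pair whose offset is below 16, so that pointers can be compared and stepped across 64 KB boundaries. The function name is an assumption, and the sketch only covers addresses below 1 MB.

```python
def normalize(segment, offset):
    """Return an equivalent segment:offset pair with offset < 16."""
    linear = (segment << 4) + offset
    if linear > 0xFFFFF:
        raise ValueError("address lies above 1 MB")
    return linear >> 4, linear & 0xF


seg, off = normalize(0x1234, 0xFFFF)
print(hex(seg), hex(off))              # 0x2233 0xf
```

With a flat 32-bit offset none of this bookkeeping is needed, which is the advantage the text describes.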
- The 386 uses pipelining to help streamline memory
accesses; the idea is that they are done independently of
each other (at the same time) while other units get on
with their jobs, a form of primitive parallel processing.
The 386 also has a pre-fetch unit for instructions, that
tries to speed things up by guessing which ones the
processor will use next.
- Although the 386 is 32-bit and has certain benefits, like
the ability to manipulate memory and switch in and out of
protected mode more readily, replacing a 286 with a 386
does not automatically give you performance benefits if
you are running 16-bit 80286 code (most DOS programs).
- The 80386SX is a 32-bit chip internally, but 16-bit
externally to both memory and the data bus, so you get
bottlenecking. It is a cut-down version of the 80386DX,
created to both cut costs and give the impression the 286
was obsolete (true), because at the time other
manufacturers could make the 286 under license. Although
it can run 386-specific software, it looks like a 286 to
the machine it is in, so existing 286 motherboards could
be used to plug in 386SX CPUs. At the same clock speed, a
386SX is around 25% slower than a 386DX.
- To non-technical people, the 80486 is a fast 80386DX with
an on-board math co-processor and 8K of cache memory. It
is not really a newer technology as such (only second
generation), but better use is made of its facilities.
For example, it takes fewer instruction cycles to do the
same job, and is optimized to keep as many operations
inside the chip as possible. The 386 pre-fetch unit was
replaced by 8K of SRAM cache, and pipelining was replaced
by burst mode, which works on the theory that most of the
time spent getting data concerns its address. Burst
allows a device to send large amounts of data in a short
time without interruption. Pipelining on the 386 requires
2 clocks per transfer; only one is needed with 486 Burst
Mode. Memory parity checks also take their own path at
the same time as the data they relate to. The 486 has an
on-board clock, and both edges of the square wave signal
are used to calculate the clock signal, so the
motherboard runs at the same speed as the CPU. In
addition, the bus system uses a single pulse cycle.
Generally speaking, at the same clock speed, a 486 will
deliver 2 to 3 times the performance of a 386.
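Taking the figures above at face value (2 clocks per transfer with 386 pipelining, 1 with 486 burst mode), the difference for a run of transfers can be estimated as below. This is a deliberately simplified model: it ignores any extra clocks needed to start a burst, and the transfer count and 25 MHz clock are assumptions.

```python
def transfer_time_ns(transfers, clocks_per_transfer, clock_mhz):
    """Time in nanoseconds to move a number of bus transfers."""
    clock_period_ns = 1000.0 / clock_mhz
    return transfers * clocks_per_transfer * clock_period_ns


pipelined = transfer_time_ns(4, 2, 25)   # 386-style, 2 clocks each
burst = transfer_time_ns(4, 1, 25)       # 486-style, 1 clock each
print(pipelined, burst)                  # 320.0 160.0
```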
- The 486SX is as above, but with the math co-processor
facility disabled, so you should find no
significant difference between it and a 386; a 386/40 is
broadly equivalent to a 486/25.
- The DX2 chip runs at double the speed of the original,
but it is not the same as having a proper high speed
motherboard because the bus will still be running at the
normal speed. Unfortunately, high speed motherboards are
more expensive because of having to design out RF
emissions, and the like.
- Actual performance depends on how many accesses are
satisfied from the chip's cache, which is how the CPU is
kept busy, rather than waiting for the rest of the
machine. If the CPU has to go outside the cache,
effective speed is the same as the motherboard or, more
properly, the relevant bus (memory or data), so best
performance is obtained when all the CPU's needs are
satisfied from inside itself. However, performance is
still good if it has to use cache, as the hit rate is
around 90%. The DX4 has a larger cache (16K) to cope with
the higher speed.
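The effect of the hit rate can be pictured as a weighted average of fast and slow accesses. The sketch below uses the 90% figure from the text; the costs of 1 clock for a cache hit and 4 clocks for a trip outside the chip are purely assumed values for illustration.

```python
def average_access_clocks(hit_rate, hit_clocks, miss_clocks):
    """Weighted average cost of a memory access, in clocks."""
    return hit_rate * hit_clocks + (1.0 - hit_rate) * miss_clocks


avg = average_access_clocks(0.90, 1, 4)
print(round(avg, 2))                   # 1.3
```

Even a small miss rate drags the average noticeably above the cost of a hit, which is why a faster core benefits from a larger cache.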
- The Pentium is essentially two 486s in parallel (or
rather an SX and a DX), so more instructions are processed
at the same time; typically two at once. This, however,
depends on whether software can take advantage of it, and
get the timing of the binary code just right. It has
separate 8K caches for instructions and data, split into
banks which can be accessed alternately. It has a 64-bit
bus, to cope with two 32-bit chips.
- The Pentium Pro is a RISC chip with a 486 hardware emulator on it.
Several techniques are used by this chip to produce more
performance than its predecessors; speed is achieved by
dividing processing into more stages, and more work is
done within each clock cycle; three instructions can be
decoded in each one, as opposed to two for the Pentium.
- In addition, instruction decoding and execution are
decoupled, which means that instructions can still be
executed if one pipeline stops (such as when one
instruction is waiting for data from memory; the Pentium
would stop all processing at this point). Instructions
are sometimes executed out of order, that is, not
necessarily as written down in the program, but rather
when information is available, although they won't be
much out of sequence; just enough to make things run
smoother.
- It has an 8K cache for programs and data, but it will be a
two-chip set, with the processor and a 256K L2 cache in
the same package. It is optimized for 32-bit code, so
will run 16-bit code no faster than a Pentium.
Summing up
- In principle, the faster the CPU the better, but only if
your applications do a lot of logical operations and
calculation (where the work is centered around the chip)
rather than writing to disk. For example, for a typical
word processing task, replacing a 16 MHz 386 with a 33
MHz one (doubling the speed) will only get you something
like a 5-10% increase in practical performance,
regardless of what the benchmarks might say. It
is often a better idea to spend money on a faster hard
disk.
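The small gain quoted above is what Amdahl's law (not named in the text, but a standard way to reason about this) predicts when only part of a task depends on the CPU. In the sketch below the CPU-bound fractions are assumptions chosen for illustration, not measurements.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: only the CPU-bound fraction gets faster."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


for fraction in (0.10, 0.18, 0.90):
    gain = (overall_speedup(fraction, 2.0) - 1.0) * 100
    print(f"{fraction:.0%} CPU-bound: {gain:.0f}% faster")
```

On this model, doubling the CPU speed gives a task that is 10 to 18% CPU-bound roughly a 5 to 10% improvement, while a task that is 90% CPU-bound would gain about 82%.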
- Also, with only 8 MB of RAM in your computer, you won't see
much performance increase from a DX2/66 until you get a
Pentium 90 (none at all between a DX4/100 and a Pentium
75). With Windows, this is because the hard disk is used
a lot for virtual memory (swap files), which means more
activity over the data bus. Since motherboards below the
90 run at 33 MHz (only the chips run faster), the
bottleneck is the disk I/O, running at much the same
speed on them all. This is especially true if you use
Programmed I/O (PIO), where the CPU must scrutinize every
bit to and from the hard drive (although Multi-sector I/O
or EIDE will improve things). As the Pentium 90's
motherboard runs faster (66 MHz), the I/O can proceed at
a much faster pace, and performance will more than double
(a more sophisticated chipset helps).
- With 16 MB, on the other hand, performance will be almost
double anyway, regardless of the processor, because the
need to go to the hard disk is so much reduced, and the
processor can make a contribution to performance. The
biggest jump is from a DX2/66 to a DX4, with the curve
flattening out progressively up to the Pentium 90.
| Processor | MB Speed (MHz) | Bus Speed (MHz) |
|---|---|---|
| P60 | 60 | 30 |
| P66 | 66 | 33 |
| P75 | 50 | 25 |
| P90 | 60 | 30 |
| P100 | 66 | 33 |
| P120 | 60 | 30 |
| P133 | 66 | 33 |
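Two regularities in the table can be checked with a short sketch: the bus speed is half the motherboard speed in every row, and each chip runs at a simple multiple of the motherboard speed. The sketch assumes that the number in each processor name is its rated core speed in MHz; the rounding to the nearest half step is also an assumption, made to absorb the nominal 66 MHz figure.

```python
# (processor, rated MHz, motherboard MHz) as listed in the table
PENTIUMS = [
    ("P60", 60, 60), ("P66", 66, 66), ("P75", 75, 50),
    ("P90", 90, 60), ("P100", 100, 66), ("P120", 120, 60),
    ("P133", 133, 66),
]


def multiplier(cpu_mhz, board_mhz):
    """Core-to-motherboard ratio, rounded to the nearest half step."""
    return round(cpu_mhz / board_mhz * 2) / 2


for name, cpu_mhz, board_mhz in PENTIUMS:
    # name, multiplier, bus speed (half the motherboard speed)
    print(name, multiplier(cpu_mhz, board_mhz), board_mhz / 2)
```

On these assumptions the P60 and P66 run at 1x, the P75, P90 and P100 at 1.5x, and the P120 and P133 at 2x the motherboard speed.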