Memory

Memory Types
Memory Access
Wait States
Shadow RAM
Base Memory
Upper Memory
Extended Memory
High Memory
Expanded Memory
Virtual Memory

Memory Types

Unlike the good old days of mainframe computers when there were only two types of memory to deal with, the PC uses five, again for historical reasons. The memory contains the instructions that tell the Central Processor what to do, as well as the data created by its activities. Since the computer works with the binary system, memory chips work by keeping electronic switches in one state or the other for however long they are required. Actually, they consist of a capacitor and a transistor; the capacitor stores a charge (data), which represents a 1, and the transistor acts as a switch that turns the charge on or off. Where these states can be changed at will, it is called Random Access Memory, or RAM. The term derives from when magnetic tapes were used for data storage, and the information could only be accessed sequentially. A ROM, on the other hand, is a memory chip with its electronic switches permanently on or off, so they can't be changed, hence Read Only Memory.

Static RAM (SRAM) is the fastest available, with a typical access time of 25 nanoseconds. Static RAM is expensive and can only store a quarter of the data that DRAM is able to in the same given area, although it does retain it for as long as the chip is powered. The transistors are connected so that only one is either in or out at any time; whichever one is in stands for a 1 bit. Synchronous SRAM allows a faster data stream to pass through it, which is needed when used for caching on 90 and 100 MHz Pentiums.
Dynamic RAM (DRAM) uses internal capacitors to store data (a single transistor turns it on or off) which lose their charge over time, so they need constant refreshing to retain information, otherwise 1s will turn to 0s. The end result is that between every memory access an electrical charge is sent that refreshes the chip's capacitors to keep data in a fit state, and the chip cannot be reached whilst recharging is going on. Reading a DRAM discharges its contents, so they have to be written back immediately to keep the same information.
Enhanced DRAM (EDRAM) replaces standard DRAM and the SRAM in the level 2 cache on the motherboard, typically combining 256 bytes of 15ns SRAM inside 35ns DRAM. Since the SRAM can take a whole 256 byte page of memory at once, it gives an effective 15ns access speed when you get a hit (35ns otherwise). The level 2 cache is replaced with an ASIC chip to sort out chipset vs. memory requirements. System performance is increased by around 40%. EDRAM has a separate write path that accepts and completes requests without the rest of the chip.
WRAM (Windows RAM), created by Samsung, is dual ported, but costs about 20% less than VRAM and is 50% faster. It runs at 50 MHz, is optimized for acceleration, can transfer blocks, and supports text and pattern fills. It is mostly used for video cards.
Synchronous DRAM (SDRAM) takes memory access away from the CPU's control; internal registers in the chips accept the request, and let the CPU do something else while the data requested is assembled for the next time the CPU talks to the memory. As they work on their own clock, the rest of the system can be clocked faster. There is a version optimized for video cards.
EDO (Extended Data Output) is an advanced version of fast page mode (often called Hyper Page Mode), which can be up to 30% better and cost only 5% more. Single-cycle EDO will carry out a complete memory transaction in 1 clock cycle; otherwise, each sequential RAM access inside the same page takes 2 clock cycles instead of 3, once the page has been selected. As it replaces level 2 cache and doesn't need a separate controller, space on the motherboard is saved, which is good for notebooks. It also saves battery power. In short, EDO gives an increased bandwidth due to shortening of the page mode cycle, but it doesn't appear to be that much faster in practice.
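
The EDRAM figures above (15ns on a hit, 35ns otherwise) can be turned into an average access time once a hit rate is assumed. The following is only a sketch: the function name and the sample hit rates are illustrative and are not taken from any data sheet.

```python
def average_access_ns(hit_rate, hit_ns=15.0, miss_ns=35.0):
    """Weighted average of the SRAM-hit and DRAM-miss access times.

    The 15ns and 35ns defaults are the EDRAM figures quoted in the text;
    hit_rate is an assumed fraction between 0 and 1.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


# Illustrative hit rates only
for rate in (0.0, 0.5, 0.9, 1.0):
    print(f"hit rate {rate:.0%}: {average_access_ns(rate):.1f} ns")
```

The closer the hit rate gets to 100%, the closer the whole chip comes to behaving like 15ns memory.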

Memory Access

The cycle time is the time it takes to read from and write to a memory cell, and it consists of two stages: precharge and access. Precharge is where the capacitor in the memory cell is able to recover from a previous access and stabilize. Access is where a data bit is actually moved between memory and the bus or the CPU. Total access time includes the finding of data, data flow and recharge, and parts of the access time can be eliminated or overlapped to improve performance. The combination of precharge and access equals cycle time, which is what you should use to calculate wait states from.

There are ways of making refreshes happen so that the CPU doesn't notice (i.e. Concurrent and Hidden), which is helped by the 486 being able to use its on-board cache and not needing to use memory so often anyway. In addition, you can affect the Row Address Strobe (RAS), or have Column Address Strobe (CAS) before RAS (see Advanced Chipset Setup).

The fastest DRAM commonly available is rated at 60ns. As these chips need alternate refresh cycles, under normal circumstances data will actually be obtained every 120ns, giving you an effective speed of around 8 MHz for the whole computer, regardless of the CPU speed, assuming no action is taken to compensate. Memory chips therefore need to be operating at something like 20ns to keep up, assuming that the CPU needs only one clock cycle for each one from the memory bus; one internal cycle for each external one. Intel processors mostly use two for one, so the 33 MHz CPU is actually ready to use memory every 60ns, but you need to allow a little more for overheads, such as data assembly and the like. One way of matching the capacities of components with different speeds includes the use of wait states.

Clock Speed (MHz)	Cycle Time (ns)
1	1000
5	200
8	125
12	83
16	63
20	50
25	40
33	30
40	25
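
The table is simply 1000 divided by the clock speed in MHz, rounded to the nearest nanosecond. The sketch below reproduces it, along with the 60ns DRAM arithmetic from the text, and adds a rough wait state estimate. The wait state formula is an assumption made for illustration (two clocks per memory access, as in the "two for one" case described above); real chipsets add their own overheads.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def effective_mhz(access_ns, refresh_factor=2):
    """Effective speed when every access is followed by a refresh cycle.

    refresh_factor=2 models the text's example: 60ns chips that deliver
    data only every 120ns.
    """
    return 1000.0 / (access_ns * refresh_factor)


def wait_states(memory_ns, clock_mhz, clocks_per_access=2):
    """Rough estimate (an assumption, not a chipset formula).

    Counts the whole clocks needed to cover memory_ns and subtracts the
    clocks a normal access already takes. Expects integer arguments.
    """
    clocks_needed = -(-(memory_ns * clock_mhz) // 1000)  # ceiling division
    return max(0, clocks_needed - clocks_per_access)


for mhz in (1, 5, 8, 12, 16, 20, 25, 33, 40):
    # int(x + 0.5) rounds halves up, matching the table's 63ns at 16 MHz
    print(mhz, int(cycle_time_ns(mhz) + 0.5))

print(round(effective_mhz(60), 1))   # 60ns DRAM with alternate refresh
print(wait_states(60, 33))           # 60ns memory on a 33 MHz bus
print(wait_states(120, 33))          # 120ns memory on the same bus
```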

Wait States

A wait state indicates how many ticks of the system clock the CPU has to wait for memory to catch up; it will generally be 0 or 1, but can be up to 3 if you're using slower memory chips. Ways of avoiding wait states include:

Page-mode memory. This will cut down address cycles to retrieve information from one general area, based on the fact that the second access to a memory location on the same page takes around half the time as the first; addresses are normally in two halves, with high bits (for row) and low bits (for column) being multiplexed onto one set of address pins. The page address of data is noted, and if the next data is in the same area, a second address cycle is eliminated as a whole row of memory cells can be read in one go; that is, once a row access has been made, you can get to subsequent column addresses in that row in the time available (you should therefore increase row access time for best performance). Otherwise data is retrieved normally, which will take twice as long. Fast Page Mode is a quicker version of the same thing; the DRAMs concerned have a faster CAS access speed. Memory capable of running in page mode is different from the normal bit-by-bit type, and the two don't mix. It's unlikely that low capacity SIMMs are so capable.
Interleaved memory, which divides memory into two or four portions that process data alternately; that is, the CPU sends information to one section while another goes through a refresh cycle; a typical installation will have odd addresses on one side and even on the other (you can have word or block interleave). If memory accesses are sequential, the precharge of one will overlap the access time of the other. To put interleaved memory to best use, fill every socket you've got (that is, eight 1 Mb SIMMs are better than two 4 Mb ones). The SIMM types must be the same. As an example, a machine in non-interleaved mode (say a 386SX/20) may need 60ns or faster DRAM for 0ws access, whereas 80ns chips could do if interleaving were enabled.
A processor RAM cache, which is a bridge between the CPU and slower main memory; it consists of anywhere between 32K and 512K of (fast) Static RAM chips and is designed to retain the most frequently accessed code and data from main memory. It can make 1 wait state RAM look like that with 0 wait states, without physical adjustments, assuming that the data the CPU wants is in the cache when required (known as a cache hit). To minimize the penalty of a cache miss, cache and memory access are often in parallel, with one being terminated when not required. On a 486, how much cache you need really depends on the amount of memory; Dell say that jumping from 128K to 256K only increases the hit rate by around 5% and Viglen say you only need more than 256K if you have more than 32 Mb RAM. A cache should be fast and capable of holding the contents of several different parts of main memory. Software plays a part as well, since cache operation is based on the assumption that programs access memory where they have done so already, or are likely to next, maybe through looping (where code is reused) or because code is organized to be next to other relevant parts. A basic cache design will look up an address for the CPU and return the data inside one clock cycle, or 20ns at 50 MHz. Asynchronous SRAM will be used for this. As the round trip from the CPU to cache and back again takes up a certain amount of time, only the remainder is available to retrieve data, which gets smaller as the motherboard speed is increased. Synchronous SRAM uses a buffer to keep the whole routine inside one clock cycle, even though it may use two (or more) clock cycles the first time round. The address from the CPU is stored, and while the next is coming in to the buffer, the data for the first is retrieved, and the cycle continues.
Pipeline SRAM uses more clock cycles, typically three, the first time round, and Burst SRAM will deliver 4 words (blocks of data) over four consecutive cycles if the request from the CPU is for the first; there will be no waiting for the CPU to request each one individually. Note the level 2 cache can be unreliable, so be prepared to disable it in the interests of reliability. For maximum efficiency, or minimum access time, a cache may be subdivided into smaller blocks that can be separately loaded, so the chances of a different part of memory being requested and the time needed to replace a wrong section are minimized. There are three mapping schemes that assist with this:
- Fully Associative, where the whole address is kept with each block of data in the cache (in tag RAM), needed because it is assumed there is no relationship between the blocks. This can be inefficient, as an address comparison needs to be made with every entry each time the CPU presents the address for its next instruction.
- Direct Mapped, where every block can only be in one place in the cache, so only one address comparison is needed to see if the data required is there. Although simple, the cache controller must go to main memory more frequently if program code needs to jump between locations with the same index, which defeats the object somewhat, as alternate references to the same cache cell mean cache misses for other processes. The "index" comes from the lower order addresses presented by the CPU.
- Set Associative, a compromise between the above two. Here, an index can select several entries, so in a 2 Way Set Associative cache, 2 entries can have the same index, so two comparisons are needed to see if the data required is in the cache. Also, the tag field is correspondingly wider and needs larger SRAMs to store address information. As there are two locations for each index, the cache controller has to decide which one to update or overwrite, as the case may be. The most common methods used to make these decisions are Random Replacement, First In First Out (FIFO) and Least Recently Used (LRU). The latter is the most efficient. If the cache is large enough (e.g. 64K), the performance gain from this over direct-mapping may not be much.
A Write Thru Cache means that every write access is immediately passed on to memory; although it means that cache contents are always identical to main memory, it is slow, as the CPU then has to wait for DRAMs. Buffers can be used to provide a variation on this, where data is written into a temporary buffer so the CPU is released quickly before main memory is updated.
A Write Back Cache, on the other hand, exists where changed data is temporarily stored in the cache and written to memory when the system is quiet, or when absolutely necessary. This will give better performance when main memory is slower than the cache, or when several writes are made in a very short space of time, but is more expensive. A "dirty bit" is used as a mental note that the cache and main memory contents are different, and that the cache contains the most up to date data. This bit will be checked if the cache needs to be written to, and main memory updated first if this bit is set. Some motherboards don't have the required SRAM for the dirty bit, but it's still faster than Write Thru.
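
The difference between the mapping schemes can be seen in a small simulation. This is a minimal sketch with LRU replacement, not a model of any real cache controller; the class name, the 8-line cache and the 16-byte line size are all assumptions chosen to keep the example short. With one way per set it behaves as a direct mapped cache, and with as many ways as lines it becomes fully associative.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache model: ways=1 is direct mapped, ways=num_lines is fully associative."""

    def __init__(self, num_lines, ways, line_size=16):
        if num_lines % ways:
            raise ValueError("num_lines must be a multiple of ways")
        self.ways = ways
        self.line_size = line_size
        self.num_sets = num_lines // ways
        # One ordered dict of tags per set; the oldest entry is least recently used
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up an address, returning True on a hit and False on a miss."""
        block = address // self.line_size
        index = block % self.num_sets   # lower order bits select the set
        tag = block // self.num_sets    # remaining bits are kept as the tag
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used entry
        lines[tag] = True
        self.misses += 1
        return False


# Two addresses that share an index, accessed alternately
direct = SetAssociativeCache(num_lines=8, ways=1)
two_way = SetAssociativeCache(num_lines=8, ways=2)
for _ in range(4):
    for address in (0x0000, 0x0080):
        direct.access(address)
        two_way.access(address)

print("direct mapped:", direct.hits, "hits,", direct.misses, "misses")
print("2 way set associative:", two_way.hits, "hits,", two_way.misses, "misses")
```

In the direct mapped case the two addresses keep evicting each other, so every access misses; the 2 way cache has room for both, so only the first access to each one misses.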

Shadow RAM

ROMs are used by components that need their own instructions to work properly, such as a video card or caching disk controller. ROMs are 8-bit devices, so only one byte is accessed at a time; also, they typically run between 150-400ns, so using them will be slow relative to 32-bit memory at 60-80ns, which is capable of making four accesses at once.

Shadow RAM is the process of copying the contents of a ROM directly into extended memory which is given the same address as the ROM, from where it will run much faster. The original ROM is then disabled, and the new location write protected. If your applications execute ROM routines often enough, enabling Shadow RAM will make a difference in performance of around 8%, assuming a program spends about 10% of its time using instructions from ROM, but theoretically as high as 300%. The drawback is that the RAM set aside for shadowing cannot be used for anything else, and you will lose a corresponding amount of extended memory. The remainder of Upper Memory, however, can usually be remapped to the end of extended memory and used there.
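
The 8% and 300% figures are consistent with a simple proportional calculation, sketched below. The factor of 4 for shadowed code is an assumption made here for illustration (it corresponds to 32-bit RAM fetching four bytes for every one from an 8-bit ROM); the text itself does not state which factor was used.

```python
def overall_speedup(rom_fraction, rom_speedup):
    """Overall speedup when only the ROM portion of the run time gets faster.

    rom_fraction: share of run time spent executing ROM code (0 to 1)
    rom_speedup:  how many times faster that code runs from shadow RAM
    """
    new_time = (1.0 - rom_fraction) + rom_fraction / rom_speedup
    return 1.0 / new_time


# 10% of the time in ROM, shadowed code assumed to run 4 times faster
gain = (overall_speedup(0.10, 4.0) - 1.0) * 100
print(f"{gain:.1f}% faster")

# Upper limit: a program running entirely from ROM
limit = (overall_speedup(1.0, 4.0) - 1.0) * 100
print(f"{limit:.0f}% faster")
```

Under that assumption the first case comes out at roughly 8% and the second at 300%, which is why the practical benefit depends so heavily on how much time a program really spends in ROM routines.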

With some VGA cards, if video shadow is disabled, you might get DMA errors, because of timing when code is fetched from the VGA BIOS, when the CPU cannot accept DMA requests. Some programs don't make use of the video ROM, preferring to directly address the card's registers, so you may want to use extended memory for something else. If your machine hangs during the startup sequence for no apparent reason, check that you haven't shadowed an area of upper memory containing a ROM that doesn't like it (particularly one on a hard disk controller), or that you haven't got two in the same 128K segment.

Base Memory

The first 640K available, which traditionally contains DOS, device drivers, TSRs and any programs to be run, plus their data, so the less room DOS takes up, the more there is for the rest. Different versions of DOS were better or worse in this respect. In fact, under normal circumstances, you can expect the first 90K or so to consist of:

An Interrupt Vector Table, which is 1K in size, including the name and address of the program providing the interrupt service. Interrupt vectors point to routines in the BIOS or DOS that programs can use to perform low level hardware access. DOS uses io.sys and msdos.sys for the BIOS and DOS, respectively.
ROM BIOS tables, which are used by system ROMs to keep track of what's going on. This will include I/O addresses and possibly user-defined hard disk data.
DOS itself, plus any associated data files it needs to operate with (e.g. buffers, etc.).

DOS was written to run applications inside the bottom 640K block simply because the designers of the original IBM PC decided to. Memory at the time was expensive, and most CP/M machines only used 64K anyway (the PC with 128K was $10,000!). Other machines of the same era used more; the Sirius allowed 896K for programs. Contrary to popular belief, Windows 3.1 uses memory below 1Mb, for administration purposes; although it pools all memory above and below 1 Mb (and calls it the Global Heap), certain essential Windows 3.1 structures must live below 1 Mb, such as the Task DataBase (TDB) which is necessary for starting new tasks.

Every Windows 3.1 application needs 512 bytes of memory below 1 Mb to load, but some will take much more if they can, even all that's available, thus preventing others from loading, which is one source of "Out of Memory" messages. There are programs that will purposely fragment base memory so it can't be hogged by any one program.

Rather than starting at 0 and counting upwards, memory addressing on the PC uses a two-step segment:offset addressing scheme. The segment specifies a 16-byte paragraph of RAM; the offset identifies a specific byte relative to it. The CPU finds a particular byte in memory by using two registers. One contains the starting segment value and the other the offset. The maximum that can be stored in one is 65,535 (FFFF in hex). The CPU calculates a physical address by taking the contents of the segment register, shifting it one hexadecimal digit (4 bits) to the left, and adding the offset to it (see High Memory).

Sometimes, you will see both values separated by a colon, as with FFFF:000F, meaning the sixteenth byte in memory segment FFFF; this can also be represented as the effective address 0FFFFFh. When referring only to 16-byte paragraph ranges, the offset value is often left out. The 1024KB of DOS memory is divided into 16 segments of 64KB each. Conventional memory contains the ten segments from 0000h to 9FFFh (bytes 0 to 655,359), and Upper memory contains the six segments ranging from A000h to FFFFh.
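
The segment:offset calculation is short enough to write out. In this sketch the function name is illustrative only; the arithmetic (segment shifted left by 4 bits, plus the offset) is the scheme described above.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


print(hex(physical_address(0xFFFF, 0x000F)))   # last byte of the first megabyte
print(physical_address(0xA000, 0x0000))        # first byte of upper memory
print(physical_address(0x0040, 0x0000) == physical_address(0x0000, 0x0400))
```

The last line shows that the same physical byte can be reached through more than one segment:offset pair, because segments start every 16 bytes and overlap each other.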

Upper Memory

The next 384K is reserved for private use by the computer, so that any expansion cards with their own memory or ROMs can operate safely there without interfering with programs in base memory, and vice versa. Typical examples include network interface cards or graphics adapters. There is no memory in it; the space is simply reserved. This is why the memory count on older machines with only 1 Mb was 640 + 384K of extended memory; the 384K was remapped above 1 Mb so it could be used. When upper memory blocks are needed, that memory is remapped back again, so you lose a bit of extended memory.

This area is split into regions, A-F, which in turn are split into areas numbered from 0000 to FFFF hexadecimally (64K each). With the right software, this area can be converted into Upper Memory for use by TSRs (memory-resident programs) to make more room downstairs. The amount of upper memory available varies between computers, and depends on the amount of space taken up by the System BIOS and whether you have a separate VGA BIOS (on board video sometimes has its BIOS integrated in the system BIOS). It also depends on the number of add-in cards you have, e.g. disk controllers, that normally take up around 16K.

Some chipsets will always reserve this 384K area for shadowing, so it will not appear in the initial memory count on power-up, the system configuration screen, or when using MEM. Other chipsets have a Memory Relocation option which will re-address it above 1 Mb as extended memory. Occasionally, some ROM space is not needed once the machine has booted, and you might be able to use it. A good example is the first 32K of the System BIOS, at F000 in ISA machines. It's only used in the initial stages of booting up, that is, before DOS gets to set up device drivers, so this area is often useable.

Extended Memory

Memory above 1 Mb is known as extended memory, and is not normally useable under DOS, except to provide RAM disks or caches, because DOS runs in real mode, and extended memory can only be reached in protected mode, which is what OS/2 and Windows 95 use. Some programs are able to switch the CPU from one to the other by using the DOS Protected Mode Interface (DPMI). Although extended memory first appeared on the 286, and some software was written to take advantage of it, the 286 was used mostly as a fast XT, because DOS wasn't rewritten. It wasn't until the 386, with its memory paging capability, that extended memory came to be used properly.

High Memory

The first 64K (less 16 bytes) of extended memory, which is useable only by 286, 386 or 486 based computers that have more than 1Mb of memory. It is the result of having more than 20 address lines, which can be exploited by DOS to use that portion of extended memory as if it were below 1Mb, leaving yet more available for programs in base memory. In other words, it is extended memory that can be accessed in real mode. It is activated with himem.sys.

HMA access is possible because of the segment:offset addressing scheme of the PC. Memory addresses on a PC are 20 bits long, and are calculated by shifting the contents of a 16-bit register 4 bits to the left, and adding it to a 16-bit offset. The 8088, with only 20 address lines, cannot handle the address carry bit, so the processor simply wraps around to address 0000:0000 after FFFF:000F; in other words, the upper 4 bits are discarded.

On a 286 or later, there is a 21st memory address line (A20) that was left open by accident and which can be operated by software, which gives you the carry bit. If the system activates this line while in 8088 (real) mode, the wraparound doesn't happen, and the high memory area becomes available. IBM led the A20 line through a switch in the keyboard controller to (de)activate it.
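
The wraparound, and what changes when the A20 line is active, can be shown with the same address arithmetic. This is a sketch only: the function and constant names are illustrative, and the mask simply models the carry into the 21st address bit being thrown away.

```python
ONE_MEGABYTE = 0x100000


def real_mode_address(segment, offset, a20_enabled):
    """Physical address seen on the bus in real mode."""
    linear = (segment << 4) + offset
    if not a20_enabled:
        linear &= 0xFFFFF   # only 20 address lines: the carry is discarded
    return linear


# One byte past FFFF:000F
print(hex(real_mode_address(0xFFFF, 0x0010, a20_enabled=False)))  # wraps round
print(hex(real_mode_address(0xFFFF, 0x0010, a20_enabled=True)))   # start of HMA

# Highest address reachable in real mode gives the size of the HMA
top = real_mode_address(0xFFFF, 0xFFFF, a20_enabled=True)
hma_size = top - ONE_MEGABYTE + 1
print(hma_size, hma_size == 64 * 1024 - 16)
```

The size works out to 65,520 bytes, which is where the "64K less 16 bytes" figure comes from.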

Expanded Memory

This is the most confusing one of all, because it sounds so much like expansion memory, which is what extended memory is sometimes called. Once the PC was on the market, it wasn't long before 640K wasn't enough, particularly for people using Lotus, the top-selling application of the time, who were creating large spreadsheets and not having enough memory to load them, especially when version 2 needed 60K more memory than the original. It wasn't entirely their fault; Lotus itself in its early days was very inefficient in its use of memory anyway.

Users got onto Lotus, Intel and Microsoft for a workaround, and they came up with LIM memory, also known as Expanded. It's a system of physical bank-switching, where several extra banks of memory can be allocated to a program, but only one will be in the address space of the CPU at any time, as that bank is switched, or paged, as required. In other words, the program code stays in the physical cells, but the electronic address of those cells is changed, either by software or circuitry.

In effect, LIM (4.0) appears to swap the contents of any 16K block of expanded memory with a similar one inside upper memory; no actual swapping takes place, but the pages have their addresses changed so that it looks as if it does. Once the page frame is mapped to a page on the card, the data of that page can be seen by the CPU. Points to note about LIM:

It's normally only available for data (no program code).
Programs need to be specially written to use it.
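
The bank-switching idea can be modelled in a few lines. The sketch below assumes a page frame of four 16K windows, as described in this section; the class and method names are hypothetical and are not the real EMS driver interface. The point it illustrates is that mapping a page changes only which memory the window shows, and no data is copied.

```python
PAGE_SIZE = 16 * 1024   # one expanded memory page
FRAME_SLOTS = 4         # windows in the page frame


class ExpandedMemory:
    """Toy model of an expanded memory card with a four-slot page frame."""

    def __init__(self, logical_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(logical_pages)]
        self.frame = [None] * FRAME_SLOTS   # logical page visible in each slot

    def map_page(self, slot, logical_page):
        """Make a logical page visible in a frame slot; nothing is copied."""
        if not 0 <= slot < FRAME_SLOTS:
            raise IndexError("no such frame slot")
        if not 0 <= logical_page < len(self.pages):
            raise IndexError("no such logical page")
        self.frame[slot] = logical_page

    def _locate(self, frame_offset):
        slot, offset = divmod(frame_offset, PAGE_SIZE)
        if not 0 <= slot < FRAME_SLOTS or self.frame[slot] is None:
            raise IndexError("address is outside any mapped page")
        return self.pages[self.frame[slot]], offset

    def write(self, frame_offset, value):
        page, offset = self._locate(frame_offset)
        page[offset] = value

    def read(self, frame_offset):
        page, offset = self._locate(frame_offset)
        return page[offset]


ems = ExpandedMemory(logical_pages=8)
ems.map_page(0, 5)
ems.write(0, 0xAA)       # lands in logical page 5
ems.map_page(0, 6)
print(ems.read(0))       # same frame address now shows page 6
ems.map_page(0, 5)
print(hex(ems.read(0)))  # page 5 still holds its data
```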

Although, in theory, LIM 4 doesn't need a page frame, the programs you run may well expect to see one. In addition, there could be up to 64 pages, so you could bank switch up to a megabyte at a time, effectively doubling the address space of the CPU, and enabling program code to be run and multitasking. This was called large-frame EMS, but it still used only four pages in upper memory; the idea was to remove most of the memory on the motherboard. The memory card backfilled conventional memory and used the extra pages for banking.

On an 8086 or 286-based machine, expanded memory is usually provided by circuitry on an expansion card, but there are some software solutions. 386 (and 486) -based machines have memory management built in to the central processor, so all that's needed is the relevant software to emulate LIM (emm386.exe or similar).

Virtual Memory

"Virtual" in the computer industry is a word meaning that something is other than what it appears to be. Many people have difficulties to understand what virtual memory actually means. Virtual memory is memory that does not exist. Several contradictory definitions about virtual memory exist. For some, virtual memory is not disk space. Disk (swap) space is used for backing allocated (committed) memory (global data, heap and stack) when the OS ran out of system memory (RAM). Example: You create a program and define a stack of 64KB. Because your program really doesn't require 64KB (only 2KB), the OS will allocate only one page (4K) during run-time. The other 64KB - 4KB = 60KB is virtual memory; memory that does not exist, not in system memory, not on disk. Only when your program runs out of stack, another page will be allocated unless a total of 64KB is used already.

From another point of view, Virtual Memory isn't memory at all, but hard disk space made to look like it; the opposite of a RAM disk. Windows (and System 7) uses virtual memory for swap files, used when physical memory runs out (you need protected mode on a PC to do that). Like disk caching, VM was used on mainframes for some time before migrating to the PC; VMS, the OS used on DEC VAXes, actually stands for Virtual Memory System. There is a speed penalty, of course, as you have to access the hard disk to use it, but Virtual Memory is a good stopgap when you're running short.
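
The 64KB stack example from the first definition can be worked through numerically. This is a sketch of the arithmetic only, assuming 4K pages as in that example; the function name is illustrative and no real operating system call is involved.

```python
PAGE_SIZE = 4 * 1024


def committed_bytes(used, reserved, page_size=PAGE_SIZE):
    """Bytes actually backed by memory, rounded up to whole pages."""
    if used > reserved:
        raise ValueError("cannot use more than was reserved")
    pages = -(-used // page_size)   # ceiling division
    return pages * page_size


reserved = 64 * 1024   # stack size defined by the program
used = 2 * 1024        # what the program really needs

committed = committed_bytes(used, reserved)
print(committed // 1024, "KB committed")
print((reserved - committed) // 1024, "KB reserved but not backed by anything")
```

If the program's stack later grows past 4K, the committed figure rises one page at a time until the 64KB reservation is used up.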