home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!usc!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
- From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
- Newsgroups: comp.sys.intel
- Subject: Cyrix 486SLC/DLC compatibility issues and NextStep....
- Date: 17 Dec 1992 14:38:25 GMT
- Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
- Lines: 550
- Distribution: world
- Message-ID: <1gq3d1INNg8g@iraul1.ira.uka.de>
- NNTP-Posting-Host: irav1.ira.uka.de
- X-News-Reader: VMS NEWS 1.23
-
- Compatibility issues: Cyrix Cx486SLC/DLC compared to the Intel 80486SX
-
-
- There has been quite a bit of discussion here recently about compatibility
- issues involving the Cyrix Cx486SLC and Cx486DLC processors, in particular
- about the fact that the NextStep operating system doesn't run on the Cyrix
- processors for some reason. During the course of this discussion, we have
- heard *a lot of opinions* (e.g. "Intel sucks", "Cyrix sucks") but only *few
- facts*. So I thought it might a good idea to throw in a bit of the latter.
- I'll try to give the facts as accurate as possible, drawing from personal
- experience and Intel's and Cyrix' literature on the 80486DX/SX and 486DLC/SLC.
- If you think you have found erroneous information, feel free to contact me:
-
- S_JUFFA@IRAVCL.IRA.UKA.DE (Norbert Juffa)
-
-
- NOTE: I have no affiliation whatsoever with either Intel or Cyrix!
-
-
- The Cyrix 486DLC is a replacement chip for the Intel/AMD 80386DX. The Cyrix
- 486SLC is a replacement chip for the Intel/AMD 386SX. While the internals of
- the Cyrix 486SLC/DLC are roughly equivalent to those in the Intel 80486SX,
- the bus interface of these chips is identical to that of the Intel 80386DX
- and 80386SX CPUs, respectively to allow easy replacement of the Intel CPUs
- by the Cyrix chips. This also means that the Cx486SLC, as a replacement for
- the Intel/AMD 80386SX can only address 16 MB of memory.
-
- The 486SLC/DLC CPUs have a register set that is identical to that found on
- the Intel 80486SX. However, there are a few subtle differences in the
- meaning of certain bits in some systems registers (e.g. cache test registers).
- These are covered in more details below. The instruction sets of the Intel
- 486SX and the Cyrix Cx486SCL/DLC are identical. The execution times of specific
- instructions differ between the two chips, but the overall execution speed
- (measured in CPI = clocks per instruction) seems to be about same.
-
- On both, the Intel 80486SX and the Cyrix 486SLC/DLC, there is *no* on-chip
- FPU (floating point unit). To add floating point capabilities to a 486SX
- based system, one would install an 487 'coprocessor', which is basically a
- 486DX with a slighty different pin-out, or replace the 486SX with an OverDrive
- processor, a clock-doubled 486DX with the 486SX pinout. With the 486SLC/DLC,
- one buys a 387 compatible coprocessor to add floating-point capabilities. It
- is recommended to get a Cyrix coprocessor for this purpose, since these are
- the fastest387 compatible coprocessors available. Also, Cyrix sells kits
- consisting of a 486SLC/DLC and a coprocessor that have a favourable value
- for money ratio. The floating-point performance of a Cyrix 486DLC + Cyrix
- 83D87 combination is about 50% of that of an Intel 486DX running at the same
- frequency.
-
- The Cyrix 486SLC/DLC have a RISC-like execution unit with a flexible five
- stage pipeline, just as the 80486SX has. Unlike the Intel 80486, which has
- an 8 kB, 4-way associative cache on chip, the Cx486SLC/DLC only have an 1
- kB, 2-way associative cache (the cache on the Cyrix chips can also be
- configured to be of the direct mapped type). The 486DLC provides up to 80%
- more integer performance than a 386DX at the same clock frequency, with the
- average performance gain being about 35%. With the 1 kB on-chip cache enabled,
- the 486DLC provides about 75% of the integer performance of a 486SX at the
- same clock frequency. With the cache disabled, the 486DLC provides about 65%
- of the integer performance of a 486SX. The lower performance of the Cyrix
- 486DLC as compared to the Intel 80486SX is mostly due to the slow 386DX bus
- interface the 486DLC uses, which is up to 2 times slower than the 486 bus
- interface. Some additional performance penalty is imposed by the smaller
- cache on the 486SCL/DLC, which provides significantly lower hit rates than
- the 8 kB cache of the 80486SX.
-
-
- I have personally used the Cyrix 486DLC with my 33.3/40 MHz 386 motherboard,
- which uses the Forex chip set. I have also used the Intel RapidCAD and the
- C&T 38600DX with this board. These are also replacement chips for the 386DX.
- Replacing the 386DX is very easy: Just pull out the AMD/Intel 386DX, then
- plug in the replacement chip (here: the Cyrix 486DLC). I haven't had *major*
- problems with either of the available replacement chips. The problems
- encountered using the Cyrix 486DLC were:
-
- 1) When a Cyrix EMC87, Cyrix 83D87 (chips produced prior to November 1991),
- or IIT 3C87 coprocessor is used with the 486DLC, the computer locks up
- completely at times, especially when running protected mode multitasked
- operating systems, such as Windows 3.1 in enhanced mode. This is caused
- by problems with the FSAVE and FRSTOR instructions when using these
- coprocessors. Cyrix tells me that this problem only occurs with first
- generation 486DLCs (read: sample chips like the one I have) and that the
- bug is fixed on the chips that are now available to OEMs and end users.
- 2) When using the DBOS 1.0 DOS-extender delivered with the Salford FTN/386
- Fortran compiler, the executable of the DODUC benchmark produced by that
- compiler aborts with a general protection fault. The DODUC executable
- runs fine with the DBOS 1.0 DOS-extender on the Intel 386DX, C&T 38600DX,
- Intel RapidCAD, and Intel 80486DX. I have informed Cyrix of the problem.
-
- As for the problems with NextStep on the 486DLC, I have no idea what causes
- them. I can think of the following possibilities:
-
- 1) NextStep has been tailored extremely close to the 486 programming model,
- not allowing for even slight changes in the architecture (e.g. smaller
- cache), so that the subtle changes needed to adapt the different HW of
- the Cyrix 486SLC/DLC to the 486 programming model can not be accomodated.
- 2) NextStep includes code that only runs because it uses officialy undocumented
- features of the 80486 that have not been disclosed by Intel to other vendors.
- 3) NextStep includes code that only runs correctly on the 80486 by accident.
- E.g. it could mask the contents of an system register and erroneously
- include a bit that is undefined as per Intel's documentation. This undefined
- bit could then be '1' on the 80486 and '0' on the 486SLC/DLC, for example,
- thus leading to corruption of the system further down the execution path.
- 4) For correct execution, NextStep relies on the timing of certain instructions
- that execute slower or faster on the Cx486SLC/DLC than they do on the Intel
- 80486SX (a chip that reportedly runs NextStep).
- 5) NEXT Corporation used an early and possibly buggy sample chip to do their
- compatibility testing.
- 6) There is a bug in the Cyrix 486SLC/DLC that only creeps up if protected
- mode system level programs are used, similar to the problem I encountered
- with the DBOS 1.0 DOS-extender that is described above. However, it is
- interesting to note that several 32-bit operating systems have been
- successfully tested on the 486SLC/DLC (see below).
-
-
-
-
- Summary of Intel 486SX / Cyrix 486SLC/DLC implementation details
-
-
- Intel 486SX
-
- bus interface: supports burst mode memory accesses with the first
- memory access taking two clock cycles and subsequent
- accesses taking only one clock cycle.
- prefetch queue: 32 bytes
- on-chip cache: 8 KByte unified (code and data) write-through cache.
- The cache is 4-way set-associative, with 128 sets
- consisting of four cache lines each. Every cache line
- consists of 16 bytes. Four write buffers. Hit rate: ~95%
- Invalidation of cache lines: total cache line
- execution unit: RISC-like execution unit with five stage pipeline. Barrel
- shifter. Conditional jump taken/not taken: 3/1 clock cycles.
- Instructions that can be executed in 1 clock cycle if the
- destination is a register and the source is either a register
- or an immediate value:
- ADC,ADD,AND,BSWAP,CMP,DEC,INC,MOV,NEG,NOT,OR,POP,PUSH,SBB,
- SUB,TEST,XOR
-
- Cyrix 486DLC
-
- bus interface: Cx486SLC/DLC uses same the same bus interface as the
- Intel 386DX/386SX. Highest speed at which memory is
- accessed is two clock cycles per memory access, there
- is *no* burst mode. Seven additional signals have been
- assigned to pins that are not connected on the 386DX/
- 386SX. After power-on or reset, these pins are also
- electrically disabled on the Cx486SLC/DLC and must be
- specifically enabled by software. Signals added are used
- for cache management (KEN#, FLUSH#, RPLSET and RPLVAL#),
- power management (SUSP#, SUSPA#), and A20 control (A20M#).
- Each signal can be enabled/disabled independently of the
- enable/disable status of the other signals.
- instruction set: complete Intel 486SX instruction set, including *all*
- 486 specific instructions: WBINVD (write back and
- invalidate data cache), XADD (exchange and add), CMPXCHG
- (compare and exchange), BSWAP (Byte Swap), INVLPG
- (Invalidate TLB entry), INVD (Invalidate Data Cache)
- prefetch queue: 16 bytes
- on-chip cache: 1 KByte unified (code and data) write-through cache.
- The cache is 2-way set-associative, with 128 sets
- consisting of two cache lines each. Every cache line
- consists of 4 bytes. Two write buffers. Hit rate: ~65%
- Invalidation of cache lines: single bytes in cache line
- The cache is disabled after power-on or reset for
- compatibility reasons and must be enabled by software.
- Under DOS, you can use a program provided by Cyrix for
- this purpose. As far as I know, there are no programs
- available yet for OS/2 and Unix that enable the cache.
- execution unit: RISC-like execution unit with five stage pipeline. Barrel
- shifter. 16x16 bit hardware multiplier (16x16 bit multiply:
- 3 cycles, 32x32 bit multiply: 7 cycles, AAD: 4 cycles).
- Conditional jump taken/not taken: 6/1 clock cycles.
- Instructions that can be executed in 1 clock cycle
- if the destination is a register and the source is
- either a register or an immediate value:
- ADC,ADD,AND,CDQ,CLC,CLD,CMC,CMP,CWD,DEC,INC,MOV,MOVSX,
- NEG, NOT,OR,SBB,SHLD,SHRD,STC,STD,SUB,TEST,XOR
-
-
-
-
- Summary of known compatiblity issues
-
- The following is an extract from the Cx486SLC and Cx486DLC Compatibility
- Report, Cyrix Corporation 1992, Order No. 94074-00, with some additional
- information added by me that has been taken from the Cyrix Cx486SLC
- Microprocessor Data Sheet, Cyrix Corporation 1991, Order No. 94073-00,
- the i486 Microprocessor Hardware Reference Manual, Intel Corporation,
- Order No. 240552-001, and the i486 Microprocessor Programmer's Reference
- Manual, Order No. 240486-001.
-
-
- SUBSTANTIVE DIFFERENCES - (SOFTWARE)
-
- SS-1 Description
-
- The TR4 cache test register holds the cache tag address, valid bits
- and LRU bits for the current cache test operation. The TR5 cache test
- register defines the cache line, cache set and control bits for the
- cache test operation. Since the cache size and organization differ
- between the Cx486SLC/DLC and the 80486, TR4 and TR5 have similar but
- not identical bit definitions on the Cx486SLC/DLC and the 80486.
-
- Analysis
-
- Cache test and diagnostic software - if written to explicitly depend
- on the cache size and organization of the 80486 - may produce unexpected
- results when run on a Cx486SLC/DLC. The results of the programs typically
- have no effect on operating systems or applications software. For proper
- test or diagnosis of the Cx486SLC/DLC cache, software should be used
- which is specifically written to comprehend the Cx486SLC/DLC.
-
-
- 80486SX
-
- 31 11 10 9 7 6 3 2 0
- +------------------------------------------+---+--------+-------+--------+
- TR4 | Tag | V | LRU | Valid | Unused |
- +------------------------------------------+---+--------+-------+--------+
-
- V This is the valid bit for the particular cache line which was
- accessed. On a cache lookup, it is a copy of one of the bits
- reported in bits 3..6, which are the valid bits for all four
- cache lines in the selected set. On a cache write, it becomes
- the new valid bit for the particular cache line selected within
- the selected set.
- LRU On a cache lookup, these are the three LRU bits of the set which
- was accessed. On a cache write, these bits are ignored; the LRU
- bits in the cache are updated by the pseudo-LRU cache replacement
- algorithm. LRU bit 0 (TR4 bit 7) indicates which group of two
- cache lines in the set contains the cache line that has been least
- recently used. The bit is clear when the least recently used line
- is either line 0 or line 1, and is set when the least recently
- used line in the set is either line 2 or line 3. LRU bit 1 (TR4
- bit 8) and LRU bit 2 (TR4 bit 9) indicate which of the two lines
- in the group of lines selected by LRU bit 0 is the least recently
- used, where LRU bit 1 indicates either line 0 (bit=0) or line 1
- (bit=1) and LRU bit 2 indicates either line 2 (bit=0) or line 3
- (bit=1) has been least recently used. A real LRU replacement
- algorithm would have to use 5 bits.
- Valid On a cache lookup, these are the four Valid bits of the set which
- was accessed, where each bit corresponds to one of the four cache
- lines in the set.
-
-
- 486SLC/DLC
-
- 31 9 8 7 6 3 2 0
- +------------------------------------------------+-+-----+-------+-------+
- TR4 | Tag |U| LRU | Valid | 0 0 0 |
- +------------------------------------------------+-+-----+-------+-------+
-
- U bit 8 is unused.
- LRU On a cache lookup, this is the LRU bit associated with the cache
- set. On a cache write, this bit is ignored. Bit=0 means line 0
- in the selected set has been least recently used, bit=1 means line
- 1 in the selcted set has been least recently used.
- Valid On a cache lookup, these are the four valid bits for the particular
- cache line accessed (one bit per byte in the cache line). On a cache write
- these are the valid bits written into the line.
-
-
-
- 80486SX
-
- 31 11 10 4 3 2 1 0
- +--------------------------------------+------------------+-------+------+
- TR5 | Unused | Set Select | Entry | Ctrl |
- +--------------------------------------+------------------+-------+------+
-
- Set Select Selects one of the 128 sets of the cache.
- Entry Selects one of the four cache lines within the selected set.
- Ctrl 00 write to cache fill buffer, or read from cache read buffer
- 01 perform cache write
- 10 perform cache read
- 11 flush cache (mark all entries invalid)
-
- 486SLC/DLC
-
- 31 11 10 4 3 2 1 0
- +--------------------------------------+------------------+-+-----+------+
- TR5 | Unused | Set Select |U| Ent | Ctrl |
- +--------------------------------------+------------------+-+-----+------+
-
- Set Select Selects one of the 128 sets of the cache.
- U bit 3 is unused
- Entry Selects one of the two cache lines within the selected set.
- Ctrl 00 ignored
- 01 perform cache write
- 10 perform cache read
- 11 flush cache (mark all entries invalid)
-
-
- SS-2 Description
-
- The 80486 NW (not write-through) bit in CR0 disables 80486 write-through
- capability. If the cache disabled bit is on, a write occurs to a cache-hit
- location, and NW is a 1, then the 80486 does not perform an external write
- bus cycle. This bit is not available on the Cx486SLC/DLC and is fixed at
- zero.
-
- Analysis
-
- The NW bit on the 80486 allows for a capability of self-contained
- processing once a program has been loaded into the cache and the cache
- disabled. Programs that use this feature will work on the Cx486SLC/DLC
- with writes happening on external write bus cycles.
-
-
- SS-3 Description
-
- On systems with hardware FPUs, whose FPU ERROR signal is routed to the
- CPU ERROR signal (NE bit set on the 80486DX), a floating point error is
- normally acknowledged by the CPU upon execution of the next floating
- point instruction. If the next floating point instruction is a load single
- or load double precision that would have generated a General Protection
- (GP) fault, it is possible for the Cx486SLC/DLC to acknowledge the GP
- fault before the coprocessor error fault. The 80486 acknowledges the
- coprocessor error first.
-
- Analysis
-
- This condition (FPU ERROR connected to CPU ERROR) does not occur in PC
- compatible designs.
-
-
-
- INFORMATIONAL DIFFERENCES - (SOFTWARE)
-
- IS-1 Description
-
- Certain 80486 flag bits in the flags register are documented by Intel
- as undefined after execution of certain instructions. Testing at Cyrix
- has shown that the final states of theses flag bits are in fact
- unpredictable. The Cx486SLC/DLC leaves the flag bit values unmodified
- after execution of the same instructions.
-
- Analysis
-
- Since the flag bits are documented by Intel to be undefined after certain
- operations, software can not reliably use the resulting flag bit values.
-
-
- IS-2 Description
-
- Early revision 80486SX CPUs have a programmable Numeric Exception control
- bit in control register CR0 (bit 28). This bit was intended to control
- whether numeric execptions are handled internally (NE=1) or driven
- externally on a discrete CPU pin (NE=0). On these 80486SXs, the NE bit
- can be set to a one even though numeric execptions can not be handled
- internally due to the fact that no coprocessor exists. Reading the NE
- bit on the coprocessor exists. Reading the NE bit on the Cx486SLC/DLC
- always returns a zero indicating that numeric exceptions are always
- handled externally.
-
- Analysis
-
- Since the Cx486SLC/DLC does not have an on-board floating point unit, the
- coprocessor interface (including numeric exception signaling) operates in
- a fashion compatible with the 80386.The Cx486SLC/DLC and 80386 use an
- external coprocessor which generates the numeric exception and always
- return zero when the NE bit is read.
-
-
- IS-3 Description
-
- When trying to reference CR1 in protected mode while not at the highest
- privilege level (level 0), the 80486 generates an Invalid Opcode fault,
- whereas the Cx486SLC/DLC generate a General Protection (GP) fault.
-
- Analysis
-
- The Cx486SLC/DLC and 80486 do not define the bits in the CR1 register.
- Since there are no valid bits in the CR1 register, any exception taken,
- whether it is a GP fault or Invalid Opcode fault, will signal that an
- invalid operation has taken place.
-
-
- IS-4 Description
-
- When using the Translation Lookaside Buffer (TLB) test registers, the
- undefined bits in TR7 may differ between the 80486 and the Cx486SLC/DLC
- when a look-up miss (TR7 bit 4 is clear) occurs. This includes the REP
- field (bits 2-3).
-
- Analysis
-
- The majority of the bits in TR7 are documented by Intel to be undefined
- after a TLB look-up miss. Therefore, software programs can not reliably
- use the resulting values of these undefined bits.
-
-
- IS-5 Description
-
- Cx486SLC/DLC reads and writes to Debug Register 4 (DR4) and Debug
- Register 5 (DR5) result in accesses to Debug Register 6 (DR6) and
- Debug Register 7 (DR7), respectively. Accessing DR4 and DR5 on the
- 80486 produces an Invalid Opcode fault.
-
- Analysis
-
- DR4 and DR5 are documented as undefined by Intel on the 80486. Since
- the results are undefined, software programs can not reliably use the
- register results.
-
-
- IS-6 Description
-
- Writing duplicate TLB tags using the TLB test registers generates
- different results on the Cx486SLC/DLC than on the 80486 when the
- duplicate address is looked up. The results of writing duplicate
- TLB tags is documented as undefined by Intel.
-
- Analysis
-
- Writing duplicate TLB tags using the TLB test registers is an unsupported
- operation. The Cx486SLC/DLC and 80486 return undefined results when
- looking up the resulting address. Since the results are undefined,
- software programs can not reliably use the register results.
-
-
- IS-7 Description
-
- The 80486 imposes a performance penalty in order to report debug faults
- precisely. The Cx486SLC/DLC reports debug faults precisely without a
- performance penalty (except for a repeated MOVS instruction).
-
- Analysis
-
- The Cx486SLC/DLC provides superior debugging capability.
-
-
- IS-8 Description
-
- The 80486 writes zeroes to the destination register when executing a
- Bit-Scan Forward (BSF) instruction if all zeroes are found in the
- specified bit map. The Cx486DLC/DLC leaves the destination register
- unchanged under this condition.
-
- Analysis
-
- The value in the destination register of a BSF instruction is specified
- by Intel to be undefined when a one bit is not found in the source
- operand. Since the results are undefined, software programs can not be
- reliably use the register results.
-
-
- IS-9 Description
-
- Memory versions of the instructions ADC, ADD, AND, DEC, INC, MOVS, NEG,
- NOT, OR, RCl, ROL, ROR, SAl, SAR, SBB, SUB, SHL, SHLD, SHR, SHRD, XCHG,
- and XOR read the destination memory, operate on it, and write it back to
- memory. The Cx486SLC/DLC checks the writability of the destination before
- performing these instructions. On non-writable locations, the Cx486SLC/
- DLC faults before starting the instruction. The 80486 performs the read,
- sets the read location acessed bit, and modifies the flags before
- faulting.
-
- Analysis
-
- By checking the writability first prior to execution of the instruction
- (at no performance penalty), the Cx486SLC/DLC avoids unnecessary
- operations. Leaving the accessed bit and flag contents in their original
- state is prefered if the instruction is restarted.
-
-
- IS-10 Description
-
- In the case above, if the read locatuion is also not present, the 80486
- will attempt the read, take a page fault, reload the page, restart the
- instruction, and then take a GP fault. The Cx486SLC/DLC will take a GP
- fault.
-
- Analysis
-
- The 80486 wastes time loading the requested page before taking the
- required GP fault. The GP fault is eventually detected by both the 80486
- and the Cx486SLC/DLC.
-
-
- IS-11 Description
-
- If a locked instruction accesses a memory page marked as not present, the
- 80486 reports in the error code that the access type was a write while
- the Cx486SLC/DLC reports that the access type was a read.
-
- Analysis
-
- Since the page is not present in either case (read or write), the same
- page fault is taken by both the Cx486SLC/DLC and the 80486.
-
-
- IS-12 Description
-
- When alignment checking is enabled an an ENTER instruction that misaligns
- the stack is executed, the 80486 generates an alingment check fault even
- though the misaligned stack has not been accessed. The Cx486SLC/DLC
- generates the aligment check fault only when the misaligned stack is
- accessed.
-
- Analysis
-
- The Cx486SLC/DLC correctly generates an alignment check fault only when
- a misaligned stack is accessed. The 80486 unnecessarily takes the fault
- in the case described.
-
-
- IS-13 Description
-
- When executing a REP LOOPE (repeated loop while equal) instruction, the
- 80486 does not perform the "if equal" function of the instruction. The
- Cx486SLC/DLC does perfrom the "if equal" check under the same
- circumstances.
-
- Analysis
-
- The 80486 execution should be considered incorrect. The Cx486SCL/DLC
- correctly executes this instruction sequence.
-
-
- IS-14 Description
-
- The 80486 incorrectly asserts the LOCK# pin while enterinf the illegal
- instruction exception handler when using the LOCK prefix on instructions
- other than those allowed (Only BTS, BTR, BTC, XCHG, INC, DEC, NOT, NEG,
- ADD, ADC, SUB, SBB, AND, OR, XOR are allowed). The Cx486SLC/DLC correctly
- does not assert LOCK# in this case.
-
- Analysis
-
- When using the 80486 in a multi-processor environment, the bus may be
- locked unnecessarily causing performance degradation.
-
-
-
- Operating systems/operating environments tested with the Cx486SLC/DLC:
-
- Digital Research: Concurrent DOS 386 5.0, DR-DOS 6.0
- Ergo: OS/386
- IBM: IBM DOS 3.3, IBM DOS 4.0, OS/2 2.0, OS/2 SE 1.3
- IGC: VM/386 2.01
- Interactive: Interactive Unix 3.2
- Mark Williams: Coherent 3.1, Coherent 3.2
- Microsoft: MS-DOS 3.3, MS-DOS 4.01, MS-DOS 5.0, Windows 3.0, Windows 3.1
- Pharlap: DOS-Extender 286, DOS-Extender 386
- Quarterdeck: Desqview 386 2.32
- Rational: DOS/4G
- SCO: SCO Open Desktop, SCO Unix, SCO Xenix 2.3.2c
- Symantec: Norton Desktop for Windows 1.0
- UHC: Developers Environment, Network Module, X11R4/Motif Windowing
- Module, UNIX Release 4.0 Ver. 3.6
-