1993-12-05 | 14KB | 386 lines
Last Change 7/17/93. Please send updates directly to Harald.
86BUGS.LST revision 1.0
By Harald Feldmann (harald.feldmann@almac.co.uk), mail address:
Hamarsoft, p.o. box 91, 6114 ZH Susteren, The Netherlands.
(Please retain my name and address in the document)
This file lists undocumented and buggy instructions of the Intel 80x86
family of processors. Some of the information was obtained from the book
"Programmer's Technical Reference: The Processor and Coprocessor" by
Robert L. Hummel (Ziff-Davis Press, ISBN 1-56276-016-5), which is highly
recommended. Note that Intel does not support these special features and
may decide to drop opcode variants and instructions in future products.
All trademarks and trade names mentioned are the property of their
respective owners and are acknowledged.
Undocumented instructions and undocumented features of Intel and IIT
processors:
AAD: OPCODE: d5,0a OPCODE VARIANT
This instruction regularly performs the following action:
- unpacked BCD in AX example (AX = 0104h)
- AL = AH * 10d + AL (AL = 0eh )
- AH = 00 (AH = 00h )
The normal opcode decodes as follows: d5,0a
The instruction is encoded as an opcode byte plus an operand byte. By
replacing the second byte with any number in the range 00 -
ff we can build our own AAD instruction for other number
bases. For example, by coding d5,10 we
get an instruction that performs: AL = AH * 16d + AL.
Note: the variant is not supported on all 80x86-compatible
CPUs, notably the NEC V-series, because some hard-code the
multiplier as 0Ah.
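The effect of the operand byte can be modelled in a few lines. The following Python sketch is illustrative only: the function name `aad` and its `base` parameter are not part of the original list, it assumes a CPU that honours the second opcode byte, and it does not model the flags.

```python
def aad(ax, base=0x0A):
    """Model AAD (d5,base): AL = AH * base + AL, AH = 0. Returns the new AX."""
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    al = (ah * base + al) & 0xFF  # the result is truncated to 8 bits
    return al                     # AH is cleared, so the new AX equals AL


if __name__ == "__main__":
    print(hex(aad(0x0104)))        # normal encoding d5,0a
    print(hex(aad(0x0104, 0x10)))  # variant encoding d5,10
```

With AX = 0104h the normal encoding gives AL = 0eh, matching the example above, while the d5,10 variant gives AL = 14h.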
AAM: OPCODE: d4,0a OPCODE VARIANT
This instruction regularly performs the following action:
- binary number in AL
- AH = AL / 10d
- AL = AL MOD 10d
Thus creating an unpacked BCD in AX.
The normal opcode decodes as follows: d4,0a
The instruction is encoded as an opcode byte plus an operand byte. By
replacing the second byte with any number in the range 00 -
ff we can build our own AAM instruction for other number
bases. For example, by coding d4,07 we
get an instruction that performs: AH = AL / 07d, AL = AL
MOD 07d.
The AAD and AAM opcode variants have been found in Future
Domain SCSI controller ROMs.
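The AAM variant can be modelled the same way. This Python sketch is an illustration under assumptions: the name `aam` and the `base` parameter are invented here, flags are not modelled, and an operand byte of 00 (a division by zero, which the processor reports as a divide error) is represented by a Python exception.

```python
def aam(al, base=0x0A):
    """Model AAM (d4,base): AH = AL / base, AL = AL MOD base. Returns the new AX."""
    if base == 0:
        # Assumption: the divide error is modelled as a Python exception.
        raise ZeroDivisionError("AAM with operand 00 divides by zero")
    ah, al = divmod(al & 0xFF, base)
    return (ah << 8) | al


if __name__ == "__main__":
    print(hex(aam(0x0E)))        # normal encoding d4,0a
    print(hex(aam(0x0E, 0x07)))  # variant encoding d4,07
```

For AL = 0eh the normal encoding yields AX = 0104h (unpacked BCD 14), and the d4,07 variant yields AX = 0200h (14 = 2 * 7 + 0).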
LOADALL: OPCODE: 0f,05 (i80286) & 0f,07 (i80386 & i80486)
UNDOCUMENTED
Loads _ALL_ processor registers, exactly as the name
suggests. Separate versions exist for the i80286 and the i80386. The
i80286 LOADALL instruction reads a block of 102 bytes into
the chip, starting at address 000800 hex. The i80286 LOADALL
takes 195 clocks to execute.
The sequence is as follows (Hex address, Bytes, Register):
0800: 6 N/A
0806: 2 MSW (Machine Status Word)
0808: 14 N/A
0816: 2 TR (Task Register)
0818: 2 FLAGS (Flags)
081a: 2 IP (Instruction Pointer)
081c: 2 LDT (Local Descriptor Table)
081e: 2 DS (Data Segment)
0820: 2 SS (Stack Segment)
0822: 2 CS (Code Segment)
0824: 2 ES (Extra Segment)
0826: 2 DI (Destination Index)
0828: 2 SI (Source Index)
082a: 2 BP (Base Pointer)
082c: 2 SP (Stack Pointer)
082e: 2 BX (BX register)
0830: 2 DX (DX register)
0832: 2 CX (CX register)
0834: 2 AX (AX register)
0836: 6 ES cache (ES descriptor _cache_)
083c: 6 CS cache (CS descriptor _cache_)
0842: 6 SS cache (SS descriptor _cache_)
0848: 6 DS cache (DS descriptor _cache_)
084e: 6 GDTR (Global Descriptor Table)
0854: 6 LDT cache (Local Descriptor Table _cache_)
085a: 6 IDTR (Interrupt Descriptor table)
0860: 6 TSS cache (Task State Segment _cache_)
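As a consistency check on the table above, the addresses can be recomputed from the field sizes. This Python sketch is illustrative: the field labels (including `reserved0` and `reserved1` for the N/A areas) and the helper `field_offsets` are names chosen here, not part of the original list.

```python
LOADALL_286_BASE = 0x0800

# (label, size in bytes), in the order given in the table.
LOADALL_286_FIELDS = [
    ("reserved0", 6), ("MSW", 2), ("reserved1", 14), ("TR", 2),
    ("FLAGS", 2), ("IP", 2), ("LDT", 2), ("DS", 2), ("SS", 2),
    ("CS", 2), ("ES", 2), ("DI", 2), ("SI", 2), ("BP", 2), ("SP", 2),
    ("BX", 2), ("DX", 2), ("CX", 2), ("AX", 2),
    ("ES_cache", 6), ("CS_cache", 6), ("SS_cache", 6), ("DS_cache", 6),
    ("GDTR", 6), ("LDT_cache", 6), ("IDTR", 6), ("TSS_cache", 6),
]


def field_offsets(fields, base=0):
    """Return ({label: address}, total size) for a packed field list."""
    offsets = {}
    addr = base
    for name, size in fields:
        offsets[name] = addr
        addr += size
    return offsets, addr - base


if __name__ == "__main__":
    offsets, total = field_offsets(LOADALL_286_FIELDS, LOADALL_286_BASE)
    for name, addr in offsets.items():
        print(f"{addr:04x}: {name}")
    print("total bytes:", total)
```

The computed total is 102 bytes, which agrees with the block size stated for the i80286 LOADALL.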
Descriptor cache entries are internal copies of the
original registers (the LDT cache is normally a copy of the
last regularly _loaded_ LDT). Note that after executing
LOADALL, the chip will use the _cache_ registers without
re-checking the caches against the regular registers. That
means that cache and register do not have to be the same.
Caches are updated when the original register is loaded
again. Both will then contain the same value.
Descriptor caches layout:
3 bytes 24 bit physical address of segment
1 byte access rights byte, mapped as the access rights
byte in a regular descriptor. The present
bit now represents a valid bit. If this bit
is cleared (zero) the segment is invalid and
accessing it will trigger exception 0dh. The
DPL (Descriptor Privilege Level) fields of
the CS and SS descriptor caches determine
the CPL (Current Privilege Level).
2 bytes 16 bit segment limit.
This layout is the same for the GDTR and IDTR registers,
except that the access rights byte must be zero.
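The 6-byte cache entry can be sketched as a small packing routine. This is an illustration under assumptions: the helper names are invented here, multi-byte fields are assumed to be stored little-endian as is usual on x86, and the present/valid bit and DPL are taken from their usual positions in a regular access rights byte (bit 7 and bits 5-6).

```python
def pack_cache_286(base, access, limit):
    """Pack one i80286 descriptor cache entry: 3-byte base, access byte, 2-byte limit."""
    if not 0 <= base < (1 << 24):
        raise ValueError("base must fit in 24 bits")
    if not 0 <= access <= 0xFF:
        raise ValueError("access rights must fit in one byte")
    if not 0 <= limit <= 0xFFFF:
        raise ValueError("limit must fit in 16 bits")
    # Assumption: little-endian byte order for the multi-byte fields.
    return base.to_bytes(3, "little") + bytes([access]) + limit.to_bytes(2, "little")


def cache_is_valid(entry):
    """Bit 7 of the access rights byte acts as the valid bit."""
    return bool(entry[3] & 0x80)


def cache_dpl(entry):
    """Bits 5-6 of the access rights byte hold the DPL."""
    return (entry[3] >> 5) & 0x03


if __name__ == "__main__":
    entry = pack_cache_286(0x100000, 0x93, 0xFFFF)
    print(entry.hex(), cache_is_valid(entry), cache_dpl(entry))
```

For the GDTR and IDTR entries the same routine applies with `access` set to zero, as required above.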
i80386 LOADALL:
The i80386 variant loads 204 (dec) bytes from the address at
ES:EDI and resumes execution in the specified state.
No timing information available.
relative offset: Bytes: Registers:
0000: 4 CR0
0004: 4 EFLAGS
0008: 4 EIP
000c: 4 EDI
0010: 4 ESI
0014: 4 EBP
0018: 4 ESP
001c: 4 EBX
0020: 4 EDX
0024: 4 ECX
0028: 4 EAX
002c: 4 DR6
0030: 4 DR7
0034: 4 TR
0038: 4 LDT
003c: 4 GS (zero extended)
0040: 4 FS (zero extended)
0044: 4 DS (zero extended)
0048: 4 SS (zero extended)
004c: 4 CS (zero extended)
0050: 4 ES (zero extended)
0054: 12 TSS descriptor cache
0060: 12 IDT descriptor cache
006c: 12 GDT descriptor cache
0078: 12 LDT descriptor cache
0084: 12 GS descriptor cache
0090: 12 FS descriptor cache
009c: 12 DS descriptor cache
00a8: 12 SS descriptor cache
00b4: 12 CS descriptor cache
00c0: 12 ES descriptor cache
Descriptor caches layout:
1 byte zero
1 byte access rights byte, same as i80286
2 bytes zero
4 bytes 32 bit physical base address of segment
4 bytes 32 bit segment limit
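The i80386 table and cache layout above can be combined into a sketch that assembles the memory image in Python. This is illustrative only: the function and list names are invented here, little-endian storage is an assumption, and unspecified registers simply default to zero, which is not a meaningful processor state.

```python
import struct

# Register dwords in table order (offsets 0000 to 0050).
REGS_386 = ["CR0", "EFLAGS", "EIP", "EDI", "ESI", "EBP", "ESP", "EBX",
            "EDX", "ECX", "EAX", "DR6", "DR7", "TR", "LDT",
            "GS", "FS", "DS", "SS", "CS", "ES"]

# Descriptor caches in table order (offsets 0054 to 00c0).
CACHES_386 = ["TSS", "IDT", "GDT", "LDT", "GS", "FS", "DS", "SS", "CS", "ES"]


def pack_cache_386(base, access, limit):
    """12-byte entry: zero byte, access rights, two zero bytes, base, limit."""
    return struct.pack("<BBHII", 0, access, 0, base, limit)


def build_loadall_386(regs, caches):
    """Build the image; regs maps name -> value, caches maps name -> (base, access, limit)."""
    image = b"".join(struct.pack("<I", regs.get(name, 0)) for name in REGS_386)
    image += b"".join(pack_cache_386(*caches.get(name, (0, 0, 0)))
                      for name in CACHES_386)
    return image


if __name__ == "__main__":
    img = build_loadall_386({"EIP": 0x1234},
                            {"CS": (0xFFFF0000, 0x9B, 0xFFFF)})
    print(len(img), img[0xB4:0xC0].hex())
```

The resulting image is 204 bytes long, and the CS descriptor cache lands at relative offset 00b4, in agreement with the table.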
UNKNOWN: OPCODE: 0f,04 UNDOCUMENTED
This instruction is likely to be an alias for the LOADALL on
the i80286. It is not documented and is even marked as
unused in the 'Programmer's technical reference'. Still it
executes on the i80286. >> info wanted <<
SETALC: OPCODE: d6 UNDOCUMENTED
This instruction copies the Carry Flag to the AL register.
If the Carry Flag is set (CY), AL becomes ffh. If the Carry Flag is
cleared, AL becomes 00h.
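The behaviour described above is small enough to state as a one-line model. In this Python sketch the function name `setalc` and its boolean argument are chosen here for illustration only.

```python
def setalc(carry_flag):
    """Model SETALC (d6): AL = ffh if the Carry Flag is set, else 00h."""
    return 0xFF if carry_flag else 0x00


if __name__ == "__main__":
    print(hex(setalc(True)), hex(setalc(False)))
```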
Floating Point special instructions:
FMUL4X4: OPCODE: db,f1 IIT ONLY
This instruction is available only on the IIT (Integrated
Information Technology Inc.) math processors.
Takes 242 clocks.
The instruction performs a 4x4 matrix multiply in one
instruction using four banks of 8 floating point registers.
The operands must be loaded to a specific bank in a specific
order. The equation solved can be represented by:
Xn = (A00 * Xo) + (A01 * Yo) + (A02 * Zo) + (A03 * Vo)
Yn = (A10 * Xo) + (A11 * Yo) + (A12 * Zo) + (A13 * Vo)
Zn = (A20 * Xo) + (A21 * Yo) + (A22 * Zo) + (A23 * Vo)
Vn = (A30 * Xo) + (A31 * Yo) + (A32 * Zo) + (A33 * Vo)
Where Xo stands for the original X value and Xn for the
result. Operands must be loaded to the following registers
in the specified banks in the specified order.
            Before FMUL4X4              After FMUL4X4
            bank                        bank
Register:   0       1       2           0
ST(0)       Xo      A33     A31         Xn
ST(1)       Yo      A23     A21         Yn
ST(2)       Zo      A13     A11         Zn
ST(3)       Vo      A03     A01         Vn
ST(4)               A32     A30         ?
ST(5)               A22     A20         ?
ST(6)               A12     A10         ?
ST(7)               A02     A00         ?
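To make the operand placement concrete, the following Python sketch reproduces the register assignment from the table and computes the result in software. It is an illustration under assumptions: it reads the operation as the usual 4x4 matrix times 4-vector product, the names `bank_layout` and `fmul4x4` are invented here, and the code models only the arithmetic, not the IIT hardware or its rounding.

```python
def bank_layout(a, vec):
    """Return the ST(0)..ST(n) contents of banks 0, 1 and 2 before FMUL4X4.

    a is a 4x4 matrix indexed a[row][col]; vec is [Xo, Yo, Zo, Vo].
    """
    rows = (3, 2, 1, 0)
    bank0 = list(vec)                                          # ST(0)..ST(3)
    bank1 = [a[r][3] for r in rows] + [a[r][2] for r in rows]  # ST(0)..ST(7)
    bank2 = [a[r][1] for r in rows] + [a[r][0] for r in rows]  # ST(0)..ST(7)
    return bank0, bank1, bank2


def fmul4x4(a, vec):
    """Software model of the result: [Xn, Yn, Zn, Vn] = a * vec."""
    return [sum(coef * v for coef, v in zip(row, vec)) for row in a]


if __name__ == "__main__":
    a = [[r * 10 + c for c in range(4)] for r in range(4)]  # a[r][c] = "rc"
    vec = [1, 2, 3, 4]
    print(bank_layout(a, vec))
    print(fmul4x4(a, vec))
```

Using matrix entries whose value encodes their own row and column makes it easy to compare the printed bank contents with the table by eye.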
All four banks can be selected by using the bankswitching
instructions, but only banks 0, 1 and 2 make sense since bank
3 is an internal scratchpad. Each bank can hold
8 floating point values and may be re-used with normal
instructions. Each bank acts like an independent i80287,
except when a bankswitch occurs in between and the
initial status is not maintained.
Pseudo-multichip operation can be performed in each bank
and even in multiple banks at the same time (although only
one instruction will operate on one register at any given
time), provided that the active register and top register
are not changed after switching from bank to bank.
EXAMPLE:
FINIT                       ; reset control word
FSBP1                       ; select bank 1
FLD DWORD PTR es:[si]       ; first original
FLD DWORD PTR es:[si+4]     ; second original
FLD DWORD PTR es:[si+8]     ; third original
FSTCW WORD PTR [bx]         ; save FPU control status
FSBP2                       ; NOTE ! you will see three
                            ; active registers in this
                            ; bank when using a
                            ; debugger
FINIT                       ; nothing visible
FLD DWORD PTR [si]          ; new value
FLD DWORD PTR [si+4]        ; second new value
FADD ST,ST(1)               ; two values visible
FSTP DWORD PTR [si+8]       ; one value visible
FSBP1                       ; one original visible
FLDCW WORD PTR [bx]         ; restore FPU status to the
                            ; one active in bank 1,
                            ; causing original three
                            ; values to be visible
                            ; again in correct
                            ; sequence
; ... simply continue with what you wanted to do with
; those numbers from es:[si], they are still there.
FLD DWORD PTR [si+8]        ; for instance...
This feature of the IIT chips can be used to perform complex
operations in registers with many components remaining the
same for a large dataset, only saving intermediary results
to ONE memory location, bankswitching to the next series of
operands, loading that ONE operand and continuing the
calculation with the next set of operands already in that
bank. This does require another read into the new bank but
may save time and memory space compared to memory-based
operands or multi-pass algorithms with multiple arrays of
intermediary results.
BANKSWITCH INSTRUCTIONS:
FSBP0: OPCODE: db,e8 IIT ONLY
Selects the original bank. (default) (6 clocks)
FSBP1: OPCODE: db,eb IIT ONLY
Selects bank 1 from FMUL4X4 instruction diagram (6 clocks)
FSBP2: OPCODE: db,ea IIT ONLY
Selects bank 2 from FMUL4X4 instruction diagram (6 clocks)
FSBP3: OPCODE: db,e9 IIT ONLY UNDOCUMENTED
Selects the scratchpad bank3 used by the FMUL4X4 internally.
Not very useful but funny to look at... How-to: load
any value into bank 0,1 or 2 until you have a full 8
registers, then execute this bankswitch. Using a
debugger like CodeView you are now able to inspect the
bank3 registers. (most likely to take 6 clocks)
TRIGONOMETRIC FUNCTIONS:
Apparently the IIT 2c87 recognises and executes some
i80387 trigonometric functions. UNDOCUMENTED
FSIN (sine) and FCOS (cosine) have been tested and function
according to the Intel 80387 specifications. FSINCOS
(available on the Intel 80287XL, 80387 and up) does not
work.
FSIN: OPCODE: d9,fe IIT 2c87+ (also Intel 80387+) UNDOCUMENTED
Calculates the sine of the input in radians in ST(0). After
calculation, ST(0) contains the sine. Takes approximately
120 clocks.
FCOS: OPCODE: d9,ff IIT 2c87+ (also Intel 80387+) UNDOCUMENTED
Calculates the cosine of the input in radians in ST(0).
After calculation, ST(0) contains the cosine. Takes
approximately 120 clocks.
... CUT HERE FOR FIRST REVISION, next part is to be revised ...
Instructions by mnemonic:
mnemonic:       opcode: processor:                remark & remedy:
AAA                     i80286 & i80386 & i80486
CMPS                    i80286
CMPXCHG                 i80486
FINIT
FSTSW
FSTCW
INS                     i80286 & i80386 & i80486
INVD                    i80486
MOV to SS       n/a     early 8088                Some early 8088s would not properly
                                                  disable interrupts after a move to
                                                  the SS register. The workaround is
                                                  to explicitly disable interrupts,
                                                  update SS and SP, and then re-enable
                                                  interrupts. Typically this occurs
                                                  when relocating a stack in memory
                                                  more than 64 KB from the original
                                                  one, updating both SS and SP as in:
                                                    MOV SS,AX ; would disable
                                                              ; interrupts automatically
                                                              ; during this and the
                                                              ; next instruction
                                                    MOV SP,DX ; interrupts disabled
                                                    ...       ; interrupts enabled
multiple prefixes
with REPx               8088 & 8086               After an interrupt, these processors
                                                  do not properly restart at the first
                                                  prefix byte when more than one
                                                  prefix is used, e.g. LOCK REP MOVSW
                                                  CS:[BX]. Because of the CS override,
                                                  the REP and LOCK prefixes are not
                                                  recognised as part of the
                                                  instruction and the REP MOVSW is
                                                  aborted. A workaround is to test
                                                  for CX==0 after the instruction and
                                                  repeat it if CX is not zero:
                                                    here: LOCK REP MOVSW CS:[BX]
                                                          OR CX,CX
                                                          JNZ here
                                                  This also seems to be the case for
                                                  REP MOVSW CS:[BX]. Note that this
                                                  also implies that REPZ and REPNZ
                                                  are affected, in SCASW for instance.