home *** CD-ROM | disk | FTP | other *** search
- Chapter 11 Coprocessing and Multiprocessing
-
- ────────────────────────────────────────────────────────────────────────────
-
- The 80386 has two levels of support for multiple parallel processing units:
-
- ■ A highly specialized interface for very closely coupled processors of
- a type known as coprocessors.
-
- ■ A more general interface for more loosely coupled processors of
- unspecified type.
-
-
- 11.1 Coprocessing
-
- The components of the coprocessor interface include:
-
- ■ ET bit of control register zero (CR0)
- ■ The EM, and MP bits of CR0
- ■ The ESC instructions
- ■ The WAIT instruction
- ■ The TS bit of CR0
- ■ Exceptions
-
-
- 11.1.1 Coprocessor Identification
-
- The 80386 is designed to operate with either an 80287 or 80387 math
- coprocessor. The ET bit of CR0 indicates which type of coprocessor is
- present. ET is set automatically by the 80386 after RESET according to the
- level detected on the ERROR# input. If desired, ET may also be set or reset
- by loading CR0 with a MOV instruction. If ET is set, the 80386 uses the
- 32-bit protocol of the 80387; if reset, the 80386 uses the 16-bit protocol
- of the 80287.
-
-
- 11.1.2 ESC and WAIT Instructions
-
- The 80386 interprets the pattern 11011B in the first five bits of an
- instruction as an opcode intended for a coprocessor. Instructions thus
- marked are called ESCAPE or ESC instructions. The CPU performs the following
- functions upon encountering an ESC instruction before sending the
- instruction to the coprocessor:
-
- ■ Tests the emulation mode (EM) flag to determine whether coprocessor
- functions are being emulated by software.
-
- ■ Tests the TS flag to determine whether there has been a context change
- since the last ESC instruction.
-
- ■ For some ESC instructions, tests the ERROR# pin to determine whether
- the coprocessor detected an error in the previous ESC instruction.
-
- The WAIT instruction is not an ESC instruction, but WAIT causes the CPU to
- perform some of the same tests that it performs upon encountering an ESC
- instruction. The processor performs the following actions for a WAIT
- instruction:
-
- ■ Waits until the coprocessor no longer asserts the BUSY# pin.
-
- ■ Tests the ERROR# pin (after BUSY# goes inactive). If ERROR# is active,
- the 80386 signals exception 16, which indicates that the coprocessor
- encountered an error in the previous ESC instruction.
-
- ■ WAIT can therefore be used to cause exception 16 if an error is
- pending from a previous ESC instruction. Note that, if no coprocessor
- is present, the ERROR# and BUSY# pins should be tied inactive to
- prevent WAIT from waiting forever or causing spurious exceptions.
-
-
- 11.1.3 EM and MP Flags
-
- The EM and MP flags of CR0 control how the processor reacts to coprocessor
- instructions.
-
- The EM bit indicates whether coprocessor functions are to be emulated. If
- the processor finds EM set when executing an ESC instruction, it signals
- exception 7, giving the exception handler an opportunity to emulate the ESC
- instruction.
-
- The MP (monitor coprocessor) bit indicates whether a coprocessor is
- actually attached. The MP flag controls the function of the WAIT
- instruction. If, when executing a WAIT instruction, the CPU finds MP set,
- then it tests the TS flag; it does not otherwise test TS during a WAIT
- instruction. If it finds TS set under these conditions, the CPU signals
- exception 7.
-
- The EM and MP flags can be changed with the aid of a MOV instruction using
- CR0 as the destination operand and read with the aid of a MOV instruction
- with CR0 as the source operand. These forms of the MOV instruction can be
- executed only at privilege level zero.
-
-
- 11.1.4 The Task-Switched Flag
-
- The TS bit of CR0 helps to determine when the context of the coprocessor
- does not match that of the task being executed by the 80386 CPU. The 80386
- sets TS each time it performs a task switch (whether triggered by software
- or by hardware interrupt). If, when interpreting one of the ESC
- instructions, the CPU finds TS already set, it causes exception 7. The WAIT
- instruction also causes exception 7 if both TS and MP are set. Operating
- systems can use this exception to switch the context of the coprocessor to
- correspond to the current task. Refer to the 80386 System Software Writer's
- Guide for an example.
-
- The CLTS instruction (legal only at privilege level zero) resets the TS
- flag.
-
-
- 11.1.5 Coprocessor Exceptions
-
- Three exceptions aid in interfacing to a coprocessor: interrupt 7
- (coprocessor not available), interrupt 9 (coprocessor segment overrun), and
- interrupt 16 (coprocessor error).
-
-
- 11.1.5.1 Interrupt 7 ── Coprocessor Not Available
-
- This exception occurs in either of two conditions:
-
- 1. The CPU encounters an ESC instruction and EM is set. In this case,
- the exception handler should emulate the instruction that caused the
- exception. TS may also be set.
-
- 2. The CPU encounters either the WAIT instruction or an ESC instruction
- when both MP and TS are set. In this case, the exception handler
- should update the state of the coprocessor, if necessary.
-
-
- 11.1.5.2 Interrupt 9 ── Coprocessor Segment Overrun
-
- This exception occurs in protected mode under the following conditions:
-
- ■ An operand of a coprocessor instruction wraps around an addressing
- limit (0FFFFH for small segments, 0FFFFFFFFH for big segments, zero for
- expand-down segments). An operand may wrap around an addressing limit
- when the segment limit is near an addressing limit and the operand is
- near the largest valid address in the segment. Because of the
- wrap-around, the beginning and ending addresses of such an operand
- will be near opposite ends of the segment.
-
- ■ Both the first byte and the last byte of the operand (considering
- wrap-around) are at addresses located in the segment and in present and
- accessible pages.
-
- ■ The operand spans inaccessible addresses. There are two ways that such
- an operand may also span inaccessible addresses:
-
- 1. The segment limit is not equal to the addressing limit (e.g.,
- addressing limit is FFFFH and segment limit is FFFDH); therefore,
- the operand will span addresses that are not within the segment
- (e.g., an 8-byte operand that starts at valid offset FFFC will span
- addresses FFFC-FFFF and 0000-0003; however, addresses FFFE and FFFF
- are not valid, because they exceed the limit);
-
- 2. The operand begins and ends in present and accessible pages but
- intermediate bytes of the operand fall either in a not-present page
- or in a page to which the current procedure does not have access
- rights.
-
- The address of the failing numerics instruction and data operand may be
- lost; an FSTENV does not return reliable addresses. As with the 80286/80287,
- the segment overrun exception should be handled by executing an FNINIT
- instruction (i.e., an FINIT without a preceding WAIT). The return address on
- the stack does not necessarily point to the failing instruction nor to the
- following instruction. The failing numerics instruction is not restartable.
-
- Case 2 can be avoided by either aligning all segments on page boundaries or
- by not starting them within 108 bytes of the start or end of a page. (The
- maximum size of a coprocessor operand is 108 bytes.) Case 1 can be avoided
- by making sure that the gap between the last valid offset and the first
- valid offset of a segment is either no less than 108 bytes or is zero (i.e.,
- the segment is of full size). If neither software system design constraint
- is acceptable, the exception handler should execute FNINIT and should
- probably terminate the task.
-
-
- 11.1.5.3 Interrupt 16 ── Coprocessor Error
-
- The numerics coprocessors can detect six different exception conditions
- during instruction execution. If the detected exception is not masked by a
- bit in the control word, the coprocessor communicates the fact that an error
- occurred to the CPU by a signal at the ERROR# pin. The CPU causes interrupt
- 16 the next time it checks the ERROR# pin, which is only at the beginning of
- a subsequent WAIT or certain ESC instructions. If the exception is masked,
- the numerics coprocessor handles the exception according to on-board logic;
- it does not assert the ERROR# pin in this case.
-
-
- 11.2 General Multiprocessing
-
- The components of the general multiprocessing interface include:
-
- ■ The LOCK# signal
-
- ■ The LOCK instruction prefix, which gives programmed control of the
- LOCK# signal.
-
- ■ Automatic assertion of the LOCK# signal with implicit memory updates
- by the processor
-
-
- 11.2.1 LOCK and the LOCK# Signal
-
- The LOCK instruction prefix and its corresponding output signal LOCK# can
- be used to prevent other bus masters from interrupting a data movement
- operation. LOCK may only be used with the following 80386 instructions when
- they modify memory. An undefined-opcode exception results from using LOCK
- before any instruction other than:
-
- ■ Bit test and change: BTS, BTR, BTC.
- ■ Exchange: XCHG.
- ■ Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.
- ■ One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
-
- A locked instruction is only guaranteed to lock the area of memory defined
- by the destination operand, but it may lock a larger memory area. For
- example, typical 8086 and 80286 configurations lock the entire physical
- memory space. The area of memory defined by the destination operand is
- guaranteed to be locked against access by a processor executing a locked
- instruction on exactly the same memory area, i.e., an operand with
- identical starting address and identical length.
-
- The integrity of the lock is not affected by the alignment of the memory
- field. The LOCK signal is asserted for as many bus cycles as necessary to
- update the entire operand.
-
-
- 11.2.2 Automatic Locking
-
- In several instances, the processor itself initiates activity on the data
- bus. To help ensure that such activities function correctly in
- multiprocessor configurations, the processor automatically asserts the LOCK#
- signal. These instances include:
-
- ■ Acknowledging interrupts.
-
- After an interrupt request, the interrupt controller uses the data bus
- to send the interrupt ID of the interrupt source to the CPU. The CPU
- asserts LOCK# to ensure that no other data appears on the data bus
- during this time.
-
- ■ Setting busy bit of TSS descriptor.
-
- The processor tests and sets the busy-bit in the type field of the TSS
- descriptor when switching to a task. To ensure that two different
- processors cannot simultaneously switch to the same task, the processor
- asserts LOCK# while testing and setting this bit.
-
- ■ Loading of descriptors.
-
- While copying the contents of a descriptor from a descriptor table into
- a segment register, the processor asserts LOCK# so that the descriptor
- cannot be modified by another processor while it is being loaded. For
- this action to be effective, operating-system procedures that update
- descriptors should adhere to the following steps:
-
- ── Use a locked update to the access-rights byte to mark the
- descriptor not-present.
-
- ── Update the fields of the descriptor. (This may require several
- memory accesses; therefore, LOCK cannot be used.)
-
- ── Use a locked update to the access-rights byte to mark the
- descriptor present again.
-
- ■ Updating page-table A and D bits.
-
- The processor exerts LOCK# while updating the A (accessed) and D
- (dirty) bits of page-table entries. Also the processor bypasses the
- page-table cache and directly updates these bits in memory.
-
- ■ Executing XCHG instruction.
-
- The 80386 always asserts LOCK during an XCHG instruction that
- references memory (even if the LOCK prefix is not used).
-
-
- 11.2.3 Cache Considerations
-
- Systems programmers must take care when updating shared data that may also
- be stored in on-chip registers and caches. With the 80386, such shared
- data includes:
-
- ■ Descriptors, which may be held in segment registers.
-
- A change to a descriptor that is shared among processors should be
- broadcast to all processors. Segment registers are effectively
- "descriptor caches". A change to a descriptor will not be utilized by
- another processor if that processor already has a copy of the old
- version of the descriptor in a segment register.
-
- ■ Page tables, which may be held in the page-table cache.
-
- A change to a page table that is shared among processors should be
- broadcast to all processors, so that others can flush their page-table
- caches and reload them with up-to-date page tables from memory.
-
- Systems designers can employ an interprocessor interrupt to handle the
- above cases. When one processor changes data that may be cached by other
- processors, it can send an interrupt signal to all other processors that may
- be affected by the change. If the interrupt is serviced by an interrupt
- task, the task switch automatically flushes the segment registers. The task
- switch also flushes the page-table cache if the PDBR (the contents of CR3)
- of the interrupt task is different from the PDBR of every other task.
-
- In multiprocessor systems that need a cacheability signal from the CPU, it
- is recommended that physical address pin A31 be used to indicate
- cacheability. Such a system can then possess up to 2 Gbytes of physical
- memory. The virtual address range available to the programmer is not
- affected by this convention.
-
-