- Path: sparky!uunet!mcsun!sunic!dkuug!diku!thorinn
- From: thorinn@diku.dk (Lars Henrik Mathiesen)
- Newsgroups: comp.arch
- Subject: Re: trapping speculative ops (LONG)
- Message-ID: <1992Aug31.224611.5196@odin.diku.dk>
- Date: 31 Aug 92 22:46:11 GMT
- References: <CLIFFC.92Aug28085924@antigone.rice.edu>
- Sender: thorinn@tyr.diku.dk
- Organization: Department of Computer Science, U of Copenhagen
- Lines: 55
-
- To summarize: trap bits are proposed to enable a compiler to move a
- potentially trapping operation outside of a condition. As a special
- case, this would allow many, if not all, of the same optimizations
- that can be done if a load through a NULL pointer yields zeroes.
-
- Note that this does not just avoid pipeline stalls; it can also allow
- code motion out of loops in cases where it is not safe on standard
- architectures (i.e., where the code traps in the zero-trip case).
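A minimal sketch of the zero-trip case, written as a small Python simulation rather than real machine code. Everything here (`Reg`, `spec_load`, `use`, `scale_sum`) is a hypothetical model invented for illustration: a speculative load never traps by itself, it only sets a trap bit, and the trap is taken when the value is first used.

```python
class Trap(Exception):
    """Raised when a value whose trap bit is set is actually used."""


class Reg:
    """A register: a value plus a trap bit (hypothetical model)."""

    def __init__(self, value=0, trapped=False):
        self.value = value
        self.trapped = trapped


def spec_load(memory, addr):
    """Speculative load: never traps, sets the trap bit on a bad address."""
    if addr is None or addr not in memory:
        return Reg(trapped=True)
    return Reg(memory[addr])


def use(reg):
    """A real use of the value: this is where a deferred trap is taken."""
    if reg.trapped:
        raise Trap("use of trapped value")
    return reg.value


def scale_sum(memory, scale_addr, items):
    """Sum of items scaled by memory[scale_addr], with the load hoisted."""
    scale = spec_load(memory, scale_addr)  # hoisted out of the loop
    total = 0
    for x in items:  # zero-trip case: 'scale' is never used, so no trap
        total += use(scale) * x
    return total
```

With an empty `items` list and a NULL (`None`) address, `scale_sum` returns 0 without trapping; with a non-empty list the trap is taken at the first use inside the loop.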
-
- An interesting consequence is that traps will be synchronous on many
- implementations (out-of-order execution is the exception). It may even
- be desirable to guarantee this in the architecture, especially since
- the improved scheduling possibilities will allow the compiler to do
- much the same thing that out-of-order execution does in hardware.
-
- One advantage will be that trap barriers are not needed for debugging:
- The compiler can construct a mapping from the first use of a value to
- the instruction that created it, enabling the debugger to find out
- what went wrong. Thus, the same binary can be used for test and
- production. Also, compilers for languages that specify synchronous
- exceptions can use dummy moves to test for traps, one at a time,
- exactly when they need to, instead of flushing the whole pipeline
- after each operation.
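One way such a mapping could be built, sketched for straight-line code only. The instruction format (a `(dest, sources)` tuple per instruction) and the name `build_blame_table` are assumptions made for this example, not any real compiler's or debugger's format.

```python
def build_blame_table(program):
    """Map (use_pc, reg) -> pc of the instruction that produced reg.

    'program' is straight-line code: a list of (dest, sources) tuples,
    where dest is a register name or None and sources is a tuple of
    register names. Only the first use after each definition is recorded,
    since that is where a deferred trap would be taken.
    """
    pending = {}  # reg -> pc of its latest definition, not yet used
    table = {}
    for pc, (dest, sources) in enumerate(program):
        for reg in sources:
            if reg in pending:
                table[(pc, reg)] = pending.pop(reg)
        if dest is not None:
            pending[dest] = pc
    return table
```

On a trap at some `pc` while reading `reg`, a debugger would look up `table[(pc, reg)]` to find the instruction that set the trap bit. Real code with branches would need one entry per reaching definition; that is left out here.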
-
- Perhaps even interrupts and context switches can avoid explicit trap
- barriers! Outstanding operations will just continue to execute during
- the initial interrupt processing, and if/when registers need to be
- saved to memory, the normal interlocks will stall the pipeline until
- they complete. When everything is saved, the trap bits can be read
- from a special register. This needs some extra support: the fanciest
- scheme would be separate user and kernel mode trap bits, with a flag on
- each operation in the pipeline that shows which to update, but just a
- control bit to inhibit traps would be enough.
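A rough sketch of the simpler variant, where a control bit inhibits traps while the handler saves state. The functions `save_context` and `restore_context`, and the packing of the trap bits into one integer standing in for the special register, are assumptions for illustration only.

```python
def save_context(values, trap_bits):
    """Handler side: copy registers out with traps inhibited, then read
    the trap bits as one word (the hypothetical 'special register')."""
    mask = 0
    for r, trapped in enumerate(trap_bits):
        if trapped:
            mask |= 1 << r
    return list(values), mask


def restore_context(saved_values, mask):
    """Return side: reload values and re-arm the trap bits from the mask."""
    trap_bits = [bool((mask >> r) & 1) for r in range(len(saved_values))]
    return list(saved_values), trap_bits
```

The point of the round trip is that a pending trap survives the context switch: the interrupted code still takes it at the first use of the value, not inside the handler.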
- - - - - - - - - -
- The main problem seems to be with subroutine calls. To move an
- instruction across a call, the compiler has to be able to determine
- whether the result will be used. There are a number of cases where
- that is possible; when it is not, we are no worse off than before.
-
- If it is certain that a potentially trapped register value will be
- used later, a subroutine can just be allowed to take a trap if it
- tries to save it. (On such a trap, the debugger must unwind the call
- stack to find the source instruction; but the compiler can easily
- construct tables to allow this.) Software convention may define some
- registers as more likely to be the destinations of slow instructions,
- so that a subroutine can avoid stalls by saving them late.
-
- On the other hand, if the value is known to be dead, the calling code
- can move some dummy value into the register; this will reset the trap
- bit, and the write-write interlock will prevent outstanding operations
- from setting it later. Again, software convention may define scratch
- registers that do not have to be detoxed (the subroutine promises to
- write them before reading them).
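Both calling cases can be sketched with a toy register file. The class `RegFile`, its methods, and the choice of registers 4 and 5 as callee-saved are hypothetical; the only behaviour taken from the text is that any write clears the trap bit and that saving a trapped register traps.

```python
class Trap(Exception):
    """Raised when a register with its trap bit set is read."""


class RegFile:
    """Toy register file with one trap bit per register."""

    def __init__(self, n=8):
        self.value = [0] * n
        self.trap = [False] * n

    def write(self, r, v):
        """Any ordinary write replaces the value and clears the trap bit."""
        self.value[r] = v
        self.trap[r] = False

    def poison(self, r):
        """Models a speculative operation that failed into register r."""
        self.trap[r] = True

    def save(self, r, stack):
        """Storing a register counts as a use, so a trapped value traps."""
        if self.trap[r]:
            raise Trap(f"r{r}")
        stack.append(self.value[r])


def callee(regs, saved=(4, 5)):
    """A subroutine prologue that saves its callee-saved registers."""
    stack = []
    for r in saved:
        regs.save(r, stack)
    return stack
```

If register 4 holds a trapped value that is still live, `callee` traps on the save, which is the first case above. If the value is dead, the caller does `regs.write(4, 0)` before the call and the save goes through.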
-
- Lars Mathiesen (U of Copenhagen CS Dep) <thorinn@diku.dk> (Humour NOT marked)
-