- Newsgroups: comp.arch
- Path: sparky!uunet!cs.utexas.edu!sdd.hp.com!caen!destroyer!ubc-cs!uw-beaver!rice!cliffc
- From: cliffc@rice.edu (Cliff Click)
- Subject: Re: trapping speculative ops
- In-Reply-To: schow@bqneh3.bnr.ca's message of Thu, 27 Aug 1992 00:53:40 GMT
- Message-ID: <CLIFFC.92Aug27082745@medea.rice.edu>
- Sender: news@rice.edu (News)
- Organization: Center for Research on Parallel Computations
- References: <GLEW.92Aug25180333@pdx007.intel.com>
- <CLIFFC.92Aug26084159@medea.rice.edu>
- <1992Aug27.005340.6547@bcars64a.bnr.ca>
- Date: Thu, 27 Aug 1992 14:27:45 GMT
- Lines: 107
-
-
- Since these threads are all related, I've bundled all replies into one post.
-
- --------------------
- <Cliff Click>
- || Let every register have some extra "trap" bits.
- || A read of a register with its trap bits set causes the exception.
- || A write to the register sets the trap bits according to the success of
- || the operation.
- ||
- || With this design, exceptions are triggered at the START of some operation,
- || instead of in the middle of it.
-
- <Stanley T.H. Chow>
- | Very nice model, but what to do about saving and restoring registers
- | across subroutine calls, interrupts, etc.?
-
- <Cliff Click>
- Soft answer:
- The compiler knows that live and possibly dangerous results cannot be
- carried across a subroutine call. Therefore it uses every possibly
- dangerous result before the subroutine call. This means you cannot
- lift that divide-by-zero across the subroutine call.
-
- Hard answer:
- The hardware allows these bits to be read and written using a special
- register. The special register can be "slow" because accessing it is assumed
- to be infrequent. The trap bits are saved and restored along with
- the registers.
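- 
- A minimal sketch of the trap-bit model together with the "hard answer",
- simulated in C. Everything named here (RegFile, reg_write, reg_read, op_div,
- trap_bits_get/trap_bits_set, the 8-register file) is a hypothetical
- illustration, not any real ISA. A trapping read is modeled as a -1 return
- rather than a real exception, and the sketch assumes the save path can copy
- raw register values without itself triggering the trap.
- 
```c
#include <assert.h>
#include <stdint.h>

#define NREGS 8  /* hypothetical register count */

/* Simulated register file: one trap bit per register, packed in a mask. */
typedef struct {
    int32_t  val[NREGS];
    uint32_t trap;   /* bit r set => register r holds a dangerous result */
} RegFile;

/* A write sets or clears the trap bit according to the operation's success. */
static void reg_write(RegFile *rf, unsigned r, int32_t v, int ok)
{
    assert(r < NREGS);
    rf->val[r] = v;
    if (ok) rf->trap &= ~(UINT32_C(1) << r);
    else    rf->trap |=  (UINT32_C(1) << r);
}

/* A read of a trapped register raises the exception (modeled as -1). */
static int reg_read(const RegFile *rf, unsigned r, int32_t *out)
{
    assert(r < NREGS);
    if (rf->trap & (UINT32_C(1) << r)) return -1;
    *out = rf->val[r];
    return 0;
}

/* Speculative divide: never traps itself, it only marks the destination. */
static void op_div(RegFile *rf, unsigned dst, int32_t a, int32_t b)
{
    if (b == 0 || (a == INT32_MIN && b == -1)) reg_write(rf, dst, 0, 0);
    else                                       reg_write(rf, dst, a / b, 1);
}

/* The "special register": slow access used only by save/restore code. */
static uint32_t trap_bits_get(const RegFile *rf)         { return rf->trap; }
static void     trap_bits_set(RegFile *rf, uint32_t bits) { rf->trap = bits; }

typedef struct {
    int32_t  val[NREGS];
    uint32_t trap;
} SavedContext;

/* Save raw values plus trap bits (assumption: raw copies do not trap). */
static void context_save(const RegFile *rf, SavedContext *s)
{
    for (unsigned r = 0; r < NREGS; r++) s->val[r] = rf->val[r];
    s->trap = trap_bits_get(rf);
}

/* Restore values first, then the trap bits, so dangerous registers stay dangerous. */
static void context_restore(RegFile *rf, const SavedContext *s)
{
    for (unsigned r = 0; r < NREGS; r++) rf->val[r] = s->val[r];
    trap_bits_set(rf, s->trap);
}
```
- 
- With this shape, a divide-by-zero result that is live across a call or
- interrupt is still dangerous after context_restore, and the exception is
- raised only when the register is finally read.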
-
- Interrupts appear to require the "hard answer", but there is another way
- (albeit possibly worse):
-
- Harder Soft answer:
- Only 1 interruptible condition can exist at a time. The compiler forces the
- issue by using the possibly dangerous register before it allows another
- operation that can produce a dangerous register. External interrupts are
- disabled while a dangerous register exists. The compiler ensures that no
- register is kept dangerous for longer than the desired interrupt latency.
- Since only 1 interrupt can occur at a time, no trap bits need to be saved
- or restored.
- Scheduling restrictions in this answer are probably worse than implementing
- the hardware solution. I have no idea what you can implement cheaply in
- hardware.
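- 
- The scheduling rule in the "harder soft answer" can be stated as a check
- over a straight-line instruction sequence. The sketch below is only an
- illustration under assumptions: the Op record, schedule_ok, and the use of
- an instruction count (max_window) as a stand-in for interrupt latency are
- all hypothetical, and each op is limited to one source register.
- 
```c
#include <stddef.h>

/* Hypothetical one-source instruction record. */
typedef struct {
    int src;        /* register read, or -1 for none */
    int dst;        /* register written, or -1 for none */
    int may_fault;  /* nonzero if the result may be dangerous */
} Op;

/*
 * Returns 1 if the sequence obeys the rule, 0 otherwise:
 *  - at most one dangerous register is outstanding at any time;
 *  - it is read (or overwritten) within max_window following ops;
 *  - nothing dangerous is left outstanding at the end of the sequence.
 */
static int schedule_ok(const Op *ops, size_t n, size_t max_window)
{
    int pending = -1;   /* the currently dangerous register, or -1 */
    size_t age = 0;     /* ops issued since it became dangerous */

    for (size_t i = 0; i < n; i++) {
        /* Reading the dangerous register resolves it (trap or no trap). */
        if (pending != -1 && ops[i].src == pending)
            pending = -1;

        /* Still outstanding: interrupts stay disabled for one more op. */
        if (pending != -1 && ++age > max_window)
            return 0;

        if (ops[i].may_fault) {
            if (pending != -1)
                return 0;           /* a second dangerous register */
            pending = ops[i].dst;
            age = 0;
        } else if (ops[i].dst == pending) {
            pending = -1;           /* a safe overwrite clears the trap bits */
        }
    }
    return pending == -1;
}
```
- 
- A schedule such as "div r1; use r1; div r3; use r3" passes, while
- "div r1; div r3; use r1; use r3" is rejected, which is exactly the
- scheduling freedom this answer gives up.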
-
- <Cliff Click>
- || Pre-fetch for long-distance memory can be implemented with a simple LOAD.
- || If a page fault is required for the LOAD the fault is delayed until the
- || register is used. If the pre-fetch is speculative, no page fault occurs.
-
- <Stanley T.H. Chow>
- | Presumably, the prefetch will use a different opcode, or do you mean all
- | references should behave like this?
-
- <Cliff Click>
- All references behave this way. No separate prefetch opcode is required.
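- 
- A small simulation of a LOAD whose page fault is deferred until the
- register is used. This is a sketch under assumptions: the Machine struct,
- op_load, use_reg, the tiny page size, and the idea that the hardware
- remembers the faulting address per register are all invented here for
- illustration.
- 
```c
#include <assert.h>
#include <stdint.h>

#define NREGS      4
#define NPAGES     4
#define PAGE_WORDS 4

/* Hypothetical machine state: word-addressed memory split into pages. */
typedef struct {
    int32_t  mem[NPAGES * PAGE_WORDS];
    int      resident[NPAGES];     /* nonzero if the page is mapped in */
    int32_t  reg[NREGS];
    int      trap[NREGS];          /* per-register trap bit */
    uint32_t fault_addr[NREGS];    /* address whose load was deferred */
    unsigned page_faults;          /* faults actually taken */
} Machine;

/* Every LOAD behaves like this; a missing page only marks the register. */
static void op_load(Machine *m, unsigned dst, uint32_t addr)
{
    assert(dst < NREGS && addr < NPAGES * PAGE_WORDS);
    if (m->resident[addr / PAGE_WORDS]) {
        m->reg[dst]  = m->mem[addr];
        m->trap[dst] = 0;
    } else {
        m->trap[dst]       = 1;
        m->fault_addr[dst] = addr;
    }
}

/* Using the register is what finally takes the page fault, if any. */
static int32_t use_reg(Machine *m, unsigned r)
{
    assert(r < NREGS);
    if (m->trap[r]) {
        uint32_t addr = m->fault_addr[r];
        m->resident[addr / PAGE_WORDS] = 1;  /* stand-in for paging in */
        m->page_faults++;
        m->reg[r]  = m->mem[addr];           /* redo the load */
        m->trap[r] = 0;
    }
    return m->reg[r];
}
```
- 
- A speculative load whose result is never passed to use_reg leaves
- page_faults untouched, which is the "no page fault occurs" case.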
-
-
- --------------------
- <Homayoon Akhiani>
- | I believe that there is a 3rd option:
- | Hardware has Trap Status and Control register
- | Using the control register, software will disable all traps.
- | So when an exception arises, the hardware will not trap; it will update the
- | "Status Register" and follow the normal execution path.
-
- <Cliff Click>
- Upon re-reading your example, I see that you can test the trap results on a
- *per-register* basis. Thus the only differences between your model and my
- model are that (1) traps are tested explicitly by the software, instead of
- implicitly (by reading the register), and (2) interrupts can be made precise
- by having the hardware trap as soon as they occur (software ENABLEs all traps).
-
- I have no idea of the relative merits of our solutions.
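- 
- For comparison, a sketch of the status-register model as read above:
- the operation never poisons the register, it sets a per-register bit that
- software tests explicitly. The Cpu struct, op_div, and reg_faulted are
- hypothetical names, and the one-status-bit-per-destination layout is an
- assumption based on the "per-register" reading.
- 
```c
#include <assert.h>
#include <stdint.h>

#define NREGS 8

/* Hypothetical CPU state with trap control and status registers. */
typedef struct {
    int32_t  reg[NREGS];
    uint32_t status;         /* bit r set => last write to r faulted */
    int      traps_enabled;  /* control register: 1 = trap immediately */
} Cpu;

/*
 * Divide into register dst.
 * Returns -1 if traps are enabled and the hardware trapped right away
 * (the precise-interrupt case); otherwise returns 0 and records any
 * fault in the status register while execution continues normally.
 */
static int op_div(Cpu *c, unsigned dst, int32_t a, int32_t b)
{
    assert(dst < NREGS);
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        if (c->traps_enabled) return -1;
        c->status |= UINT32_C(1) << dst;
        return 0;
    }
    c->reg[dst] = a / b;
    c->status &= ~(UINT32_C(1) << dst);
    return 0;
}

/* The explicit software test that replaces the implicit trap-on-read. */
static int reg_faulted(const Cpu *c, unsigned r)
{
    assert(r < NREGS);
    return (int)((c->status >> r) & 1u);
}
```
- 
- Compiled code would call the equivalent of reg_faulted only on the path
- that really needs the result, where the trap-bit model would instead trap
- on the first read.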
-
-
- --------------------
- <Herman Rubin>
- | x = a/b;
- | if(y<0) x=z;
- | where the division takes long enough that the result of the first statement
- | is produced after the result of the second. Whatever the hardware protocol
- | or the rearrangement of instructions, considerable inefficiency can occur.
- | This means that alternative (2) causes slow execution.
-
- <Cliff Click>
- Nice example, Herman. But I'm not sure how (2) (allow compiled code a way to
- postpone consequences) causes slow execution. In a precise-interrupt world
- the result of "a/b" needs to be checked for faulting before the point where
- the compiler can no longer figure out who faulted. The "usual" solution is
- that "x=z" will block until "a/b" completes or faults. Postponing
- consequences does not slow down or speed up this code relative to what
- happens now, because the hardware has to obey the output dependence.
-
- If you don't care about the possible fault when "y<0", a smarter compiler
- might make: "x = (y<0) ? z : a/b;" which skips the slow division some of
- the time. Also, scheduling, software pipelining and a slew of other
- transformations can ameliorate the division latency here.
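- 
- The two forms side by side, as a sketch in C (the function names are
- made up for illustration). They agree whenever the division is well
- defined; they differ only in whether "a/b" is evaluated when y<0, which is
- the fault the transformation chooses not to care about.
- 
```c
/* Herman's original sequence: the division always executes. */
static int assign_then_overwrite(int a, int b, int y, int z)
{
    int x = a / b;      /* evaluated even when the result is discarded */
    if (y < 0) x = z;
    return x;
}

/* Transformed form: the division is skipped whenever y < 0. */
static int select_form(int a, int b, int y, int z)
{
    return (y < 0) ? z : a / b;
}
```
- 
- Note that select_form(a, 0, -1, z) simply returns z, whereas the original
- sequence would divide by zero on the same inputs.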
-
- ----
- Homayoon Akhiani akhiani@ricks.enet.dec.com
- Stanley Chow schow@BNR.CA
- Cliff Click cliffc@cs.rice.edu
- Herman Rubin hrubin@pop.stat.purdue.edu (Internet, bitnet)
- --
- The Sparc ABI had the most brain-damaged calling convention I've ever seen.
- It's probably better now but reminiscing gives me something to complain about.
- Cliff Click (cliffc@cs.rice.edu) | Disclaimer: My lawyer made me say it.
-