- Path: sparky!uunet!cs.utexas.edu!sun-barr!news2me.ebay.sun.com!exodus.Eng.Sun.COM!rbbb.Eng.Sun.COM!chased
- From: chased@rbbb.Eng.Sun.COM (David Chase)
- Newsgroups: comp.arch
- Subject: Re: trapping speculative ops
- Date: 27 Aug 1992 18:06:56 GMT
- Organization: Sun Microsystems, Mt. View, Ca.
- Lines: 60
- Message-ID: <l9q6e0INN919@exodus.Eng.Sun.COM>
- References: <GLEW.92Aug25180333@pdx007.intel.com> <CLIFFC.92Aug26084159@medea.rice.edu> <1992Aug27.005340.6547@bcars64a.bnr.ca> <CLIFFC.92Aug27082745@medea.rice.edu>
- NNTP-Posting-Host: rbbb
-
- I'm a little bit mystified by the expensive approaches described here.
-
- First, speculative division is probably not an interesting case.
- There are vastly more loads and stores than divisions.
-
- However, having a "division" operator that traps is equally
- ridiculous. On some machines (those that implement WORD/WORD -> WORD
- division, instead of DWORD/WORD -> WORD) the check for overflow (if
- you care at all) is pretty simple -- division by zero for the unsigned
- case, and (for signed) division by zero and division of MININT (i.e.,
- 0x80000000 for a 32-bit word) by -1. That's it. There are some
- moderately entertaining code sequences of the superoptimizer variety
- for detecting these things, when you care, and you can overlap the
- detection with the division if they are allowed to proceed
- asynchronously.
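-
- A minimal sketch of those checks in C, written the obvious way rather
- than as a superoptimizer-style sequence. The function names are
- hypothetical, and "int" is assumed to be the machine WORD.
-

```c
#include <limits.h>

/* Unsigned WORD/WORD -> WORD: the only bad case is a zero divisor. */
int udiv_ok(unsigned a, unsigned b)
{
    (void)a;                 /* the dividend never matters here */
    return b != 0;
}

/* Signed: a zero divisor, or MININT / -1, whose quotient does not fit. */
int sdiv_ok(int a, int b)
{
    return b != 0 && !(a == INT_MIN && b == -1);
}

/*
 * checked_sdiv is a hypothetical wrapper: it stores the quotient in *q
 * and returns 1, or returns 0 without dividing when the check fails.
 */
int checked_sdiv(int a, int b, int *q)
{
    if (!sdiv_ok(a, b))
        return 0;
    *q = a / b;
    return 1;
}
```

- On a machine where the divide can proceed asynchronously, the test
- could be issued alongside the division instead of in front of it; the
- sketch keeps them sequential because C gives no way to say otherwise.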
-
- As far as loads and stores go, I am similarly mystified by the
- insistence that each and every trap from the source program be
- preserved in "very-optimized" code (not code compiled for debugging,
- or at a "normal" level of optimization). For instance, (as has been
- noted by several people before me in more severely refereed forums),
- if page zero is mapped readable, then you can transform
-
- if (px != 0) x = *px;
-
- into
-
- t = *px;   /* speculative load; t is a fresh temporary */
- /* insert other stuff here, perhaps, to cover the latency. */
- if (px != 0) x = t;
-
- No hardware support is required, but some other traps might be lost.
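-
- The transformation can be tried out in portable C by simulating the
- readable page zero. In the sketch below, "pointers" are indices into a
- toy array mem[], and slot 0 stands in for page zero; the array, its
- contents, and both function names are assumptions made for
- illustration only.
-

```c
#include <stddef.h>

/* Toy memory: slot 0 plays the role of a readable page zero. */
int mem[4] = { 0, 10, 20, 30 };

/* Original form: the load happens only when the test passes. */
int load_guarded(size_t px, int x)
{
    if (px != 0)
        x = mem[px];
    return x;
}

/* Hoisted form: load unconditionally, commit only when the test passes. */
int load_hoisted(size_t px, int x)
{
    int t = mem[px];         /* speculative load; legal even for px == 0 */
    /* other work could be scheduled here to cover the load latency */
    if (px != 0)
        x = t;
    return x;
}
```

- Both functions return the same value for every px in range; the
- hoisted one just starts its load before the test is resolved.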
-
- If the OS could be convinced to just not report illegal loads
- (emulating them as returning zero or NaN, for instance) then loads
- could be hoisted much more frequently. This doesn't work quite
- perfectly -- if you store a tag in the last two bits to distinguish
- integers from pointers, you can lose big, because all the loads will
- trap to the OS -- however, it's also feasible to profile your
- failures, and not speculate on those loads in the next compilation.
- Or, your hardware (in the style of the IBM PC/RT) can silently
- truncate away those bits and not trap anyway.
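-
- To see why the tag bits matter, here is a small sketch of a two-bit
- tagging scheme in C. The encoding (low bits 01 for integers) and the
- function names are hypothetical; strip_tag() does in software what the
- truncating hardware would do for free.
-

```c
#include <stdint.h>

#define TAG_MASK ((uintptr_t)3)   /* the last two bits of a word */
#define TAG_INT  ((uintptr_t)1)   /* hypothetical tag for integers */

/* Encode a small non-negative integer as a tagged word. */
uintptr_t tag_int(uintptr_t v)
{
    return (v << 2) | TAG_INT;
}

/* Nonzero if the word carries the integer tag. */
int is_tagged_int(uintptr_t w)
{
    return (w & TAG_MASK) == TAG_INT;
}

/* Clear the tag bits, leaving a word-aligned address. */
uintptr_t strip_tag(uintptr_t w)
{
    return w & ~TAG_MASK;
}
```

- A speculative load through tag_int(5) uses a misaligned address and
- would trap on a machine that checks alignment; the same load through
- strip_tag(tag_int(5)) is aligned and would not.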
-
- Note that the hardware support for what I described above is pretty
- damn cheap. Note that no changes are needed to the architecture. The
- only cost incurred is the slightly reduced locality of reference; all
- those extra loads may result in extra paging. (Someone should study
- this, I think. I'm a little worried about binary search trees. If
- someone is really interested, send me mail.)
-
- Remember, we've got tools to help us debug our code. There's Purify,
- and Centerline (formerly Sabre C), and something that I saw mentioned
- in one of the GNU lists that I wish I had made a better note of
- (anyone able to send me a reference, I'd be quite grateful). People
- should use these tools anyway, because there's a whole world of bugs
- *not* caught by the hardware and OS that slips by otherwise (no saint
- like a converted sinner, if you know what I mean, and I think that you
- do).
-
- David Chase
- Sun
-