- Newsgroups: comp.arch
- Path: sparky!uunet!metaflow!rschnapp
- From: rschnapp@metaflow.com (Russ Schnapp)
- Subject: Re: No Branch Delay Slot(s)...
- Message-ID: <BuC7Hz.HzB@metaflow.com>
- Sender: usenet@metaflow.com
- Nntp-Posting-Host: habu
- Organization: Metaflow Technologies Inc.
- References: <1992Sep9.044231.12217@fcom.cc.utah.edu>
- Date: Thu, 10 Sep 1992 00:46:47 GMT
- Lines: 48
-
- In article <1992Sep9.044231.12217@fcom.cc.utah.edu>, phil@news.ccutah.edu (Phillip Neiswanger) writes:
- |> If I remember correctly, the article states that the use of
- |> delayed branch slots could introduce incompatibilities from implementation
- |> to implementation. This does not seem very intuitive to me. Would anybody
- |> care to discuss how branch delay slots are going to affect future generations
- |> of RISC CPUs as they enter the era of multiple (read >2) instruction issue
- |> implementations.
-
- Delay slots came about because scalar processors with neither branch
- prediction nor branch target buffers are unable to issue the branch
- target in the clock following issue of the branch instruction. In an
- attempt to make use of at least one clock of this bubble, you delay the
- execution of the branch, and issue the instruction following it.
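
To make the delayed-branch behavior described above concrete, here is a minimal sketch in Python. The mini-ISA (the "op", "br" and "halt" tuples), the function name run and the demo program are all hypothetical and invented for illustration; this is not MIPS or SPARC encoding, and it ignores conditional branches, annulment, and branches placed in a delay slot.

```python
# Toy model of a delayed branch on a hypothetical mini-ISA (not MIPS or SPARC).
def run(program, delay_slots=1):
    """Execute (op, arg) tuples and return the labels of executed instructions.

    ("op", label)   ordinary instruction, recorded in the trace
    ("br", target)  unconditional branch to instruction index `target`
    ("halt", None)  stop execution
    """
    trace = []
    pc = 0
    pending = None  # [target, slots_remaining] while a branch is in flight
    while pc < len(program):
        op, arg = program[pc]
        if op == "halt":
            break
        next_pc = pc + 1
        if op == "br":
            if delay_slots == 0:
                next_pc = arg                 # branch takes effect immediately
            else:
                pending = [arg, delay_slots]  # branch effect is deferred
        else:
            trace.append(arg)
            if pending is not None:
                pending[1] -= 1
                if pending[1] == 0:           # delay slots used up: now jump
                    next_pc = pending[0]
                    pending = None
        pc = next_pc
    return trace


demo = [
    ("op", "a"),        # 0
    ("br", 4),          # 1: branch to index 4
    ("op", "slot"),     # 2: sits in the delay slot
    ("op", "skipped"),  # 3: never executed
    ("op", "target"),   # 4
    ("halt", None),     # 5
]

print(run(demo, delay_slots=1))  # the delay-slot instruction runs before the jump
print(run(demo, delay_slots=0))  # without a delay slot it is skipped
```

With one delay slot the instruction labeled "slot" executes even though it follows the branch in program order; with zero delay slots control transfers at once and it is skipped.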
-
- The trouble is, if you take a look at typical MIPS or SPARC code, the
- delay slot is often filled with a NOP, or with a duplicated
- instruction. In general, delay slots artificially constrain code
- generators (human or otherwise) and tend to dilute the code (i.e., with
- duplicate instructions or NOPs). Bigger code means more instruction
- cache misses, more text page faults, and more text page TLB misses.
-
- In a superscalar machine of order 2 (i.e., it can issue 2 instructions
- per clock), a single delay slot can be useful. Still, you have to live
- with the above-named deficiencies. Besides, what are you going to
- issue in the next clock, Sherlock?
-
- When you get into more aggressive superscalar architectures (such as
- Metaflow's order-4 machine), a single delay slot instruction doesn't
- help you very much. I suppose you might want *more* than 1 delay slot,
- but how many do you want to burden the instruction set with?
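
The diminishing return can be put in rough numbers. The following is a back-of-the-envelope sketch only: it assumes a taken branch costs a one-clock bubble, that every issue slot in that clock would otherwise go empty, and that each delay-slot instruction usefully fills one of those slots. It ignores where in the issue group the branch falls. The function name and parameters are hypothetical.

```python
def bubble_slots_recovered(issue_width, delay_slots=1, bubble_clocks=1):
    """Fraction of the issue slots lost to a branch bubble that delay slots refill.

    Assumption (not from any real machine): the bubble wastes
    issue_width * bubble_clocks slots, and each delay-slot
    instruction recovers exactly one of them.
    """
    wasted = issue_width * bubble_clocks
    filled = min(delay_slots, wasted)
    return filled / wasted


for width in (1, 2, 4):
    print(width, bubble_slots_recovered(width))
```

Under these assumptions a single delay slot hides the whole bubble on a scalar machine, half of it at order 2, and a quarter of it at order 4.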
-
- In any event, delay slots were an elegant solution in the days when
- pure, simple RISC architectures were the best solution to the
- performance problem. These days, many folks are waking up to the
- realization that merely boosting the clock rate and the cache sizes
- will not get you where you want to go. You need superscalar (or
- superpipelining, for those of that bent). Then, to really make
- superscalar architectures work effectively, you need branch prediction,
- speculative execution, register renaming and out-of-order execution.
-
- (If you're interested in these concepts, go read Metaflow's article in
- the June '91 issue of IEEE Micro.)
- --
-
- ...Russ Schnapp
- BIX: rschnapp Email: uunet!metaflow!rschnapp or rschnapp@metaflow.com
- Metaflow Technologies Voice: 619/452-6608x230; FAX: 619/452-0401
- La Jolla, California Unless otw specified, I'm speaking only for myself!
-