NetNews Usenet Archive 1992 #20



Newsgroups: comp.arch
Path: sparky!uunet!metaflow!rschnapp
From: rschnapp@metaflow.com (Russ Schnapp)
Subject: Re: No Branch Delay Slot(s)...
Message-ID: <BuC7Hz.HzB@metaflow.com>
Sender: usenet@metaflow.com
Nntp-Posting-Host: habu
Organization: Metaflow Technologies Inc.
References: <1992Sep9.044231.12217@fcom.cc.utah.edu>
Date: Thu, 10 Sep 1992 00:46:47 GMT
Lines: 48

In article <1992Sep9.044231.12217@fcom.cc.utah.edu>, phil@news.ccutah.edu (Phillip Neiswanger) writes:
|> If I remember correctly, the article states that the use of
|> delayed branch slots could introduce incompatibilities from implementation
|> to implementation. This does not seem very intuitive to me. Would anybody
|> care to discuss how branch delay slots are going to affect future generation
|> of RISC cpus as they enter the era of multiple(read >2) instruction issue
|> implementations.

Delay slots came about because scalar processors with neither branch
prediction nor branch target buffers are unable to issue the branch
target in the clock following issue of the branch instruction. In an
attempt to make use of at least one clock of this bubble, you delay the
execution of the branch, and issue the instruction following it.

The trouble is, if you take a look at typical MIPS or SPARC code, the
delay slot is often filled with a NOP, or with a duplicated instruction.
In general, delay slots artificially constrain code generators (human or
otherwise) and tend to dilute the code (i.e., with duplicate instructions
or NOPs). Bigger code means more instruction cache misses, more text page
faults, and more text page TLB misses.

In a superscalar machine of order 2 (i.e., it can issue 2 instructions
per clock), a single delay slot can be useful. Still, you have to live
with the above-named deficiencies. Besides, what are you going to issue
in the next clock, Sherlock?
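
The point about issue width can be made concrete with a rough back-of-the-envelope model. The sketch below is not from the post: the function name `recovered_fraction`, its parameters, the one-clock bubble, and the sample widths are all illustrative assumptions. It treats a taken branch as costing `issue_width * bubble_cycles` issue slots and asks what fraction of them delay-slot instructions can win back, with `fill_rate` standing in for how often a delay slot holds useful work rather than a NOP.

```python
def recovered_fraction(issue_width, bubble_cycles=1, delay_slots=1, fill_rate=1.0):
    """Fraction of branch-bubble issue slots recovered by delay slots.

    Toy model (an assumption, not a description of any real pipeline):
    a taken branch leaves issue_width * bubble_cycles issue slots empty,
    and each delay slot recovers one of them when it holds useful work.
    """
    if issue_width < 1 or bubble_cycles < 1:
        raise ValueError("issue_width and bubble_cycles must be at least 1")
    if delay_slots < 0 or not 0.0 <= fill_rate <= 1.0:
        raise ValueError("delay_slots must be >= 0 and fill_rate in [0, 1]")

    lost_slots = issue_width * bubble_cycles   # slots the bubble would waste
    useful_work = delay_slots * fill_rate      # expected useful delay-slot instructions
    return min(useful_work, lost_slots) / lost_slots


if __name__ == "__main__":
    # Sample widths only; a one-clock bubble and one delay slot are assumed.
    for width in (1, 2, 4):
        print(f"issue width {width}: {recovered_fraction(width):.0%} of the bubble recovered")
```

Under these assumptions a perfectly filled single delay slot covers the whole bubble on a scalar machine, half of it at width 2, and a quarter at width 4. A real machine with branch prediction or a branch target buffer would not see this bubble at all, so the model only describes the case the delay slot was designed for.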

When you get into more aggressive superscalar architectures (such as
Metaflow's order-4 machine), a single delay slot instruction doesn't help
you very much. I suppose you might want *more* than 1 delay slot, but how
many do you want to burden the instruction set with?

In any event, delay slots were an elegant solution in the days when pure,
simple RISC architectures were the best solution to the performance
problem. These days, many folks are waking up to the realization that
merely boosting the clock rate and the cache sizes will not get you where
you want to go. You need superscalar (or superpipelining, for those of
that bent). Then, to really make superscalar architectures work
effectively, you need branch prediction, speculative execution, register
renaming and out-of-order execution.

(If you're interested in these concepts, go read Metaflow's article in
the June '91 issue of IEEE Micro.)

-- 
...Russ Schnapp        BIX: rschnapp
Email: uunet!metaflow!rschnapp or rschnapp@metaflow.com
Metaflow Technologies  Voice: 619/452-6608x230; FAX: 619/452-0401
La Jolla, California
Unless otw specified, I'm speaking only for myself!