NetNews Usenet Archive 1992 #20

Internet Message Format | 1992-09-10 | 15.7 KB

Path: sparky!uunet!ogicse!orstcs!prism!jacobsd
From: jacobsd@prism.cs.orst.edu (Dana Jacobsen)
Newsgroups: comp.arch
Subject: Re: CPU and speed question
Keywords: CPU Intel 88110 Alpha latencies Sparc TI SuperSparc for-all-the-fish
Message-ID: <1992Sep10.180319.11738@CS.ORST.EDU>
Date: 10 Sep 92 18:03:19 GMT
Article-I.D.: CS.1992Sep10.180319.11738
References: <4034@keele.keele.ac.uk> <BuBMC0.K3w@pix.com>
Sender: usenet@CS.ORST.EDU
Organization: Oregon State University, Computer Science Dept
Lines: 309
Nntp-Posting-Host: prism.cs.orst.edu

In <BuBMC0.K3w@pix.com> stripes@pix.com (Josh Osborne) writes:

>[I think this is Dana Jacobsen jacobsd@solar.cor2.epa.gov, but can't tell from
>the prev message]

(yes, it's me -- I mailed the article with permission to summarize, and he
posted it right to the news group)

>>Operation       Moto 88110    TI SuperSPARC    DEC Alpha
>>
>>Int add/sub        1/ 1           1/ 1           1/1-2
>>Int mul            1/ 3           4/ 4         19-23/19-23
>>Int div           18/ 18         18/ 18          ---
>>FP add/sub         1/ 3           1/ 1           1/ 6
>>FP mul             1/ 3           1/ 3           1/ 6
>>FP div          13-26/23-26     6- 9/ 6- 9     30-63/30-63
>>FP sqrt            ???          7-12/ 8-12       ???

>Am I right in assuming that 1/3 means "1 outstanding (max) 3 cycles to
>complete", or is it something else?

It's 1 cycle to issue, 3 to complete.  So you can issue a couple more
1 issue latency instructions in there before you get the result.  Of course
this puts more demand on the compilers...

Looks like DEC would like to see everyone doing benchmarks using all
integer adds, while Sun would emphasize integer/fp divides.

If Sun comes through, I'll be able to test a Sparc 10 next week, so we'll
see how it does on real applications (and toy benchmarks of course!).
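
To make the issue/result distinction concrete, here is a rough Python sketch
of an idealized in-order, single-issue machine. The names Op and schedule are
hypothetical, invented for this illustration. The model also assumes that an
operation's issue latency stalls the whole issue stream and that there is only
one pipe, which is much cruder than any of the three chips in the table. Treat
it as a way to see why "1/3" leaves room for other instructions, not as a
simulator of real hardware.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    issue: int          # cycles before the next op may issue
    result: int         # cycles from issue until the result is usable
    deps: tuple = ()    # names of earlier ops whose results this op needs


def schedule(ops):
    """Return (issue_at, ready): the cycle each op issues and the cycle
    its result becomes available, for an in-order single-issue stream."""
    next_issue = 0      # earliest cycle at which the next op may issue
    issue_at = {}
    ready = {}
    for op in ops:
        # An op waits for the issue slot and for all of its operands.
        start = max([next_issue] + [ready[d] for d in op.deps])
        issue_at[op.name] = start
        ready[op.name] = start + op.result
        next_issue = start + op.issue
    return issue_at, ready


if __name__ == "__main__":
    # Three independent multiplies with issue = 1, result = 3.
    independent = [Op("a", 1, 3), Op("b", 1, 3), Op("c", 1, 3)]
    # The same three multiplies, each consuming the previous result.
    chained = [Op("a", 1, 3), Op("b", 1, 3, ("a",)), Op("c", 1, 3, ("b",))]
    for label, ops in (("independent", independent), ("chained", chained)):
        issue_at, ready = schedule(ops)
        print(label, issue_at, "last result at cycle", max(ready.values()))
```

With independent operations the three multiplies overlap and the last result
arrives at cycle 5. When each one depends on the previous result, the last
result arrives at cycle 9. That gap is the slack the compiler has to fill
with other work.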
This is the article I got my information from:

> From comp.arch Fri Jun 12 17:51:07 1992
> Path: orstcs!rutgers!uwm.edu!ux1.cso.uiuc.edu!sdd.hp.com!swrinde!gatech!hubcap!mark
> From: mark@hubcap.clemson.edu (Mark Smotherman)
> Newsgroups: comp.arch
> Subject: feature comparison of superscalars: M88110, superSPARC, DEC Alpha
> Message-ID: <1992Jun12.052115.217@hubcap.clemson.edu>
> Date: 12 Jun 92 05:21:15 GMT
> Organization: Clemson University
> Lines: 268
>
> Some students and I have tried to feature-compare the recent superscalar
> entries, and I thought I would ask for your comments and corrections.
> The 88110 comes across as the cleanest design.
>
>
>                     |
>                     |  Motorola MC88110
>                     |
> --------------------+--------------------------------------------------------
>                     |
> Hardware design:    | Single-chip design, 1.5M transistors
>                     |
> Inst. fetch:        |
>   I-cache           | 8KB, 32-byte line size, 2-way set associative,
>                     |   physically addressed, pseudo-random replacement
>   Fetch width       | 2 instructions
>   Fetch alignment   | Not required
>   Line crossing     | Not allowed
>   Decoder width     | 2 instructions
>                     |
> Inst. issue:        |
>   Max number issued | Up to two instructions can be issued; no position
>     per cycle       |   restrictions on issue ("symmetric" issue)
>   Window type       | Reservation stations for branches and stores
>   Execution order   | Program-order issue; out-of-order completion
>                     |
> Branch prediction:  |
>   Type              | Static branch prediction based on compiler hint
>                     |   given in opcode
>   Hardware support  | 32-entry Branch Target Instruction Cache with two
>                     |   target instructions per entry (FIFO replacement)
>                     | Branch instruction reservation station
>   Recovery method   | Instructions issued past branch are tagged as
>                     |   conditional and flushed if branch mispredicted;
>                     |   register files restored using history buffer
>                     |
> Functional units:   |
>   Number and type   | 1 instruction / branch unit
>                     | 1 data cache unit
>                     | 2 integer units (32-bit operands)
>                     | 1 bit-field unit (32-bit operands)
>                     | 1 floating-point add unit (80-bit fp operands)
>                     | 1 multiply unit (64-bit int., 80-bit fp)
>                     | 1 divide unit (64/80-bit operands)
>                     | 2 graphics units (64-bit operands)
>   Latencies         |
>     Integer add/sub | Issue = 1       Result = 1
>     Integer mul     | Issue = 1       Result = 3
>     Integer div     | Issue = 18      Result = 18
>     FP add/sub      | Issue = 1       Result = 3   (FCMP = 1)
>     FP mul          | Issue = 1       Result = 3
>     FP div          | Issue = 13-26   Result = 13-26
>                     |
> Registers:          |
>   Integer           | 32 32-bit registers (88100 code uses these for FP)
>   Floating-point    | 32 80-bit registers
>   Rename/scoreboard | scoreboard
>   Ports             | 6 read / 2 write on each register file
>                     |
> Load/store handling:|
>   Load use penalty  | One cycle
>   Load bypass       | Yes
>   Load forwarding   | No
>   Hardware support  | 4-entry load queue, 3-entry store instruction
>                     |   reservation station
>                     | Tagged (conditional) load/stores cannot change cache
>                     |
> Data cache:         | 8KB, 32-byte line size, 2-way set associative,
>                     |   physically addressed, write-through or write-back
>                     |   with write-allocate on page or block basis,
>                     |   pseudo-random replacement, non-blocking
>                     | Prefetch instructions available as well as
>                     |   non-allocating store-through instructions
>                     |
> Bus:                | 64-bit, split transaction, burst transfers of two
>                     |   words per cycle, critical-word-first with
>                     |   wrap-around and streaming
>                     |
> Exception handling: | Precise exceptions occur in program order by
>                     |   allowing all prior instructions to complete;
>                     |   register files restored using history buffer
>                     |
> Interrupt handling: | Precise interrupts with minimum interrupt latency
>                     |   by aborting all incomplete instructions and
>                     |   restoring register files for out-of-order
>                     |   completions using history buffer
>                     |
> Noteworthy features:| Rich set of execution units
>                     | Unencumbered issue rules
>                     | Speculative execution past branches with history
>                     |   buffer used for recovery
>                     | Sophisticated load/store unit
>                     | Graphics unit
>
>                     |
>                     |  SUN/TI SuperSPARC (Viking)
>                     |
> --------------------+--------------------------------------------------------
>                     |
> Hardware design:    | Single-chip design, 3.1M transistors
>                     |
> Inst. fetch:        |
>   I-cache           | 20KB, 8-byte line size, 5-way set associative,
>                     |   physically addressed, pseudo-LRU replacement
>   Fetch width       | 4 instructions
>   Fetch alignment   | Required for fetch into instruction buffers; grouper
>                     |   provides decoder with 3 instructions from buffer
>   Line crossing     | Not allowed for fetch into instruction buffers; no
>                     |   impact on grouper
>   Decoder width     | 3 instructions
>                     |
> Inst. issue:        |
>   Max number issued | Up to 3 instructions can be issued per cycle,
>     per cycle       |   governed by an extensive list of grouping rules.
>                     |   Maximums per cycle: two integer operations, one
>                     |   load/store, one shift, one FP/IMUL/IDIV, and one
>                     |   control flow (which must be in last position of
>                     |   issue group).  Issue rules were tailored to
>                     |   existing SPARC code and allow simultaneous issue
>                     |   of: chained integer ALU ops, CCset with dependent
>                     |   branch, load with dependent FP op, ALU op with
>                     |   dependent store.
>   Window type       | None
>   Execution order   | Program-order issue; out-of-order completion
>                     |
> Branch prediction:  |
>   Type              | Static predict-not-taken (provides delay slot
>                     |   instruction)
>   Hardware support  | 8-entry sequential path instruction buffer, 4-entry
>                     |   target path instruction buffer (prefetched upon
>                     |   recognizing branch)
>   Recovery method   | Mispredicted instructions nullified on cycle after
>                     |   issue; register files uncorrupted
>                     |
> Functional units:   |
>   Number and type   | 1 resource allocation and forwarding control unit
>                     |   (handles branching)
>                     | 1 integer unit with three cascaded ALUs (also
>                     |   handles load/store)
>                     | 1 floating-point unit (also does IMUL, IDIV),
>                     |   contains 4-entry SPARC FP instruction queue
>   Latencies         |
>     Integer add/sub | Issue = 1       Result = 1
>     Integer mul     | Issue = 4       Result = 4
>     Integer div     | Issue = 18      Result = 18
>     FP add/sub      | Issue = 1       Result = 3
>     FP mul          | Issue = 1       Result = 3
>     FP div          | Issue = 6-9     Result = 6-9
>     FP sqrt         | Issue = 8-12    Result = 8-12
>                     |
> Registers:          |
>   Integer           | 32 32-bit windowed registers
>   Floating-point    | 32 32-bit registers
>   Rename/scoreboard | scoreboard
>   Ports             | Integer register file has 4 ports, double access
>                     |   per cycle; floating point register file has
>                     |   5 ports
>                     |
> Load/store handling:|
>   Load use penalty  | 0 cycles (even for 8-byte load)
>   Load bypass       | Yes
>   Load forwarding   | No
>   Hardware support  | 8-byte store buffer, also used for D-cache write back
>                     |
> Data cache:         | 16KB, 4-byte line size, 4-way set associative,
>                     |   8-byte read/write path, physically addressed,
>                     |   write-back with write-allocate, pseudo-LRU
>                     |   replacement
>                     |
> Bus:                | 64-bit, split transaction, critical-word-first
>                     |
> Exception handling: | Precise integer exceptions occur in program order;
>                     |   writeback turned off for instruction causing
>                     |   exception and remains off as pipeline drains;
>                     |   instructions are paired with program counter value
>                     | Standard SPARC deferred FP exception model
>                     |
> Interrupt handling: | ?
>                     |
> Noteworthy features:| Large on-chip caches
>                     | Cascaded integer ALUs allow simultaneous issue of
>                     |   dependent integer operations for many cases
>                     | Several other dependent pairs can be issued together:
>                     |   operation and dependent branch, FP operation and
>                     |   dependent store, load and dependent FP operation
>                     | No load use penalty
>
>                     |
>                     |  DEC Alpha 21064
>                     |
> --------------------+--------------------------------------------------------
>                     |
> Hardware design:    | Single-chip design, 1.7M transistors
>                     |
> Inst. fetch:        |
>   I-cache           | 8KB, 32-byte line size, direct-mapped, physically
>                     |   addressed
>   Fetch width       | 2 instructions
>   Fetch alignment   | Required
>   Line crossing     | Not allowed
>   Decoder width     | 2 instructions
>                     |
> Inst. issue:        |
>   Max number issued | Up to two instructions can be issued with no
>     per cycle       |   position dependence (second cycle swaps positions
>                     |   if necessary); list of conditions required for
>                     |   dual issue is rather complicated; if second
>                     |   instruction of pair cannot be issued, then next
>                     |   cycle will consider that instruction only
>   Window type       | None
>   Execution order   | Program-order issue; out-of-order completion
>                     |
> Branch prediction:  |
>   Type              | Dynamic prediction using one-bit history; otherwise
>                     |   sign of displacement is basis of prediction
>   Hardware support  | Branch history bit for each instruction location
>                     |   in I-cache
>                     | 4-entry subroutine return address prediction stack
>   Recovery method   | nullify instructions
>                     |
> Functional units:   |
>   Number and type   | 1 instruction sequencing and branch unit
>                     | 1 integer unit
>                     | 1 floating-point unit
>                     | 1 data memory and address generation unit
>   Latencies         |
>     Integer add/sub | Issue = 1       Result = 1-2
>     Integer mul     | Issue = 19-23   Result = 19-23
>     Integer div     | (no IDIV)
>     FP add/sub      | Issue = 1       Result = 6
>     FP mul          | Issue = 1       Result = 6
>     FP div          | Issue = 30-63   Result = 30-63
>                     |
> Registers:          |
>   Integer           | 32 64-bit registers
>   Floating-point    | 32 64-bit registers
>   Rename/scoreboard | scoreboard
>   Ports             | 4 read / 2 write on each register file
>                     |
> Load/store handling:| (load/stores only on 64-bit values)
>   Load use penalty  | 2 cycles
>   Load bypass       | Yes
>   Load forwarding   | No
>   Hardware support  | 4 32-byte write buffers (writes not necessarily
>                     |   in program order unless memory barrier
>                     |   instructions are used)
>                     |
> Data cache:         | 8KB, 32-byte line size, direct-mapped,
>                     |   write-through, no-write-allocate, 64-bit
>                     |   read/write path, non-blocking
>                     |
> Bus:                | ?
>                     |
> Exception handling: | Drain pipe and invoke PAL handler; use of trap
>                     |   barrier instructions can yield precise exceptions
>                     |
> Interrupt handling: | Drain pipe and invoke PAL handler
>                     |
> Noteworthy features:| 64-bit architecture
>                     | Conditional move to obviate branch
>                     | Virtual machine support
>                     | Privileged Architecture Library (PAL) routines
>                     |   that encapsulate atomic OS actions
>
> --
> --
> Mark Smotherman, CS Dept., Clemson University, Clemson, SC 29634-1906
> (803) 656-5878, mark@cs.clemson.edu or mark@hubcap.clemson.edu
>
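
The Alpha entry describes its branch prediction as one bit of history per
instruction location, with the sign of the displacement used otherwise. A
small Python sketch of that general idea follows. The class and function
names are hypothetical. Keying the table by branch address in a dictionary
stands in for the history bit kept in the I-cache. The fallback rule used
here (negative displacement predicted taken, positive predicted not taken)
is the usual convention and is my assumption; the article does not spell
it out.

```python
class OneBitPredictor:
    """One history bit per branch, with a static fallback when no
    history has been recorded yet (hypothetical illustration)."""

    def __init__(self):
        self.history = {}   # branch address -> last outcome (True = taken)

    def predict(self, pc, displacement):
        if pc in self.history:
            return self.history[pc]
        # Assumed fallback: backward branches (loops) predicted taken.
        return displacement < 0

    def update(self, pc, taken):
        # One bit of history: remember only the most recent outcome.
        self.history[pc] = taken


def count_mispredictions(predictor, trace):
    """trace is a sequence of (pc, displacement, taken) tuples."""
    misses = 0
    for pc, displacement, taken in trace:
        if predictor.predict(pc, displacement) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop-closing branch: taken 9 times, then falls through; run twice.
    one_pass = [(0x100, -16, True)] * 9 + [(0x100, -16, False)]
    print(count_mispredictions(OneBitPredictor(), one_pass * 2))
```

In this sketch the first pass through the loop mispredicts only the exit, and
each later pass mispredicts twice: once on re-entry, because the bit still
records the previous exit, and once on the exit itself. That is the familiar
weakness of a single history bit.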