- Path: sparky!uunet!ogicse!orstcs!prism!jacobsd
- From: jacobsd@prism.cs.orst.edu (Dana Jacobsen)
- Newsgroups: comp.arch
- Subject: Re: CPU and speed question
- Keywords: CPU Intel 88110 Alpha latencies Sparc TI SuperSparc for-all-the-fish
- Message-ID: <1992Sep10.180319.11738@CS.ORST.EDU>
- Date: 10 Sep 92 18:03:19 GMT
- Article-I.D.: CS.1992Sep10.180319.11738
- References: <4034@keele.keele.ac.uk> <BuBMC0.K3w@pix.com>
- Sender: usenet@CS.ORST.EDU
- Organization: Oregon State University, Computer Science Dept
- Lines: 309
- Nntp-Posting-Host: prism.cs.orst.edu
-
- In <BuBMC0.K3w@pix.com> stripes@pix.com (Josh Osborne) writes:
- >[I think this is Dana Jacobsen jacobsd@solar.cor2.epa.gov, but can't tell from
- >the prev message]
-
- (yes, it's me -- I mailed the article with permission to summarize, and he
- posted it right to the news group)
-
- >>Operation     Moto 88110      TI SuperSPARC  DEC Alpha
- >>
- >>Int add/sub         1/ 1           1/ 1           1/1-2
- >>Int mul             1/ 3           4/ 4       19-23/19-23
- >>Int div            18/ 18         18/ 18         ---
- >>FP add/sub          1/ 3           1/ 1           1/ 6
- >>FP mul              1/ 3           1/ 3           1/ 6
- >>FP div          13-26/23-26     6- 9/ 6- 9    30-63/30-63
- >>FP sqrt            ???          7-12/ 8-12       ???
-
- >Am I right in assuming that 1/3 means "1 outstanding (max) 3 cycles to
- >complete", or is it something else?
-
- It's 1 cycle to issue, 3 to complete. So you can issue a couple more
- instructions with an issue latency of 1 before you get the result back.
- Of course, this puts more demand on the compilers...
- Looks like DEC would like to see everyone doing benchmarks using all
- integer adds, while Sun would emphasize integer/fp divides. If Sun comes
- through, I'll be able to test a Sparc 10 next week, so we'll see how it
- does on real applications (and toy benchmarks of course!).
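
To make the issue/result distinction concrete, here is a rough sketch of how the two numbers combine for a run of operations on a single pipelined unit. The model is an assumption for illustration only (one unit, no other stalls, fixed latencies rather than ranges), and the names `Latency`, `independent_cycles`, and `dependent_cycles` are made up for this example. The values plugged in are the 88110 figures for FP mul (1/3) and integer div (18/18) from the table.

```python
from typing import NamedTuple


class Latency(NamedTuple):
    issue: int   # cycles before the unit can accept another operation
    result: int  # cycles from issue until the result can be used


def independent_cycles(n: int, lat: Latency) -> int:
    """Cycles to finish n independent operations on one pipelined unit."""
    if n == 0:
        return 0
    # A new operation starts every `issue` cycles; the last one still
    # needs `result` cycles before its value is available.
    return (n - 1) * lat.issue + lat.result


def dependent_cycles(n: int, lat: Latency) -> int:
    """Cycles for a chain of n operations, each consuming the previous result."""
    # Each operation must wait for the full result latency of its predecessor.
    return n * max(lat.issue, lat.result)


# Figures from the 88110 column of the table (assumed fixed, no ranges).
FP_MUL_88110 = Latency(issue=1, result=3)
INT_DIV_88110 = Latency(issue=18, result=18)

print(independent_cycles(4, FP_MUL_88110))   # 6
print(dependent_cycles(4, FP_MUL_88110))     # 12
print(independent_cycles(4, INT_DIV_88110))  # 72
```

With a 1/3 unit, four independent multiplies overlap and finish in half the time of a dependent chain, which is the scheduling work that falls on the compiler. With an 18/18 unit there is nothing to overlap, so independence does not help.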
-
- This is the article I got my information from:
-
- > From comp.arch Fri Jun 12 17:51:07 1992
- > Path: orstcs!rutgers!uwm.edu!ux1.cso.uiuc.edu!sdd.hp.com!swrinde!gatech!hubcap!mark
- > From: mark@hubcap.clemson.edu (Mark Smotherman)
- > Newsgroups: comp.arch
- > Subject: feature comparison of superscalars: M88110, superSPARC, DEC Alpha
- > Message-ID: <1992Jun12.052115.217@hubcap.clemson.edu>
- > Date: 12 Jun 92 05:21:15 GMT
- > Organization: Clemson University
- > Lines: 268
- >
- > Some students and I have tried to feature-compare the recent superscalar
- > entries, and I thought I would ask for your comments and corrections.
- > The 88110 comes across as the cleanest design.
- >
- >
- > |
- > | Motorola MC88110
- > |
- > --------------------+--------------------------------------------------------
- > |
- > Hardware design: | Single-chip design, 1.5M transistors
- > |
- > Inst. fetch: |
- > I-cache | 8KB, 32-byte line size, 2-way set associative,
- > | physically addressed, pseudo-random replacement
- > Fetch width | 2 instructions
- > Fetch alignment | Not required
- > Line crossing | Not allowed
- > Decoder width | 2 instructions
- > |
- > Inst. issue: |
- > Max number issued | Up to two instructions can be issued; no position
- > per cycle | restrictions on issue ("symmetric" issue)
- > Window type | Reservation stations for branches and stores
- > Execution order | Program-order issue; out-of-order completion
- > |
- > Branch prediction: |
- > Type | Static branch prediction based on compiler hint
- > | given in opcode
- > Hardware support | 32-entry Branch Target Instruction Cache with two
- > | target instructions per entry (FIFO replacement)
- > | Branch instruction reservation station
- > Recovery method | Instructions issued past branch are tagged as
- > | conditional and flushed if branch mispredicted;
- > | register files restored using history buffer
- > |
- > Functional units: |
- > Number and type | 1 instruction / branch unit
- > | 1 data cache unit
- > | 2 integer units (32-bit operands)
- > | 1 bit-field unit (32-bit operands)
- > | 1 floating-point add unit (80-bit fp operands)
- > | 1 multiply unit (64-bit int., 80-bit fp)
- > | 1 divide unit (64/80-bit operands)
- > | 2 graphics units (64-bit operands)
- > Latencies |
- > Integer add/sub | Issue = 1 Result = 1
- > Integer mul | Issue = 1 Result = 3
- > Integer div | Issue = 18 Result = 18
- > FP add/sub | Issue = 1 Result = 3 (FCMP = 1)
- > FP mul | Issue = 1 Result = 3
- > FP div | Issue = 13-26 Result = 13-26
- > |
- > Registers: |
- > Integer | 32 32-bit registers (88100 code uses these for FP)
- > Floating-point | 32 80-bit registers
- > Rename/scoreboard | scoreboard
- > Ports | 6 read / 2 write on each register file
- > |
- > Load/store handling:|
- > Load use penalty | One cycle
- > Load bypass | Yes
- > Load forwarding | No
- > Hardware support | 4-entry load queue, 3-entry store instruction
- > | reservation station
- > | Tagged (conditional) load/stores cannot change cache
- > |
- > Data cache: | 8KB, 32-byte line size, 2-way set associative,
- > | physically addressed, write-through or write-
- > | back with write-allocate on page or block basis,
- > | pseudo-random replacement, non-blocking
- > | Prefetch instructions available as well as non-
- > | allocating store-through instructions
- > |
- > Bus: | 64-bit, split transaction, burst transfers of two
- > | words per cycle, critical-word-first with wrap-
- > | around and streaming
- > |
- > Exception handling: | Precise exceptions occur in program order by
- > | allowing all prior instructions to complete;
- > | register files restored using history buffer
- > |
- > Interrupt handling: | Precise interrupts with minimum interrupt latency
- > | by aborting all incomplete instructions and
- > | restoring register files for out-of-order
- > | completions using history buffer
- > |
- > Noteworthy features:| Rich set of execution units
- > | Unencumbered issue rules
- > | Speculative execution past branches with history
- > | buffer used for recovery
- > | Sophisticated load/store unit
- > | Graphics unit
- >
- > |
- > | SUN/TI SuperSPARC (Viking)
- > |
- > --------------------+--------------------------------------------------------
- > |
- > Hardware design: | Single-chip design, 3.1M transistors
- > |
- > Inst. fetch: |
- > I-cache | 20KB, 8-byte line size, 5-way set associative,
- > | physically addressed, pseudo-LRU replacement
- > Fetch width | 4 instructions
- > Fetch alignment | Required for fetch into instruction buffers; grouper
- > | provides decoder with 3 instructions from buffer
- > Line crossing | Not allowed for fetch into instruction buffers; no
- > | impact on grouper
- > Decoder width | 3 instructions
- > |
- > Inst. issue: |
- > Max number issued | Up to 3 instructions can be issued per cycle,
- > per cycle | governed by an extensive list of grouping rules.
- > | Maximums per cycle: two integer operations, one
- > | load/store, one shift, one FP/IMUL/IDIV, and one
- > | control flow (which must be in last position of
- > | issue group). Issue rules were tailored to
- > | existing SPARC code and allow simultaneous issue
- > | of: chained integer ALU ops, CCset with dependent
- > | branch, load with dependent FP op, ALU op with
- > | dependent store.
- > Window type | None
- > Execution order | Program-order issue; out-of-order completion
- > |
- > Branch prediction: |
- > Type | Static predict-not-taken (provides delay slot
- > | instruction)
- > Hardware support | 8-entry sequential path instruction buffer, 4-entry
- > | target path instruction buffer (prefetched upon
- > | recognizing branch)
- > Recovery method | Mispredicted instructions nullified on cycle after
- > | issue; register files uncorrupted
- > |
- > Functional units: |
- > Number and type | 1 resource allocation and forwarding control unit
- > | (handles branching)
- > | 1 integer unit with three cascaded ALUs (also
- > | handles load/store)
- > | 1 floating-point unit (also does IMUL, IDIV),
- > | contains 4-entry SPARC FP instruction queue
- > Latencies |
- > Integer add/sub | Issue = 1 Result = 1
- > Integer mul | Issue = 4 Result = 4
- > Integer div | Issue = 18 Result = 18
- > FP add/sub | Issue = 1 Result = 3
- > FP mul | Issue = 1 Result = 3
- > FP div | Issue = 6-9 Result = 6-9
- > FP sqrt | Issue = 8-12 Result = 8-12
- > |
- > Registers: |
- > Integer | 32 32-bit windowed registers
- > Floating-point | 32 32-bit registers
- > Rename/scoreboard | scoreboard
- > Ports | Integer register file has 4 ports, double access
- > | per cycle; floating point register file has
- > | 5 ports
- > |
- > Load/store handling:|
- > Load use penalty | 0 cycles (even for 8-byte load)
- > Load bypass | Yes
- > Load forwarding | No
- > Hardware support | 8-byte store buffer, also used for D-cache write back
- > |
- > Data cache: | 16KB, 4-byte line size, 4-way set associative,
- > | 8-byte read/write path, physically addressed,
- > | write-back with write-allocate, pseudo-LRU
- > | replacement
- > |
- > Bus: | 64-bit, split transaction, critical-word-first
- > |
- > Exception handling: | Precise integer exceptions occur in program order;
- > | writeback turned off for instruction causing
- > | exception and remains off as pipeline drains;
- > | instructions are paired with program counter value
- > | Standard SPARC deferred FP exception model
- > |
- > Interrupt handling: | ?
- > |
- > Noteworthy features:| Large on-chip caches
- > | Cascaded integer ALUs allow simultaneous issue of
- > | dependent integer operations for many cases
- > | Several other dependent pairs can be issued together:
- > | operation and dependent branch, FP operation and
- > | dependent store, load and dependent FP operation
- > | No load use penalty
- >
- > |
- > | DEC Alpha 21064
- > |
- > --------------------+--------------------------------------------------------
- > |
- > Hardware design: | Single-chip design, 1.7M transistors
- > |
- > Inst. fetch: |
- > I-cache | 8KB, 32-byte line size, direct-mapped, physically
- > | addressed
- > Fetch width | 2 instructions
- > Fetch alignment | Required
- > Line crossing | Not allowed
- > Decoder width | 2 instructions
- > |
- > Inst. issue: |
- > Max number issued | Up to two instructions can be issued with no
- > per cycle | position dependence (second cycle swaps positions
- > | if necessary); list of conditions required for
- > | dual issue is rather complicated; if second
- > | instruction of pair cannot be issued, then next
- > | cycle will consider that instruction only
- > Window type | None
- > Execution order | Program-order issue; out-of-order completion
- > |
- > Branch prediction: |
- > Type | Dynamic prediction using one-bit history; otherwise
- > | sign of displacement is basis of prediction
- > Hardware support | Branch history bit for each instruction location
- > | in I-cache
- > | 4-entry subroutine return address prediction stack
- > Recovery method | nullify instructions
- > |
- > Functional units: |
- > Number and type | 1 instruction sequencing and branch unit
- > | 1 integer unit
- > | 1 floating-point unit
- > | 1 data memory and address generation unit
- > Latencies |
- > Integer add/sub | Issue = 1 Result = 1-2
- > Integer mul | Issue = 19-23 Result = 19-23
- > Integer div | (no IDIV)
- > FP add/sub | Issue = 1 Result = 6
- > FP mul | Issue = 1 Result = 6
- > FP div | Issue = 30-63 Result = 30-63
- > |
- > Registers: |
- > Integer | 32 64-bit registers
- > Floating-point | 32 64-bit registers
- > Rename/scoreboard | scoreboard
- > Ports | 4 read / 2 write on each register file
- > |
- > Load/store handling:| (load/stores only on 64-bit values)
- > Load use penalty | 2 cycles
- > Load bypass | Yes
- > Load forwarding | No
- > Hardware support | 4 32-byte write buffers (writes not necessarily
- > | in program order unless memory barrier
- > | instructions are used)
- > |
- > Data cache: | 8KB, 32-byte line size, direct-mapped, write-
- > | through, no-write-allocate, 64-bit read/write
- > | path, non-blocking
- > |
- > Bus: | ?
- > |
- > Exception handling: | Drain pipe and invoke PAL handler; use of trap
- > | barrier instructions can yield precise exceptions
- > |
- > Interrupt handling: | Drain pipe and invoke PAL handler
- > |
- > Noteworthy features:| 64-bit architecture
- > | Conditional move to obviate branch
- > | Virtual machine support
- > | Privileged Architecture Library (PAL) routines
- > | that encapsulate atomic OS actions
- >
- > --
- > Mark Smotherman, CS Dept., Clemson University, Clemson, SC 29634-1906
- > (803) 656-5878, mark@cs.clemson.edu or mark@hubcap.clemson.edu
- >
-
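
As an illustration of the Alpha entry under "Branch prediction" (one-bit history, otherwise sign of displacement), the following is a minimal sketch of that style of predictor. It is not a description of the actual 21064 logic. Two assumptions are made here: the fallback uses the conventional backward-taken / forward-not-taken reading of the displacement sign, which the table does not spell out, and the history is kept in an unbounded dictionary, whereas the table says the bit lives with each instruction location in the I-cache. The names `OneBitPredictor` and `count_mispredictions` are hypothetical.

```python
class OneBitPredictor:
    """One history bit per branch, with a static fallback when no bit exists."""

    def __init__(self) -> None:
        # Branch address -> last observed outcome (True = taken).
        # Assumption: unbounded table; real hardware has limited storage.
        self.history: dict[int, bool] = {}

    def predict(self, pc: int, displacement: int) -> bool:
        if pc in self.history:
            return self.history[pc]  # repeat whatever happened last time
        # Assumed fallback: backward branches (negative displacement) taken.
        return displacement < 0

    def update(self, pc: int, taken: bool) -> None:
        self.history[pc] = taken


def count_mispredictions(predictor: OneBitPredictor, pc: int,
                         displacement: int, outcomes: list[bool]) -> int:
    """Feed a sequence of outcomes for one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc, displacement) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A backward loop branch: taken 9 times, then falls through; loop run twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(OneBitPredictor(), 0x1000, -40, loop_outcomes))  # 3
```

The example shows the usual weakness of a single history bit on loops: after the first pass, each run of the loop costs two mispredictions, one at the exit and one on re-entry.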