Internet Message Format  |  1992-09-10  |  15.7 KB

  1. Path: sparky!uunet!ogicse!orstcs!prism!jacobsd
  2. From: jacobsd@prism.cs.orst.edu (Dana Jacobsen)
  3. Newsgroups: comp.arch
  4. Subject: Re: CPU and speed question
  5. Keywords: CPU Intel 88110 Alpha latencies Sparc TI SuperSparc for-all-the-fish
  6. Message-ID: <1992Sep10.180319.11738@CS.ORST.EDU>
  7. Date: 10 Sep 92 18:03:19 GMT
  8. Article-I.D.: CS.1992Sep10.180319.11738
  9. References: <4034@keele.keele.ac.uk> <BuBMC0.K3w@pix.com>
  10. Sender: usenet@CS.ORST.EDU
  11. Organization: Oregon State University, Computer Science Dept
  12. Lines: 309
  13. Nntp-Posting-Host: prism.cs.orst.edu
  14.  
  15. In <BuBMC0.K3w@pix.com> stripes@pix.com (Josh Osborne) writes:
  16. >[I think this is Dana Jacobsen jacobsd@solar.cor2.epa.gov, but can't tell from
  17. >the prev message]
  18.  
  19.  (yes, it's me -- I mailed the article with permission to summarize, and he
  20. posted it right to the newsgroup)
  21.  
  22. >>Operation      Moto 88110    TI SuperSPARC     DEC  Alpha
  23. >>
  24. >>Int add/sub        1/    1        1/   1          1/1-2
  25. >>Int mul            1/    3        4/   4      19-23/19-23
  26. >>Int div           18/   18       18/  18          ---
  27. >>FP add/sub         1/    3        1/   3          1/ 6
  28. >>FP mul             1/    3        1/   3          1/ 6
  29. >>FP div         13-26/13-26    6- 9/ 6- 9      30-63/30-63
  30. >>FP sqrt            ???        8-12/ 8-12          ???
  31.  
  32. >Am I right in assuming that 1/3 means "1 outstanding (max) 3 cycles to
  33. >complete", or is it something else?
  34.  
  35.   It's 1 cycle to issue, 3 to complete.  So you can issue a couple more
  36. single-cycle-issue instructions in there before you get the result.  Of
  37. course this puts more demand on the compilers...
  38.   Looks like DEC would like to see everyone doing benchmarks using all
  39. integer adds, while Sun would emphasize integer/fp divides.  If Sun comes
  40. through, I'll be able to test a Sparc 10 next week, so we'll see how it
  41. does on real applications (and toy benchmarks of course!).
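
  To make the issue/result distinction concrete, here is a minimal sketch in
Python (not part of the original article).  It assumes a deliberately
simplified model: one functional unit, in-order issue, where "issue" is how
many cycles must pass before the unit accepts the next operation and "result"
is how many cycles until a value is available.  The function name
total_cycles and the model itself are illustrative assumptions, not a
description of how any of these chips actually schedule.

```python
def total_cycles(n_ops, issue, result, dependent):
    """Cycles to finish n_ops identical operations on one functional unit.

    Simplified model (an assumption, not a description of any real chip):
    - a new operation can start `issue` cycles after the previous one started
    - an operation's value is ready `result` cycles after it started
    - if `dependent`, each operation needs the previous operation's value
    """
    if n_ops <= 0:
        return 0
    # Gap between the start of consecutive operations: a dependent chain
    # must also wait for the previous result, not just for the unit.
    gap = max(issue, result) if dependent else issue
    # The last operation starts after (n_ops - 1) gaps and takes `result` cycles.
    return (n_ops - 1) * gap + result


# FP multiply listed as 1/3: four independent multiplies overlap in the pipe.
print(total_cycles(4, issue=1, result=3, dependent=False))  # 6
# The same four multiplies in a dependent chain cannot overlap.
print(total_cycles(4, issue=1, result=3, dependent=True))   # 12
# FP multiply listed as 1/6, dependent chain.
print(total_cycles(4, issue=1, result=6, dependent=True))   # 24
```

  Under this model the gap between 6 and 12 cycles is the work left to the
compiler: it has to find independent instructions to fill the slots between
issue and result.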
  42.  
  43.   This is the article I got my information from:
  44.  
  45. > From comp.arch Fri Jun 12 17:51:07 1992
  46. > Path: orstcs!rutgers!uwm.edu!ux1.cso.uiuc.edu!sdd.hp.com!swrinde!gatech!hubcap!mark
  47. > From: mark@hubcap.clemson.edu (Mark Smotherman)
  48. > Newsgroups: comp.arch
  49. > Subject: feature comparison of superscalars: M88110, superSPARC, DEC Alpha
  50. > Message-ID: <1992Jun12.052115.217@hubcap.clemson.edu>
  51. > Date: 12 Jun 92 05:21:15 GMT
  52. > Organization: Clemson University
  53. > Lines: 268
  54. > Some students and I have tried to feature-compare the recent superscalar
  55. > entries, and I thought I would ask for your comments and corrections.
  56. > The 88110 comes across as the cleanest design.
  57. >                     |   
  58. >                     |   Motorola MC88110
  59. >                     |   
  60. > --------------------+--------------------------------------------------------
  61. >                     |
  62. > Hardware design:    |   Single-chip design, 1.5M transistors
  63. >                     |
  64. > Inst. fetch:        |
  65. >   I-cache           |   8KB, 32-byte line size, 2-way set associative,
  66. >                     |     physically addressed, pseudo-random replacement
  67. >   Fetch width       |   2 instructions
  68. >   Fetch alignment   |   Not required
  69. >   Line crossing     |   Not allowed
  70. >   Decoder width     |   2 instructions
  71. >                     |
  72. > Inst. issue:        |
  73. >   Max number issued |   Up to two instructions can be issued; no position
  74. >     per cycle       |     restrictions on issue ("symmetric" issue)
  75. >   Window type       |   Reservation stations for branches and stores
  76. >   Execution order   |   Program-order issue; out-of-order completion
  77. >                     |
  78. > Branch prediction:  |
  79. >   Type              |   Static branch prediction based on compiler hint
  80. >                     |     given in opcode
  81. >   Hardware support  |   32-entry Branch Target Instruction Cache with two
  82. >                     |     target instructions per entry (FIFO replacement)
  83. >                     |   Branch instruction reservation station
  84. >   Recovery method   |   Instructions issued past branch are tagged as
  85. >                     |     conditional and flushed if branch mispredicted;
  86. >                     |     register files restored using history buffer
  87. >                     |
  88. > Functional units:   |
  89. >   Number and type   |   1 instruction / branch unit
  90. >                     |   1 data cache unit
  91. >                     |   2 integer units (32-bit operands)
  92. >                     |   1 bit-field unit (32-bit operands)
  93. >                     |   1 floating-point add unit (80-bit fp operands)
  94. >                     |   1 multiply unit (64-bit int., 80-bit fp)
  95. >                     |   1 divide unit (64/80-bit operands)
  96. >                     |   2 graphics units (64-bit operands)
  97. >   Latencies         |
  98. >     Integer add/sub |   Issue =  1     Result =  1
  99. >     Integer mul     |   Issue =  1     Result =  3
  100. >     Integer div     |   Issue = 18     Result = 18
  101. >     FP add/sub      |   Issue =  1     Result =  3 (FCMP = 1)
  102. >     FP mul          |   Issue =  1     Result =  3
  103. >     FP div          |   Issue = 13-26  Result = 13-26
  104. >                     |
  105. > Registers:          |
  106. >   Integer           |   32 32-bit registers (88100 code uses these for FP)
  107. >   Floating-point    |   32 80-bit registers
  108. >   Rename/scoreboard |   scoreboard
  109. >   Ports             |   6 read / 2 write on each register file
  110. >                     |
  111. > Load/store handling:|   
  112. >   Load use penalty  |   One cycle
  113. >   Load bypass       |   Yes
  114. >   Load forwarding   |   No
  115. >   Hardware support  |   4-entry load queue, 3-entry store instruction
  116. >                     |     reservation station
  117. >                     |   Tagged (conditional) load/stores cannot change cache
  118. >                     |
  119. > Data cache:         |   8KB, 32-byte line size, 2-way set associative,
  120. >                     |     physically addressed, write-through or write-
  121. >                     |     back with write-allocate on page or block basis,
  122. >                     |     pseudo-random replacement, non-blocking
  123. >                     |   Prefetch instructions available as well as non-
  124. >                     |     allocating store-through instructions
  125. >                     |
  126. > Bus:                |   64-bit, split transaction, burst transfers of two
  127. >                     |     words per cycle, critical-word-first with wrap-
  128. >                     |     around and streaming
  129. >                     |
  130. > Exception handling: |   Precise exceptions occur in program order by
  131. >                     |     allowing all prior instructions to complete;
  132. >                     |     register files restored using history buffer
  133. >                     |
  134. > Interrupt handling: |   Precise interrupts with minimum interrupt latency
  135. >                     |     by aborting all incomplete instructions and
  136. >                     |     restoring register files for out-of-order
  137. >                     |     completions using history buffer
  138. >                     |
  139. > Noteworthy features:|   Rich set of execution units
  140. >                     |   Unencumbered issue rules
  141. >                     |   Speculative execution past branches with history
  142. >                     |     buffer used for recovery
  143. >                     |   Sophisticated load/store unit
  144. >                     |   Graphics unit
  145. >                     |   
  146. >                     |   SUN/TI SuperSPARC (Viking)
  147. >                     |   
  148. > --------------------+--------------------------------------------------------
  149. >                     |   
  150. > Hardware design:    |   Single-chip design, 3.1M transistors
  151. >                     |   
  152. > Inst. fetch:        |   
  153. >   I-cache           |   20KB, 8-byte line size, 5-way set associative,
  154. >                     |     physically addressed, pseudo-LRU replacement
  155. >   Fetch width       |   4 instructions
  156. >   Fetch alignment   |   Required for fetch into instruction buffers; grouper
  157. >                     |     provides decoder with 3 instructions from buffer
  158. >   Line crossing     |   Not allowed for fetch into instruction buffers; no
  159. >                     |     impact on grouper
  160. >   Decoder width     |   3 instructions
  161. >                     |   
  162. > Inst. issue:        |   
  163. >   Max number issued |   Up to 3 instructions can be issued per cycle,
  164. >     per cycle       |     governed by an extensive list of grouping rules.
  165. >                     |     Maximums per cycle: two integer operations, one
  166. >                     |     load/store, one shift, one FP/IMUL/IDIV, and one
  167. >                     |     control flow (which must be in last position of
  168. >                     |     issue group).  Issue rules were tailored to
  169. >                     |     existing SPARC code and allow simultaneous issue
  170. >                     |     of: chained integer ALU ops, CCset with dependent
  171. >                     |     branch, load with dependent FP op, ALU op with
  172. >                     |     dependent store.
  173. >   Window type       |   None
  174. >   Execution order   |   Program-order issue; out-of-order completion
  175. >                     |   
  176. > Branch prediction:  |   
  177. >   Type              |   Static predict-not-taken (provides delay slot
  178. >                     |     instruction)
  179. >   Hardware support  |   8-entry sequential path instruction buffer, 4-entry
  180. >                     |     target path instruction buffer (prefetched upon
  181. >                     |     recognizing branch)
  182. >   Recovery method   |   Mispredicted instructions nullified on cycle after
  183. >                     |     issue; register files uncorrupted
  184. >                     |   
  185. > Functional units:   |   
  186. >   Number and type   |   1 resource allocation and forwarding control unit
  187. >                     |     (handles branching)
  188. >                     |   1 integer unit with three cascaded ALUs (also
  189. >                     |     handles load/store)
  190. >                     |   1 floating-point unit (also does IMUL, IDIV),
  191. >                     |     contains 4-entry SPARC FP instruction queue
  192. >   Latencies         |   
  193. >     Integer add/sub |   Issue =  1     Result =  1
  194. >     Integer mul     |   Issue =  4     Result =  4
  195. >     Integer div     |   Issue = 18     Result = 18
  196. >     FP add/sub      |   Issue =  1     Result =  3
  197. >     FP mul          |   Issue =  1     Result =  3
  198. >     FP div          |   Issue = 6-9    Result = 6-9
  199. >     FP sqrt         |   Issue = 8-12   Result = 8-12
  200. >                     |   
  201. > Registers:          |   
  202. >   Integer           |   32 32-bit windowed registers
  203. >   Floating-point    |   32 32-bit registers
  204. >   Rename/scoreboard |   scoreboard
  205. >   Ports             |   Integer register file has 4 ports, double access
  206. >                     |     per cycle; floating point register file has
  207. >                     |     5 ports
  208. >                     |   
  209. > Load/store handling:|   
  210. >   Load use penalty  |   0 cycles (even for 8-byte load)
  211. >   Load bypass       |   Yes
  212. >   Load forwarding   |   No
  213. >   Hardware support  |   8-byte store buffer, also used for D-cache write back
  214. >                     |   
  215. > Data cache:         |   16KB, 4-byte line size, 4-way set associative,
  216. >                     |     8-byte read/write path, physically addressed,
  217. >                     |     write-back with write-allocate, pseudo-LRU
  218. >                     |     replacement
  219. >                     |   
  220. > Bus:                |   64-bit, split transaction, critical-word-first
  221. >                     |   
  222. > Exception handling: |   Precise integer exceptions occur in program order;
  223. >                     |     writeback turned off for instruction causing
  224. >                     |     exception and remains off as pipeline drains;
  225. >                     |     instructions are paired with program counter value
  226. >                     |   Standard SPARC deferred FP exception model
  227. >                     |   
  228. > Interrupt handling: |   ?
  229. >                     |   
  230. > Noteworthy features:|   Large on-chip caches
  231. >                     |   Cascaded integer ALUs allow simultaneous issue of
  232. >                     |     dependent integer operations for many cases
  233. >                     |   Several other dependent pairs can be issued together:
  234. >                     |     operation and dependent branch, FP operation and
  235. >                     |     dependent store, load and dependent FP operation
  236. >                     |   No load use penalty
  237. >                     |   
  238. >                     |   DEC Alpha 21064
  239. >                     |   
  240. > --------------------+--------------------------------------------------------
  241. >                     |   
  242. > Hardware design:    |   Single-chip design, 1.7M transistors
  243. >                     |   
  244. > Inst. fetch:        |   
  245. >   I-cache           |   8KB, 32-byte line size, direct-mapped, physically
  246. >                     |     addressed
  247. >   Fetch width       |   2 instructions
  248. >   Fetch alignment   |   Required
  249. >   Line crossing     |   Not allowed
  250. >   Decoder width     |   2 instructions
  251. >                     |   
  252. > Inst. issue:        |   
  253. >   Max number issued |   Up to two instructions can be issued with no
  254. >     per cycle       |     position dependence (second cycle swaps positions
  255. >                     |     if necessary); list of conditions required for
  256. >                     |     dual issue is rather complicated; if second
  257. >                     |     instruction of pair cannot be issued, then next
  258. >                     |     cycle will consider that instruction only
  259. >   Window type       |   None
  260. >   Execution order   |   Program-order issue; out-of-order completion
  261. >                     |   
  262. > Branch prediction:  |   
  263. >   Type              |   Dynamic prediction using one-bit history; otherwise
  264. >                     |     sign of displacement is basis of prediction
  265. >   Hardware support  |   Branch history bit for each instruction location
  266. >                     |     in I-cache
  267. >                     |   4-entry subroutine return address prediction stack
  268. >   Recovery method   |   nullify instructions
  269. >                     |   
  270. > Functional units:   |   
  271. >   Number and type   |   1 instruction sequencing and branch unit
  272. >                     |   1 integer unit
  273. >                     |   1 floating-point unit
  274. >                     |   1 data memory and address generation unit
  275. >   Latencies         |
  276. >     Integer add/sub |   Issue =  1     Result =  1-2
  277. >     Integer mul     |   Issue = 19-23  Result = 19-23
  278. >     Integer div     |   (no IDIV)
  279. >     FP add/sub      |   Issue =  1     Result =  6
  280. >     FP mul          |   Issue =  1     Result =  6
  281. >     FP div          |   Issue = 30-63  Result = 30-63
  282. >                     |   
  283. > Registers:          |   
  284. >   Integer           |   32 64-bit registers
  285. >   Floating-point    |   32 64-bit registers
  286. >   Rename/scoreboard |   scoreboard
  287. >   Ports             |   4 read / 2 write on each register file
  288. >                     |   
  289. > Load/store handling:|   (load/stores only on 64-bit values)
  290. >   Load use penalty  |   2 cycles
  291. >   Load bypass       |   Yes
  292. >   Load forwarding   |   No
  293. >   Hardware support  |   4 32-byte write buffers (writes not necessarily
  294. >                     |     in program order unless memory barrier
  295. >                     |     instructions are used)
  296. >                     |   
  297. > Data cache:         |   8KB, 32-byte line size, direct-mapped, write-
  298. >                     |     through, no-write-allocate, 64-bit read/write
  299. >                     |     path, non-blocking
  300. >                     |   
  301. > Bus:                |   ?
  302. >                     |   
  303. > Exception handling: |   Drain pipe and invoke PAL handler; use of trap
  304. >                     |     barrier instructions can yield precise exceptions
  305. >                     |   
  306. > Interrupt handling: |   Drain pipe and invoke PAL handler
  307. >                     |   
  308. > Noteworthy features:|   64-bit architecture
  309. >                     |   Conditional move to obviate branch
  310. >                     |   Virtual machine support
  311. >                     |   Privileged Architecture Library (PAL) routines
  312. >                     |     that encapsulate atomic OS actions
  313. > --
  314. > -- 
  315. > Mark Smotherman, CS Dept., Clemson University, Clemson, SC 29634-1906
  316. >   (803) 656-5878,  mark@cs.clemson.edu  or  mark@hubcap.clemson.edu
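
  The Alpha entry above describes branch prediction as one-bit history with
the sign of the displacement as the fallback.  The following Python sketch
shows how such a scheme behaves on a loop branch.  It is only an
illustration: the class name OneBitPredictor, the dictionary standing in for
the per-location history bits in the I-cache, and the choice that a negative
(backward) displacement means "predict taken" are all assumptions made for
the example.

```python
class OneBitPredictor:
    """One history bit per branch, with a static fallback when no bit exists."""

    def __init__(self):
        # branch address -> last outcome (True = taken).  A dict is a
        # simplification; the table says the bits live in the I-cache.
        self.history = {}

    def predict(self, pc, displacement):
        if pc in self.history:
            return self.history[pc]      # repeat whatever happened last time
        # No history yet: assume backward branches (loops) are taken.
        return displacement < 0

    def update(self, pc, taken):
        self.history[pc] = taken


def count_mispredictions(predictor, pc, displacement, outcomes):
    """Run a sequence of actual outcomes for one branch; count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc, displacement) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through; loop run twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(OneBitPredictor(), 0x100, -16, loop_outcomes))  # 3
```

  The three misses are the first loop exit, the re-entry on the second run
(the bit still says "not taken"), and the second exit.  A single history bit
therefore tends to miss twice per loop execution once it is warmed up.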
  317.
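
  The 88110 entry above mentions restoring the register files from a history
buffer after a mispredicted branch.  The article gives no implementation
detail, so the Python sketch below only illustrates the general technique:
before a speculative write, save the old register value, and undo the writes
in reverse order if the speculation turns out to be wrong.  The class and
method names are hypothetical.

```python
class RegisterFileWithHistory:
    """Register file that can undo speculative writes (general technique only)."""

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs
        self.history = []  # (register index, previous value), oldest first

    def write(self, reg, value, speculative=False):
        if speculative:
            # Remember what was there so the write can be undone later.
            self.history.append((reg, self.regs[reg]))
        self.regs[reg] = value

    def commit(self):
        """Branch resolved as predicted: the saved old values are not needed."""
        self.history.clear()

    def rollback(self):
        """Branch mispredicted: undo speculative writes, newest first."""
        while self.history:
            reg, old = self.history.pop()
            self.regs[reg] = old


rf = RegisterFileWithHistory()
rf.write(1, 10)                      # before the branch
rf.write(1, 20, speculative=True)    # issued past the branch
rf.write(2, 30, speculative=True)
rf.write(1, 40, speculative=True)
rf.rollback()                        # branch was mispredicted
print(rf.regs[1], rf.regs[2])        # 10 0
```

  Undoing in reverse order matters when the same register is written more
than once speculatively, as r1 is here: the oldest saved value is the one
that must survive.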