- Newsgroups: comp.sys.sgi
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!darwin.sura.net!news.udel.edu!perelandra.cms.udel.edu!mccalpin
- From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
- Subject: Re: Indigo R4k/IBM 6k comparison
- Message-ID: <C0HEGt.4wv@news.udel.edu>
- Sender: usenet@news.udel.edu
- Nntp-Posting-Host: perelandra.cms.udel.edu
- Organization: College of Marine Studies, U. Del.
- References: <8378@news.duke.edu> <C0GMqw.6LF@news.cso.uiuc.edu>
- Date: Thu, 7 Jan 1993 11:26:53 GMT
- Lines: 76
-
- In article <C0GMqw.6LF@news.cso.uiuc.edu> ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi) writes:
- >In article <8378@news.duke.edu>, rla@canctr.mc.duke.edu writes:
- >|>The two remaining candidates are SGI (R4000-based) vs. IBM RS6000s.
-
- >[details deleted]
- >Apparently, one of the key issues is the cache architecture:
- >size and bandwidth. The first code described above, doing linear
- >algebra calculations, is very local in memory and uses a large
- >and fast cache memory at its best. In contrast, the second code
- >is highly non-local (it has lots of indirect addressing and jumps
- >in memory like hell), and probably gives a poor Megaflops rating
- >in all cases. Clearly, different design decisions have been made
- >by the various teams. HP and SGI seem to behave quite similarly.
-
- The LINPACK benchmarks do indeed use the cache well on all the
- machines. This allows IBM's faster FPU (2 FP OPs/cycle vs about
- 0.4 FP OPs/cycle on the MIPS chip) to stay busy.
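Peak floating-point rate is operations completed per cycle times clock rate. At equal clock rates the clock cancels, so the per-cycle figures quoted above give the IBM FPU a 2/0.4 = 5 times advantage (the IBM figure presumably reflects one fused multiply-add per cycle). The sketch below uses only those two numbers. The 50 MHz clock and the function name are hypothetical placeholders, and the real machines ran at different clock rates.

```python
def peak_mflops(fp_ops_per_cycle: float, clock_mhz: float) -> float:
    """Peak rate in MFLOPS: FP operations completed per cycle x clock in MHz."""
    return fp_ops_per_cycle * clock_mhz

IBM_OPS_PER_CYCLE = 2.0   # figure quoted in the text for the IBM FPU
MIPS_OPS_PER_CYCLE = 0.4  # approximate figure quoted for the MIPS chip

CLOCK_MHZ = 50.0  # hypothetical placeholder; the real machines' clocks differed

# At equal clock rates the clock cancels, leaving 2.0 / 0.4 = 5.
ratio = peak_mflops(IBM_OPS_PER_CYCLE, CLOCK_MHZ) / peak_mflops(MIPS_OPS_PER_CYCLE, CLOCK_MHZ)
print(f"per-cycle FP advantage at equal clock: {ratio:.1f}x")
```

With different clocks, multiply each chip's per-cycle figure by its own clock before comparing.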
-
- However, one does not need to re-use the data in cache to get good
- performance -- just using everything that is fetched (i.e. using unit
- strides through memory) is enough to let the IBM outperform the other
- vendors, although it will not reach its LINPACK 1000x1000 MFLOPS
- level. This is consistent with the quoted relative speeds for the
- second set of benchmarks mentioned, which move through memory randomly
- and so use only a small part of each cache line they fetch.
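A simple model makes the stride argument concrete. A cache line of L bytes is always fetched whole. With a stride of s elements of e bytes each, a sweep uses 1/s of every line while s*e < L, and only e/L of it once every access lands in a different line. Random access behaves like the large-stride case. The function name and the 8-byte element / 64-byte line defaults below are hypothetical, not the line sizes of the machines discussed.

```python
def fraction_of_line_used(stride_elems: int, elem_bytes: int = 8,
                          line_bytes: int = 64) -> float:
    """Fraction of each fetched cache line that a strided sweep uses.

    Stride s (elements), element size e, line size L (bytes):
      s*e <  L: each line serves L/(s*e) accesses -> fraction 1/s
      s*e >= L: every access fetches a new line for one element -> e/L
    Assumes L is a multiple of s*e in the first case.
    """
    if stride_elems < 1:
        raise ValueError("stride must be at least 1 element")
    return max(1.0 / stride_elems, elem_bytes / line_bytes)

# Hypothetical 8-byte elements in 64-byte lines.
for s in (1, 2, 4, 8, 16):
    print(f"stride {s:2d}: {fraction_of_line_used(s):.3f} of each line used")
```

Only stride 1 uses the whole line; from stride 8 onward (and for random access) the sweep sits at the e/L floor.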
-
- Here is a partial set of results for a finite difference ocean model
- that uses exclusively unit strides through memory, but is definitely
- not cacheable. The full set of results is available by anonymous ftp
- from perelandra.cms.udel.edu in bench/qgbox/.
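Note (2) names a routine delsq5(), which suggests a 5-point Laplacian, but its source is not shown here. The sketch below is a hypothetical stand-in illustrating the access pattern described: a stencil sweep whose inner loop walks memory with unit stride. It stores the grid row-major in a flat list (point (i, j) at index j*nx + i), so i is the inner loop; in Fortran, which stores arrays column-major, the equivalent is to put the first subscript in the inner loop. When the grid is much larger than the cache, such a sweep gains little from re-use and depends on using every word it fetches.

```python
from typing import List

def delsq5_sketch(psi: List[float], nx: int, ny: int, dx: float) -> List[float]:
    """Hypothetical 5-point Laplacian of psi on an nx-by-ny grid.

    psi is a flat list stored row-major: point (i, j) is psi[j*nx + i],
    so neighboring i values are adjacent in memory. Boundary points
    are simply left at zero in this sketch.
    """
    out = [0.0] * (nx * ny)
    inv_dx2 = 1.0 / (dx * dx)
    for j in range(1, ny - 1):          # outer loop over rows
        row = j * nx
        for i in range(1, nx - 1):      # inner loop: unit stride in memory
            k = row + i
            out[k] = (psi[k - 1] + psi[k + 1] + psi[k - nx] + psi[k + nx]
                      - 4.0 * psi[k]) * inv_dx2
    return out

# psi = i**2 + j**2 has a Laplacian of exactly 4 (with dx = 1).
nx, ny = 6, 5
psi = [float(i * i + j * j) for j in range(ny) for i in range(nx)]
lap = delsq5_sketch(psi, nx, ny, 1.0)
print(lap[2 * nx + 3])  # interior point (i=3, j=2): 4.0
```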
-
- Note that the obsolete IBM RS/6000-320 runs at roughly the same speed as
- the R4000 machines. The new IBM 580/980 models should be roughly 3 times
- as fast as current R4000s.
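For a fixed workload, speed is inversely proportional to run time, so these comparisons are ratios of times in the table that follows. The sketch copies four of its rows (the dictionary and function names are illustrative). The 580/980 models are not in the table, so the factor of 3 cannot be checked this way. On these numbers the RS/6000-320 runs at about 0.88 of the Indigo's speed and 0.99 of the Crimson's, and the RS/6000-950 about 2.4 times the Indigo's.

```python
# Run times in seconds for a fixed workload, copied from the results table.
TIMES_S = {
    "IBM RS/6000-320": 828.0,
    "IBM RS/6000-950": 303.0,
    "SGI R4000 Indigo": 731.0,
    "SGI Crimson": 817.0,
}

def relative_speed(machine: str, reference: str) -> float:
    """How many times faster `machine` is than `reference` (same workload)."""
    return TIMES_S[reference] / TIMES_S[machine]

for ref in ("SGI R4000 Indigo", "SGI Crimson"):
    print(f"RS/6000-320 vs {ref}: {relative_speed('IBM RS/6000-320', ref):.2f}")
print(f"RS/6000-950 vs SGI R4000 Indigo: "
      f"{relative_speed('IBM RS/6000-950', 'SGI R4000 Indigo'):.2f}")
```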
-
- Machine            Time (1)  MFLOPS  Date      Notes   SPECfp92   MFLOPS/
-                     seconds                                      SPECfp92
- ------------------------------------------------------------------------------
- Cray C90, 1 cpu        17.8   396.4  92.12.09  (2,16)         -         -
- Cray Y/MP, 1 cpu       42.7   165.5  92.12.09  (2)            -         -
- IBM RS/6000-950         303    23.6  92.12.04  (12)        81.8     0.289
- DEC 3000-500 AXP        350    20.4  92.12.08  (10)       126.0     0.182
- IBM RS/6000-530H        388    18.4  92.12.04  (4,11)      57.7     0.319
- HP-9000/730             590    12.1  92.12.09  (15)        75.0     0.161
- SGI R4000 Indigo        731     9.8  92.12.04  (5)          ~61     0.160
- SGI Crimson             817     8.7  92.12.03  (3)         63.4     0.137
- IBM RS/6000-320         828     8.6  92.09.23  (14)       ~31.5     0.273
- HP-9000/720             872     8.2  92.12.04  (4)         58.2     0.148
- HP-9000/710            1041     6.9  92.12.05  (4)         31.6     0.218
- SparcClassic           1046     6.8  92.12.07  (6)         21.0     0.323
- SparcStation2          2062     3.5  92.12.07  (7)         21.8     0.161
- SGI 4D/25              4691     1.5  92.09.23            ~10.     0.150
- ------------------------------------------------------------------------------
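Because every machine ran the same job, the MFLOPS column is a fixed operation count divided by the time, and the last column is MFLOPS divided by SPECfp92. The sketch below recomputes both from the other columns (names are illustrative; approximate SPECfp92 entries are treated as exact). Time x MFLOPS comes out between about 7,040 and 7,220 million operations for every row, consistent with a fixed count; the Cray runs used modified routines per note (2), and the small spread presumably reflects rounding. The recomputed ratios match the printed ones to within about 0.001, except the DEC 3000-500 AXP (printed 0.182) and HP-9000/720 (printed 0.148) rows, whose SPECfp92 values give about 0.162 and 0.141.

```python
from typing import List, Optional, Tuple

# (machine, time in s, MFLOPS, SPECfp92) transcribed from the table above.
# "~" approximations are entered as plain numbers; "-" is entered as None.
ROWS: List[Tuple[str, float, float, Optional[float]]] = [
    ("Cray C90, 1 cpu", 17.8, 396.4, None),
    ("Cray Y/MP, 1 cpu", 42.7, 165.5, None),
    ("IBM RS/6000-950", 303.0, 23.6, 81.8),
    ("DEC 3000-500 AXP", 350.0, 20.4, 126.0),
    ("IBM RS/6000-530H", 388.0, 18.4, 57.7),
    ("HP-9000/730", 590.0, 12.1, 75.0),
    ("SGI R4000 Indigo", 731.0, 9.8, 61.0),
    ("SGI Crimson", 817.0, 8.7, 63.4),
    ("IBM RS/6000-320", 828.0, 8.6, 31.5),
    ("HP-9000/720", 872.0, 8.2, 58.2),
    ("HP-9000/710", 1041.0, 6.9, 31.6),
    ("SparcClassic", 1046.0, 6.8, 21.0),
    ("SparcStation2", 2062.0, 3.5, 21.8),
    ("SGI 4D/25", 4691.0, 1.5, 10.0),
]

def implied_mflop(time_s: float, mflops: float) -> float:
    """Total work implied by one row, in millions of FP operations."""
    return time_s * mflops

def mflops_per_specfp92(mflops: float, specfp92: Optional[float]) -> Optional[float]:
    """The table's last column; None where no SPECfp92 figure is given."""
    return None if specfp92 is None else mflops / specfp92

for name, t, rate, spec in ROWS:
    ratio = mflops_per_specfp92(rate, spec)
    ratio_txt = "-" if ratio is None else f"{ratio:.3f}"
    print(f"{name:<18} {implied_mflop(t, rate):7.0f} {ratio_txt:>7}")
```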
-
- (1) The above timings are user times only.
- (2) tridsolve() inlined by compiler directive.
- Different versions of advect2() and delsq5() were used.
- (3) charlie@kestrel.tamu.edu, f77 -O -mips2
- (4) olin@cheme.tn.cornell.edu, f77 -O
- (5) fisher@ivy.dt.navy.mil, f77 -O2 -mips2
- (6) daniel.nelsen@Eng.Sun.COM, SC2.0.1-f77 -dalign -cg92 -O4
- (7) daniel.nelsen@Eng.Sun.COM, SC2.0.1-f77 -dalign -cg89 -O4
- (10) ecf_stbo@jhuvms.hcf.jhu.edu, VMS AXP V1.0; DEC Fortran V6.0
- (specifically "EV6.0-289-24AG"), F77/optimize.
- No writes to fort.11 or fort.12.
- (11) The SPECfp92 number here is out-of-date.
- (12) jbs@watson.ibm.com, xlf 2.3.
- (14) mccalpin@perelandra.cms.udel.edu, xlf 1.2
- (15) bt@irfu.se, FFLAGS = +OP2 +OS +O3 -Wl,-a,archive -WP,-cachesize=256,
- -arclimit=5000,-aggressive=a,-vectorize,-optimize=4
- (16) cmg@magnet.cray.com, used qgbox.cray.f, inlined trid_solve
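Notes (2) and (16) mention inlining tridsolve() / trid_solve, whose source is not given. Assuming, from the name alone, that it solves tridiagonal linear systems, a standard method is the Thomas algorithm, sketched below as a hypothetical stand-in rather than the benchmark's actual routine. It is a forward elimination sweep followed by a back substitution sweep, each running through the arrays with unit stride.

```python
from typing import List, Sequence

def tridsolve_sketch(a: Sequence[float], b: Sequence[float],
                     c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system with the Thomas algorithm (hypothetical stand-in).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
    a[0] and c[-1] are ignored. No pivoting, so this assumes a
    well-conditioned (e.g. diagonally dominant) matrix.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    # Forward elimination, sweeping i upward.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution, sweeping i downward.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, `tridsolve_sketch([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4])` recovers x = [1, 2, 3] up to rounding.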
- --
- John D. McCalpin mccalpin@perelandra.cms.udel.edu
- Assistant Professor mccalpin@brahms.udel.edu
- College of Marine Studies, U. Del. John.McCalpin@mvs.udel.edu
-