NetNews Usenet Archive 1993 #1




Text File | 1993-01-07 | 4.3 KB | 89 lines

Newsgroups: comp.sys.sgi
Path: sparky!uunet!zaphod.mps.ohio-state.edu!darwin.sura.net!news.udel.edu!perelandra.cms.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Subject: Re: Indigo R4k/IBM 6k comparison
Message-ID: <C0HEGt.4wv@news.udel.edu>
Sender: usenet@news.udel.edu
Nntp-Posting-Host: perelandra.cms.udel.edu
Organization: College of Marine Studies, U. Del.
References: <8378@news.duke.edu> <C0GMqw.6LF@news.cso.uiuc.edu>
Date: Thu, 7 Jan 1993 11:26:53 GMT
Lines: 76

In article <C0GMqw.6LF@news.cso.uiuc.edu> ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi) writes:
>In article <8378@news.duke.edu>, rla@canctr.mc.duke.edu writes:
>|>The two remaining candidates are SGI (R4000-based) vs. IBM RS6000s.

>[details deleted]

>Apparently, one of the key issues is the cache architecture:
>size and bandwidth. The first code described above, doing linear
>algebra calculations, is very local in memory and uses a large
>and fast cache memory at its best. In contrast, the second code
>is highly non-local (it has lots of indirect addressing and jumps
>in memory like hell), and probably gives a poor Megaflops rating
>in all cases. Clearly, different design decisions have been made
>by the various teams. HP and SGI seem to behave quite similarly.

The LINPACK benchmarks do indeed use the cache well on all the
machines. This allows IBM's faster FPU (2 FP OPs/cycle vs about
0.4 FP OPs/cycle on the MIPS chip) to stay busy.

However, one does not need to re-use the data in cache to get good
performance -- just using everything that is fetched (i.e. use strides
of 1 through memory) is enough to get the IBM to outperform the other
vendors, although it will not reach the MFLOPS levels of the LINPACK
1000x1000 case. This is consistent with the quoted results on the
relative speeds for the second set of benchmarks mentioned, which
move through memory randomly.
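
To make "using everything that is fetched" concrete, here is a toy
sketch in Python. It is not taken from the benchmark codes, and it does
not model any of the machines discussed. It assumes a hypothetical
cache that holds a single line of 8 array elements (for example a
64-byte line of 8-byte reals) and simply counts how many line fetches
a given sequence of array indices would cause. The name line_fetches
and all the sizes are illustrative assumptions.

```python
import random


def line_fetches(indices, line_elems=8):
    """Count line fetches for a toy cache that holds exactly one line.

    A new fetch is counted whenever an index falls in a different line
    than the previous access. line_elems is an assumed line size.
    """
    fetches = 0
    current_line = None
    for i in indices:
        line = i // line_elems
        if line != current_line:
            fetches += 1
            current_line = line
    return fetches


n = 1024

# Unit stride: every element of each fetched line is used.
unit = list(range(n))

# Stride of 8: one element used per fetched line.
strided = list(range(0, n, 8))

# Scattered order, standing in for indirect addressing.
scattered = list(range(n))
random.Random(0).shuffle(scattered)

unit_fetches = line_fetches(unit)
strided_fetches = line_fetches(strided)
scattered_fetches = line_fetches(scattered)

# Elements actually used per line fetched.
unit_use = len(unit) / unit_fetches
strided_use = len(strided) / strided_fetches
scattered_use = len(scattered) / scattered_fetches

print(f"unit stride: {unit_fetches} fetches, {unit_use:.2f} elements/fetch")
print(f"stride 8:    {strided_fetches} fetches, {strided_use:.2f} elements/fetch")
print(f"scattered:   {scattered_fetches} fetches, {scattered_use:.2f} elements/fetch")
```

In this model the unit-stride sweep uses all 8 elements of every line
it fetches even though nothing is ever re-used, while the strided and
scattered sweeps use close to one element per fetch. That is the
distinction between "not cacheable" and "wasteful of memory bandwidth"
drawn above.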

Here is a partial set of results for a finite difference ocean model
that uses exclusively unit strides through memory, but is definitely
not cacheable. The full set of results is available by anonymous ftp
from perelandra.cms.udel.edu in bench/qgbox/.

Note that the obsolete IBM RS/6000-320 is roughly the same speed as
the R4000 machines. The new 580/980 IBMs should be roughly 3 times
as fast as current R4000's.

Machine            Time (1)  MFLOPS  Date      Notes   SPECfp92   MFLOPS/
                    seconds                                      SPECfp92
-------------------------------------------------------------------------
Cray C90, 1 cpu        17.8   396.4  92.12.09  (2,16)         -         -
Cray Y/MP, 1 cpu       42.7   165.5  92.12.09  (2)            -         -
IBM RS/6000-950         303    23.6  92.12.04  (12)        81.8     0.289
DEC 3000-500 AXP        350    20.4  92.12.08  (10)       126.0     0.182
IBM RS/6000-530H        388    18.4  92.12.04  (4,11)      57.7     0.319
HP-9000/730             590    12.1  92.12.09  (15)        75.0     0.161
SGI R4000 Indigo        731     9.8  92.12.04  (5)          ~61     0.160
SGI Crimson             817     8.7  92.12.03  (3)         63.4     0.137
IBM RS/6000-320         828     8.6  92.09.23  (14)       ~31.5     0.273
HP-9000/720             872     8.2  92.12.04  (4)         58.2     0.148
HP-9000/710            1041     6.9  92.12.05  (4)         31.6     0.218
SparcClassic           1046     6.8  92.12.07  (6)         21.0     0.323
SparcStation2          2062     3.5  92.12.07  (7)         21.8     0.161
SGI 4D/25              4691     1.5  92.09.23              ~10.     0.150
-------------------------------------------------------------------------
(1)  The above timings are user times only.
(2)  Inline tridsolve() by compiler directive.
     Used different versions of advect2() and delsq5().
(3)  charlie@kestrel.tamu.edu, f77 -O -mips2
(4)  olin@cheme.tn.cornell.edu, f77 -O
(5)  fisher@ivy.dt.navy.mil, f77 -O2 -mips2
(6)  daniel.nelsen@Eng.Sun.COM, SC2.0.1-f77 -dalign -cg92 -O4
(7)  daniel.nelsen@Eng.Sun.COM, SC2.0.1-f77 -dalign -cg89 -O4
(10) ecf_stbo@jhuvms.hcf.jhu.edu, VMS AXP V1.0; Dec fortran V6.0.,
     (specifically "EV6.0-289-24AG") F77/optimize
     No writes to fort.11 or fort.12
(11) The SPECfp92 number here is out-of-date.
(12) jbs@watson.ibm.com, xlf 2.3.
(14) mccalpin@perelandra.cms.udel.edu, xlf 1.2
(15) bt@irfu.se, FFLAGS = +OP2 +OS +O3 -Wl,-a,archive -WP,-cachesize=256,
     -arclimit=5000,-aggressive=a,-vectorize,-optimize=4
(16) cmg@magnet.cray.com, used qgbox.cray.f, inlined trid_solve
--
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      John.McCalpin@mvs.udel.edu