NetNews Usenet Archive 1992 #16

Internet Message Format | 1992-07-23 | 8.8 KB

Path: sparky!uunet!darwin.sura.net!mips!pacbell.com!lll-winken!lll-crg.llnl.gov!mcmahon
From: mcmahon@lll-crg.llnl.gov (Frank McMahon)
Newsgroups: comp.benchmarks
Subject: Proposal To Correct Deficiencies In "Livermore Loops in C"
Message-ID: <131316@lll-winken.LLNL.GOV>
Date: 24 Jul 92 00:08:34 GMT
Sender: usenet@lll-winken.LLNL.GOV
Organization: Lawrence Livermore National Laboratory
Lines: 211
Nntp-Posting-Host: lll-crg.llnl.gov

TO:      Computational Benchmark Colleagues

FROM:    Frank McMahon (author of Livermore Loops, aka LFK Test)
         Lawrence Livermore National Laboratory
         P.O. Box 808, L-35
         Livermore, CA 94550 USA
         MCMAHON3@LLNL.GOV
         mcmahon@ocfmail.ocf.llnl.gov

SUBJECT: Proposal To Correct Deficiencies in "Livermore Loops in C"

DATE:    92.07.17

In 1991 the "Livermore Loops in C" were transliterated from our Fortran version (Copyright 1983 The Regents of the University of California) by M. Fouts. Unfortunately, Mr. Fouts transliterated an archaic 1984 version of my test-driver program. In the past eight years I have upgraded my LFK test several times to overcome timing errors caused by low-resolution (.01 sec.) cpu-timers (e.g. UNIX/ETIME). On today's fast UNIX workstations the obsolete 1984 LFK test program could cause significant timing errors and inflated Mflop ratings which, if not too large, might not arouse suspicion. I have noticed inflated Mflop rates in several of the published C ratings compared to our upgraded Fortran version, especially the HP PA ratings (see below). This erroneous inflation is the most likely cause of the speed advantage claimed for C by M. Fouts.

Since 1984 I have upgraded the measurements of the cpu-timer resolution and the test overhead time using fully convergent methods. Further, the entire Livermore Loops test is run seven times to measure the experimental errors for each of the 72 sample timings. All measured errors are reported in the output file to help users confirm the accuracy of the timings.

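The effect of a coarse cpu-timer on short timings can be illustrated with a minimal sketch. It is not the LFK driver's method (which this post does not show): it assumes the standard C clock() as the timer, and the function names timer_resolution_sec and repetitions_needed are hypothetical. The first function estimates the timer tick; the second gives the number of kernel repetitions needed so that one tick is a small fraction of the timed interval.

```c
#include <time.h>

/* Estimate the cpu-timer resolution in seconds: spin until clock()
   changes and keep the smallest step seen over several trials.
   Assumption: clock() stands in for the driver's actual cpu-timer. */
double timer_resolution_sec(int trials)
{
    double best = 0.0;
    for (int i = 0; i < trials; i++) {
        clock_t t0 = clock();
        clock_t t1;
        while ((t1 = clock()) == t0)
            ;                       /* busy-wait for the next tick */
        double step = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best == 0.0 || step < best)
            best = step;
    }
    return best;
}

/* Smallest repetition count for which the timed interval is at least
   resolution_sec / rel_err, so that one timer tick contributes a
   relative error of at most rel_err.  kernel_sec is a rough estimate
   of the time for one pass through the kernel. */
long repetitions_needed(double resolution_sec, double kernel_sec,
                        double rel_err)
{
    double min_interval = resolution_sec / rel_err;
    long reps = (long)(min_interval / kernel_sec);
    if ((double)reps * kernel_sec < min_interval)
        reps++;                     /* round up */
    return reps < 1 ? 1 : reps;
}
```

With a .01 sec. timer and a 1% error target the timed interval must be at least 1 second, so a kernel taking about 100 microseconds per pass needs roughly 10,000 repetitions. A driver that uses too few repetitions on a fast machine times intervals comparable to one tick, which is how rates can be misreported.
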
The 1984 version in C is deficient and thus unreliable. The repetition loops around each kernel in the 1984 version were modified following some reports of erroneous code-hoisting by global optimizers. In 1990 these repetition loops were submerged into function TEST, beyond the scope of optimizers, so the code samples are now bullet-proof. (Mr. Fouts rediscovered this problem but continues to use the 1984 test.)

In 1986 Greg Astfalk (AT&T) reprogrammed subroutine KERNEL, containing the 24 Fortran samples, in C. This C module can be linked with the standard Fortran LFK test-driver program for testing under IDENTICAL benchmark conditions and accuracy as the Fortran samples benchmark. (The order of array subscripts in the C version was not reversed and hence the memory patterns and cache misses would differ from the Fortran version.) Our comparisons of the performance of our C module with the Fortran version show identical performance when the C and Fortran compilers share the same machine code generator - a necessary identity check.

PROPOSAL

Continued use of the unreliable 1984 version of the "Livermore Loops in C" with the deficiencies noted above would be indefensible and would harm the reputation of our current, upgraded Livermore Loops Test. We seek a constructive resolution of this problem. We must have ONE standard test program spec for all language implementations, or test results will not be comparable with confidence and chaos will follow. We propose transliteration of our current, 1991 Fortran version into C. Otherwise the two different versions must be distinguished by different names. We would welcome a collaborative effort to assure the equivalence of the Fortran and C versions, the accuracy of timing measurements, and the consistency of reporting Mflop ratings. We solicit your opinions.

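The code-hoisting problem mentioned above can be sketched in C. When a repetition loop wraps a kernel whose results do not change from pass to pass, a global optimizer may inline the kernel and move the work out of the loop, so the timer sees far less than reps passes. The sketch below shows one common countermeasure, calling the kernel through a volatile function pointer. It is an assumption for illustration only: the post does not show how function TEST is implemented, and sample_kernel and seconds_per_pass are hypothetical names, not LFK code.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical sample kernel, not one of the 24 LFK kernels. */
void sample_kernel(double *x, const double *y, const double *z, size_t n)
{
    const double q = 0.5;
    for (size_t k = 0; k < n; k++)
        x[k] = q + y[k] * z[k];
}

typedef void (*kernel_fn)(double *, const double *, const double *, size_t);

/* Run the kernel reps times (reps > 0) and return cpu seconds per pass.
   The call goes through a volatile function pointer, so the pointer is
   reloaded on every pass; in practice this keeps the compiler from
   inlining the kernel into the repetition loop and hoisting its work
   out of the loop. */
double seconds_per_pass(kernel_fn fn, double *x, const double *y,
                        const double *z, size_t n, long reps)
{
    kernel_fn volatile call = fn;
    clock_t t0 = clock();
    for (long i = 0; i < reps; i++)
        call(x, y, z, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)reps;
}
```

Placing the kernel in a separately compiled file has a similar effect, provided link-time optimization is not enabled.
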
LFK STANDARD MFLOP RATINGS INTERPRETATION

The principal goal of the Livermore Loops Test is to measure and report a realistic performance range for diverse, cpu-bound computations and thus avoid reductionism: reducing the performance range to a single number. We have used hardware monitors to correlate all of the LFK test averages with the degree of tuning of real application programs:

CORRELATION OF LFK TEST PERFORMANCE MEANS WITH LARGE WORKLOAD TUNING

-----------  ----------  ----------  -----------------------
Type of      CRAY-YMP1   Fraction    Tuning of Workload
Mean         72 Samples  Flops in    Correlated with
             (MFlops)    Vector Ops  LFK Mean Performance
-----------  ----------  ----------  -----------------------
2*AM            165.0       .97      Best applications
AM               82.7       .89      Optimized applications
GM               43.4       .74      Tuned workload
HM               23.2       .45      Untuned workload
HM(scalar)       12.4       .0       All-scalar applications
-----------  ----------  ----------  -----------------------
(AM,GM,HM stand for Arithmetic, Geometric, Harmonic Mean Rates)

The best central measure is the Geometric Mean (GM) of the 72 rates because the GM is less biased by outliers than the Harmonic (HM) or Arithmetic (AM) mean. CRAY hardware monitors have demonstrated that net Mflop rates for the LLNL and UCSD tuned workloads are closest to the 72 LFK test GM rate.

LFK STANDARD NUMERICAL PERFORMANCE COMPARISONS USING LFK TEST AVERAGES

The range of speed-ups shown below as ratios of the performance statistics has a very small variance compared with the enormous performance ranges; the speed-up ranges are convergent estimates.

TABLE OF SPEED-UP RATIOS OF LIVERMORE LOOPS MEAN RATES (72 Samples)

(AM,GM,HM stand for Arithmetic, Geometric, Harmonic Mean Rates)

--------  ----  -------     --------  --------  --------  --------  --------  --------
SYSTEM    MEAN   MFLOPS      SX-3/14     YMP/1  9000/730  6000/540  SPARC 1+   i486/25
--------  ----  -------     --------  --------  --------  --------  --------  --------

NEC       AM=   311.820  :     1.000     3.986    17.030    22.006   194.396   271.148
SX-3/14   GM=    95.590  :     1.000     2.610     6.081     8.909    66.767    91.038
F77v.012  HM=    38.730  :     1.000     2.193     2.916     5.199    30.488    42.098
          SD=   499.780

CRAY      AM=    78.230  :     0.251     1.000     4.273     5.521    48.770    68.026
YMP/1     GM=    36.630  :     0.383     1.000     2.330     3.414    25.585    34.886
CFT771.2  HM=    17.660  :     0.456     1.000     1.330     2.370    13.902    19.196
          SD=    86.750

HP        AM=    18.310  :     0.059     0.234     1.000     1.292    11.415    15.922
9000/730  GM=    15.720  :     0.164     0.429     1.000     1.465    10.980    14.971
f77 8.05  HM=    13.280  :     0.343     0.752     1.000     1.783    10.454    14.435
          SD=     9.680

IBM       AM=    14.170  :     0.045     0.181     0.774     1.000     8.834    12.322
6000/540  GM=    10.730  :     0.112     0.293     0.683     1.000     7.495    10.219
XL v0.90  HM=     7.450  :     0.192     0.422     0.561     1.000     5.865     8.098
          SD=     9.590

SUN       AM=     1.604  :     0.005     0.021     0.088     0.113     1.000     1.395
SPARC 1+  GM=     1.432  :     0.015     0.039     0.091     0.133     1.000     1.364
f77 v1.4  HM=     1.270  :     0.033     0.072     0.096     0.171     1.000     1.381
          SD=     0.741

COMPAQ    AM=     1.150  :     0.004     0.015     0.063     0.081     0.717     1.000
i486/25   GM=     1.050  :     0.011     0.029     0.067     0.098     0.733     1.000
          HM=     0.920  :     0.024     0.052     0.069     0.123     0.724     1.000
          SD=     0.480

Current Livermore Loops (aka LFK) source files are available from NISTLIB:

   1. Create a file (named beta) containing one line of text:
         send mflops24 from llnl
   2. E-mail file beta to:
         mail nistlib@cmr.ncsl.nist.gov < beta
   3. NIST/libnet will return-mail the source file: mflops24

REFERENCES

F.H. McMahon, The Livermore Fortran Kernels: A Computer Test Of The Numerical Performance Range, Lawrence Livermore National Laboratory, Livermore, California, UCRL-53745, December 1986.
   from: National Technical Information Service
         U.S. Department of Commerce
         5285 Port Royal Road
         Springfield, VA 22161

J.T. Feo, An Analysis Of The Computational And Parallel Complexity Of The Livermore Loops, PARALLEL COMPUTING (North Holland), Vol 7(2), 163-185, (1988).