- Path: sparky!uunet!darwin.sura.net!mips!pacbell.com!lll-winken!lll-crg.llnl.gov!mcmahon
- From: mcmahon@lll-crg.llnl.gov (Frank McMahon)
- Newsgroups: comp.benchmarks
- Subject: Proposal To Correct Deficiencies In "Livermore Loops in C"
- Message-ID: <131316@lll-winken.LLNL.GOV>
- Date: 24 Jul 92 00:08:34 GMT
- Sender: usenet@lll-winken.LLNL.GOV
- Organization: Lawrence Livermore National Laboratory
- Lines: 211
- Nntp-Posting-Host: lll-crg.llnl.gov
-
-
-
-
-
- TO: Computational Benchmark Colleagues
- FROM: Frank McMahon (author of Livermore Loops, aka LFK Test)
- Lawrence Livermore National Laboratory
- P.O. Box 808, L-35
- Livermore, CA. 94550
- USA
-
- MCMAHON3@LLNL.GOV
- mcmahon@ocfmail.ocf.llnl.gov
- SUBJECT: Proposal To Correct Deficiencies in "Livermore Loops in C"
- DATE: 92.07.17
-
-
- In 1991 the "Livermore Loops in C" were transliterated from our
- Fortran version (Copyright 1983 The Regents of the University
- of California) by M. Fouts. Unfortunately, Mr. Fouts transliterated
- an archaic 1984 version of my test-driver program. In the past eight
- years I have upgraded my LFK test several times to overcome timing errors
- caused by low-resolution (.01 sec.) cpu-timers (e.g. UNIX/ETIME).
- On today's fast UNIX workstations the obsolete 1984 LFK test program
- can cause significant timing errors and inflated Mflop ratings; when
- the inflation is not too large, it might not arouse suspicion. I have noticed
- inflated Mflop rates in several of the published C ratings compared to
- our upgraded Fortran version, especially the HP PA ratings (see below).
- This erroneous inflation is the most likely cause of the speed advantage
- claimed for C by M. Fouts.
-
- Since 1984 I have upgraded measurements of the cpu-timer resolution and
- the test overhead time using fully convergent methods. Further, the entire
- Livermore Loops test is run seven times to measure the experimental errors
- for each of the 72 sample timings. All measured errors are reported in
- the output file to help users confirm the accuracy of the timings.
- The 1984 version in C is deficient and thus unreliable.
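
The effect of a coarse cpu-timer on a short timing run can be sketched as follows. This is only an illustrative model, not the LFK test-driver: the helper names, the 10,000-flop kernel, and the 0.001 sec. pass time are assumptions, and the timer is modelled as simply rounding down to a whole .01 sec. tick.

```python
import math

TICK = 0.01  # assumed timer resolution in seconds (the .01 sec. figure above)


def quantized(t, tick=TICK):
    """Report t as a coarse timer would: rounded down to a whole tick.

    The tiny epsilon only guards against floating-point noise when t is
    an exact multiple of the tick.
    """
    return math.floor(t / tick + 1e-9) * tick


def measured_mflops(flops_per_pass, true_pass_time, passes, start=0.0):
    """Mflop rate computed from quantized start/stop readings (toy model)."""
    stop = start + passes * true_pass_time
    elapsed = quantized(stop) - quantized(start)
    if elapsed <= 0.0:
        return math.inf  # the timer never advanced, so the rate is unbounded
    return flops_per_pass * passes / elapsed / 1e6


def passes_needed(true_pass_time, rel_error, tick=TICK):
    """Passes required so that one tick is at most rel_error of the run time."""
    return math.ceil(round(tick / (rel_error * true_pass_time), 6))


# Hypothetical kernel: 10,000 flops per pass, 0.001 sec. per pass (10 Mflops).
short_run = measured_mflops(10_000, 0.001, passes=19)    # timer sees 0.01 sec.
long_run = measured_mflops(10_000, 0.001, passes=1000)   # timer sees 1.00 sec.
```

In this model the 19-pass run really takes 0.019 sec. but the timer reports 0.01 sec., so the computed rate is nearly double the true 10 Mflops. Running enough passes that the interval spans many ticks (here `passes_needed(0.001, 0.01)` passes for a 1% bound) removes most of the inflation.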
-
- The repetition loops around each kernel in the 1984 version were modified
- following some reports of erroneous code-hoisting by global optimizers.
- In 1990 these repetition loops were submerged into function TEST, beyond
- the scope of optimizers, so the code-samples are now bullet-proof.
- (Mr. Fouts rediscovered this problem but continues to use the 1984 test.)
-
- In 1986 Greg Astfalk (AT&T) reprogrammed subroutine KERNEL, containing
- the 24 Fortran samples, in C. This C module can be linked with
- the standard Fortran LFK test-driver program for testing under
- benchmark conditions and accuracy IDENTICAL to those of the Fortran samples.
- (The order of array subscripts in the C version was not reversed, and hence
- the memory patterns and cache misses would differ from the Fortran version.)
- Our comparisons of the performance of our C module with the Fortran version
- show identical performance when the C and Fortran compilers share the same
- machine code generator - a necessary identity check.
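
The remark about subscript order can be illustrated with the usual storage rules: Fortran stores arrays column-major (first subscript varies fastest in memory) and C stores them row-major (last subscript varies fastest). The sketch below is an assumption-laden illustration, not code from the LFK sources; the 25 x 101 array shape and the function names are hypothetical.

```python
def fortran_offset(i, j, nrows):
    """0-based linear offset of A(i+1, j+1) in column-major (Fortran) storage."""
    return i + j * nrows


def c_offset(i, j, ncols):
    """0-based linear offset of a[i][j] in row-major (C) storage."""
    return i * ncols + j


NROWS, NCOLS = 25, 101  # hypothetical shape of a Fortran array A(25, 101)

# Stride in elements when the inner loop steps the FIRST subscript by one.
fortran_stride = fortran_offset(1, 0, NROWS) - fortran_offset(0, 0, NROWS)

# Same loop in C with the subscripts left in Fortran order: a[i][j], a[25][101].
c_unreversed_stride = c_offset(1, 0, NCOLS) - c_offset(0, 0, NCOLS)

# Same loop in C with the subscripts reversed: a[j][i], declared a[101][25].
c_reversed_stride = c_offset(0, 1, NROWS) - c_offset(0, 0, NROWS)
```

Under these assumptions the Fortran loop and the reversed-subscript C loop both touch consecutive elements, while the unreversed C loop jumps a whole row (101 elements) on every access, which is why memory patterns and cache misses can differ.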
-
-
-
-
-
-
-
-
- PROPOSAL
-
- Continued use of the unreliable 1984 version of the "Livermore Loops in C"
- with the deficiencies noted above would be indefensible and harm the
- reputation of our current, upgraded Livermore Loops Test.
- We seek a constructive resolution of this problem.
- We must have ONE, standard test program spec for all language implementations
- or test results will not be comparable with confidence and chaos will follow.
- We would propose transliteration of our current, 1991 Fortran version into C.
- Otherwise the two different versions must be distinguished by different names.
- We would welcome a collaborative effort to assure the equivalence of
- the Fortran and C versions, the accuracy of timing measurements, and
- the consistency of reporting Mflop ratings. We solicit your opinions.
-
-
-
-
- LFK STANDARD MFLOP RATINGS INTERPRETATION
-
- The principal goal of the Livermore Loops Test is to measure and report
- a realistic performance range for diverse, cpu-bound computations and thus
- avoid reductionism: reducing the performance range to a single number.
- We have used hardware monitors to correlate all of the LFK test averages
- with the degree of tuning of real application programs:
-
-
- CORRELATION OF LFK TEST PERFORMANCE MEANS WITH LARGE WORKLOAD TUNING
-
-   ----------   ----------   ----------   -----------------------
-   Type of      CRAY-YMP1    Fraction     Tuning of Workload
-   Mean         72 Samples   Flops in     Correlated with
-                (MFlops)     Vector Ops   LFK Mean Performance
-   ----------   ----------   ----------   -----------------------
-
-   2*AM            165.0        .97       Best applications
-
-   AM               82.7        .89       Optimized applications
-
-   GM               43.4        .74       Tuned workload
-
-   HM               23.2        .45       Untuned workload
-
-   HM(scalar)       12.4        .0        All-scalar applications
-   ----------   ----------   ----------   -----------------------
-   (AM,GM,HM stand for Arithmetic, Geometric, Harmonic Mean Rates)
-
- The best central measure is the Geometric Mean (GM) of 72 rates because the
- GM is less biased by outliers than the Harmonic (HM) or Arithmetic (AM) mean.
- CRAY hardware monitors have demonstrated that net Mflop rates for the
- LLNL and UCSD tuned workloads are closest to the 72 LFK test GM rate.
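
The three means can be computed as in the following sketch. The four rates are made-up values chosen to include one fast outlier; they are not LFK measurements, and the function names are illustrative only.

```python
import math


def arithmetic_mean(rates):
    """AM: sum of the rates divided by their count."""
    return sum(rates) / len(rates)


def geometric_mean(rates):
    """GM: exponential of the mean log-rate (the n-th root of the product)."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))


def harmonic_mean(rates):
    """HM: count divided by the sum of reciprocal rates."""
    return len(rates) / sum(1.0 / r for r in rates)


# Hypothetical Mflop rates: three slow scalar-like samples and one fast outlier.
rates = [2.0, 4.0, 8.0, 256.0]

am = arithmetic_mean(rates)   # pulled up strongly by the 256.0 outlier
gm = geometric_mean(rates)    # sits between the other two means
hm = harmonic_mean(rates)     # dominated by the slowest samples
```

For any set of positive rates HM <= GM <= AM, so quoting all three brackets the performance range instead of reducing it to one number.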
-
-
-
-
-
-
-
-
-
- LFK STANDARD NUMERICAL PERFORMANCE COMPARISONS USING LFK TEST AVERAGES
-
- The speed-ups shown below, computed as ratios of the performance
- statistics, span a range with very small variance compared with the
- enormous performance ranges; the speed-up ranges are convergent estimates.
-
-
-
- TABLE OF SPEED-UP RATIOS OF LIVERMORE LOOPS MEAN RATES (72 Samples)
-
- (AM,GM,HM stand for Arithmetic, Geometric, Harmonic Mean Rates)
-
- -------- ---- -------   -------- -------- -------- -------- -------- --------
- SYSTEM   MEAN  MFLOPS    SX-3/14    YMP/1 9000/730 6000/540 SPARC 1+  i486/25
- -------- ---- -------   -------- -------- -------- -------- -------- --------
-
-
- NEC      AM=  311.820 :    1.000    3.986   17.030   22.006  194.396  271.148
- SX-3/14  GM=   95.590 :    1.000    2.610    6.081    8.909   66.767   91.038
- F77v.012 HM=   38.730 :    1.000    2.193    2.916    5.199   30.488   42.098
-          SD=  499.780
-
-
- CRAY     AM=   78.230 :    0.251    1.000    4.273    5.521   48.770   68.026
- YMP/1    GM=   36.630 :    0.383    1.000    2.330    3.414   25.585   34.886
- CFT771.2 HM=   17.660 :    0.456    1.000    1.330    2.370   13.902   19.196
-          SD=   86.750
-
-
- HP       AM=   18.310 :    0.059    0.234    1.000    1.292   11.415   15.922
- 9000/730 GM=   15.720 :    0.164    0.429    1.000    1.465   10.980   14.971
- f77 8.05 HM=   13.280 :    0.343    0.752    1.000    1.783   10.454   14.435
-          SD=    9.680
-
-
- IBM      AM=   14.170 :    0.045    0.181    0.774    1.000    8.834   12.322
- 6000/540 GM=   10.730 :    0.112    0.293    0.683    1.000    7.495   10.219
- XL v0.90 HM=    7.450 :    0.192    0.422    0.561    1.000    5.865    8.098
-          SD=    9.590
-
-
- SUN      AM=    1.604 :    0.005    0.021    0.088    0.113    1.000    1.395
- SPARC 1+ GM=    1.432 :    0.015    0.039    0.091    0.133    1.000    1.364
- f77 v1.4 HM=    1.270 :    0.033    0.072    0.096    0.171    1.000    1.381
-          SD=    0.741
-
-
- COMPAQ   AM=    1.150 :    0.004    0.015    0.063    0.081    0.717    1.000
- i486/25  GM=    1.050 :    0.011    0.029    0.067    0.098    0.733    1.000
-          HM=    0.920 :    0.024    0.052    0.069    0.123    0.724    1.000
-          SD=    0.480
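
Each entry in the table above is the row system's mean rate divided by the column system's mean rate for the same type of mean. A minimal sketch of that arithmetic, using the mean rates as printed in the MFLOPS column (the dictionary layout and function names are illustrative assumptions):

```python
# Mean Mflop rates copied from the MFLOPS column of the table above.
MEANS = {
    "SX-3/14":  {"AM": 311.820, "GM": 95.590, "HM": 38.730},
    "YMP/1":    {"AM": 78.230,  "GM": 36.630, "HM": 17.660},
    "9000/730": {"AM": 18.310,  "GM": 15.720, "HM": 13.280},
    "6000/540": {"AM": 14.170,  "GM": 10.730, "HM": 7.450},
    "SPARC 1+": {"AM": 1.604,   "GM": 1.432,  "HM": 1.270},
    "i486/25":  {"AM": 1.150,   "GM": 1.050,  "HM": 0.920},
}


def speedup(row, col, mean):
    """Speed-up of system `row` over system `col` for one type of mean."""
    return MEANS[row][mean] / MEANS[col][mean]


def speedup_range(row, col):
    """(min, max) speed-up over the three means: the range the table reports."""
    ratios = [speedup(row, col, m) for m in ("AM", "GM", "HM")]
    return min(ratios), max(ratios)


sx3_over_ymp = speedup_range("SX-3/14", "YMP/1")
```

With the printed means, `sx3_over_ymp` is roughly (2.19, 3.99), matching the HM and AM entries in the SX-3/14 row. A few recomputed ratios differ from the table in the last digits, presumably because the posted table was derived from unrounded means.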
-
-
-
-
-
-
-
-
- Current Livermore Loops(aka LFK) source files are available from NISTLIB:
-
- 1. Create a file(named beta) containing one line of text:
-
- send mflops24 from llnl
-
- 2. E-mail file beta to:
-
- mail nistlib@cmr.ncsl.nist.gov < beta
-
- 3. NIST/libnet will return-mail the source-file: mflops24
-
-
-
-
- REFERENCES
-
- F.H.McMahon, The Livermore Fortran Kernels:
- A Computer Test Of The Numerical Performance Range,
- Lawrence Livermore National Laboratory,
- Livermore, California, UCRL-53745, December 1986.
-
- from: National Technical Information Service
- U.S. Department of Commerce
- 5285 Port Royal Road
- Springfield, VA. 22161
-
- J.T. Feo, An Analysis Of The Computational And Parallel
- Complexity Of The Livermore Loops, PARALLEL COMPUTING
- (North Holland), Vol 7(2), 163-185, (1988).
-
-
-