NetNews Usenet Archive 1992 #18


Newsgroups: comp.benchmarks
Path: sparky!uunet!timbuk.cray.com!walter.cray.com!ferrari!cmg
From: cmg@ferrari.cray.com (Charles Grassl)
Subject: Re: Geometric Mean or Median
Message-ID: <1992Aug14.151245.21649@walter.cray.com>
Reply-To: cmg@magnet.cray.com
Organization: Cray Research, inc.
References: <PRENER.92Aug9220648@prener.watson.ibm.com> <1992Aug12.012620.3441@nosc.mil> <1992Aug12.172209.3108@nas.nasa.gov> <Aug14.142126.38458@yuma.ACNS.ColoState.EDU> <1992Aug14.155857.6561@riacs.edu>
Distribution: comp.benchmarks
Date: 14 Aug 92 15:12:45 CDT
Lines: 94

In article <1992Aug14.155857.6561@riacs.edu>, lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>In article <Aug14.142126.38458@yuma.ACNS.ColoState.EDU>, shafer@CS.ColoState.EDU (spencer shafer) writes:
>|>
>|> A discussion of this, and an offered proof
>|> of the geometric mean as preferred method is in the March 1986 issue of
>|> Communications of the ACM, "How Not to Lie With Statistics: The Correct
>|> Way to Summarize Benchmark Results," by Fleming and Wallace.
>
>Yes, and there was a rebuttal to this "proof" in CACM by, I believe,
>J.E. Smith, in October of 1988.  {If I have the reference correct,}
>it is proved that the harmonic mean is the correct measure of rates,
>if you want to examine a fixed workload and characterize the performance
>on that workload.

The references are below:

[FL,WA] Fleming, P.J., Wallace, J.J., "How Not to Lie With Statistics:
        The Correct Way to Summarize Benchmark Results", Communications
        of the ACM, P. 218-221, March, 1986, Volume 29, no. 3.

[SM]    Smith, J.E., "Characterizing Computer Performance With a Single
        Number", Communications of the ACM, P. 1202-1206, October, 1988,
        Volume 31, no. 10.

[GU]    Gustafson, J. et al., "SLALOM", Supercomputing Review,
        P. 52-59, July, 1991.

In [FL,WA], Fleming and Wallace advocate the use of a geometric mean for
characterizing computer performance based on benchmarks.
In [SM], Smith advocates the use of a harmonic mean, though he states
that "the most obvious single number performance measure is the total
time".  The total (elapsed) time is not only accurate, but has
considerable intuitive appeal.

Neither of the articles, [FL,WA] or [SM], offers a "proof" in the
mathematical sense.  (If Smith's "proof" is correct, then is Fleming and
Wallace's "proof" incorrect?)  Why do the two articles advocate
different metrics?  The answer lies in the underlying assumptions in
each article.

Fleming and Wallace stress that the geometric mean applies only to
normalized performance results.  The assumption that individual results
are normalized leads to the use of the geometric mean.

Smith, in his article, assumes that "work" is measured by floating point
operations and that these operations are all equivalent pieces of the
workload.  This assumption leads to the use of a harmonic mean.

The articles make two distinct assumptions:

  1.  Results are normalized to a specific machine [FL,WA]
  2.  Work is measured by floating point operations [SM]

Some benchmarks fit assumption (1) above.  Some benchmarks fit
assumption (2) above.  Some benchmarks do not fit either assumption.

Not all benchmark tests, especially those with a broad range of
performance characteristics, have realistic machines to normalize
against.  For example, a VAX 11/780, which is used for normalization of
the original SPEC benchmarks, is not appropriate for normalizing
performance of large floating point simulations.  We might ask, is the
VAX 11/780 reasonable for calibrating RISC workstations?

Not all computer "work" is measured by the number of floating point or
integer operations.  For example, the SLALOM benchmark [GU] does not
have an accurate operation count.  The authors of this benchmark do not
count the number of floating point operations performed; rather, speed
is measured by the number of "patches" covered in one minute of
computation.
Different algorithms have different numbers of operations, but as long
as the same number of patches is computed in one minute, the speed is
judged to be the same.

It is the situation, or the constraints, of a particular benchmark which
dictates the proper summarizing statistic.  The table below lists the
interpretation of various means.  (Note that the harmonic mean usually
referred to is a -uniform- harmonic mean.  Smith, in his article [SM],
emphasizes the use of weighted harmonic means.)

Geometric mean:             A measure of the distance in "performance
                            space" from the reference machine to the
                            tested machine.

(Uniform) harmonic mean:    The average performance if all benchmarks
                            were adjusted so that each performed the
                            same number of floating point operations.

(Uniform) arithmetic mean:  The average performance if all benchmarks
                            were adjusted so that each ran for the same
                            amount of time.


Charles Grassl
Cray Research, Inc.
Eagan, Minnesota  USA
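
The interpretations of the means listed above can be checked with a
small numerical sketch.  The Python below is illustrative only: the
rates, run times, and function names are hypothetical and are not taken
from [FL,WA], [SM], or [GU].  It assumes rates in MFLOPS and elapsed
times in seconds, and it shows (a) that the uniform harmonic mean equals
total work over total time when every benchmark performs the same number
of operations, (b) that the uniform arithmetic mean equals total work
over total time when every benchmark runs for the same time, and (c)
that the ratio of two machines' geometric means of normalized results
does not depend on which reference machine is chosen.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical per-benchmark rates in MFLOPS for one machine.
rates = [10.0, 40.0]

# (a) Equal work: each benchmark performs the same number of operations.
work = 100.0  # Mflop per benchmark (assumed)
equal_work_rate = (len(rates) * work) / sum(work / r for r in rates)

# (b) Equal time: each benchmark runs for the same number of seconds.
seconds = 5.0  # run time per benchmark (assumed)
equal_time_rate = sum(r * seconds for r in rates) / (len(rates) * seconds)


# (c) Normalized results: speed relative to a reference machine,
# computed per benchmark as reference time divided by measured time.
def normalized_score(times, ref_times):
    return geometric_mean([ref / t for ref, t in zip(ref_times, times)])


times_a = [2.0, 8.0]   # hypothetical elapsed times, machine A
times_b = [4.0, 4.0]   # hypothetical elapsed times, machine B
ref_1 = [1.0, 1.0]     # one hypothetical reference machine
ref_2 = [10.0, 0.5]    # a very different hypothetical reference machine

ratio_ref_1 = normalized_score(times_a, ref_1) / normalized_score(times_b, ref_1)
ratio_ref_2 = normalized_score(times_a, ref_2) / normalized_score(times_b, ref_2)

print("harmonic mean:  ", harmonic_mean(rates), "equal work:", equal_work_rate)
print("arithmetic mean:", arithmetic_mean(rates), "equal time:", equal_time_rate)
print("A/B by geometric mean:", ratio_ref_1, "and", ratio_ref_2)
```

With unequal operation counts or run times, the corresponding weighted
means would be needed, which is the case Smith emphasizes in [SM].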