- Newsgroups: comp.benchmarks
- Path: sparky!uunet!timbuk.cray.com!walter.cray.com!ferrari!cmg
- From: cmg@ferrari.cray.com (Charles Grassl)
- Subject: Re: Geometric Mean or Median
- Message-ID: <1992Aug14.151245.21649@walter.cray.com>
- Reply-To: cmg@magnet.cray.com
- Organization: Cray Research, inc.
- References: <PRENER.92Aug9220648@prener.watson.ibm.com> <1992Aug12.012620.3441@nosc.mil> <1992Aug12.172209.3108@nas.nasa.gov> <Aug14.142126.38458@yuma.ACNS.ColoState.EDU> <1992Aug14.155857.6561@riacs.edu>
- Distribution: comp.benchmarks
- Date: 14 Aug 92 15:12:45 CDT
- Lines: 94
-
- In article <1992Aug14.155857.6561@riacs.edu>, lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
- >In article <Aug14.142126.38458@yuma.ACNS.ColoState.EDU>, shafer@CS.ColoState.EDU (spencer shafer) writes:
- >|>
- >|> A discussion of this, and an offered proof
- >|> of the geometric mean as preferred method is in the March 1986 issue of
- >|> Communications of the ACM, "How Not to Lie With Statistics: The Correct
- >|> Way to Summarize Benchmark Results," by Fleming and Wallace.
- >
- >Yes, and there was a rebuttal to this "proof" in CACM by, I believe,
- >J.E. Smith, in October of 1988. {If I have the reference correct,}
- >it is proved that the harmonic mean is the correct measure of rates,
- >if you want to examine a fixed workload and characterize the performance
- >on that workload.
-
- The references are below:
-
- [FL,WA] Fleming, P.J. and Wallace, J.J., "How Not to Lie With
-          Statistics: The Correct Way to Summarize Benchmark Results",
-          Communications of the ACM, Vol. 29, No. 3, March 1986,
-          pp. 218-221.
- 
- [SM]     Smith, J.E., "Characterizing Computer Performance With a
-          Single Number", Communications of the ACM, Vol. 31, No. 10,
-          October 1988, pp. 1202-1206.
- 
- [GU]     Gustafson, J., et al., "SLALOM", Supercomputing Review,
-          July 1991, pp. 52-59.
-
- In [FL,WA], Fleming and Wallace advocate the use of a geometric mean
- for characterizing computer performance based on benchmarks. In [SM],
- Smith advocates the use of a harmonic mean, though he states that "the
- most obvious single number performance measure is the total time". The
- total (elapsed) time is not only accurate, but has considerable
- intuitive appeal.
-
- Neither of the articles, [FL,WA] or [SM], offers a "proof" in the
- mathematical sense.  (If Smith's "proof" is correct, is Fleming and
- Wallace's "proof" then incorrect?)  Why do the two articles advocate
- different metrics?  The answer lies in the underlying assumptions of
- each article.
-
- Fleming and Wallace stress that the geometric mean applies only to
- normalized performance results.  The assumption that individual results
- are normalized leads to the use of the geometric mean.  Smith, in his
- article, assumes that "work" is measured by floating point operations
- and that these operations are all equivalent pieces of the workload.
- This assumption leads to the use of a harmonic mean.
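- 
- As a rough sketch (my own illustration, not taken from either article),
- the two statistics can be computed as below.  The times and flop counts
- are made-up numbers, used only to show where each mean applies:
- 
-   /* geometric mean of normalized ratios [FL,WA] versus harmonic
-      mean of rates [SM]; all data here is hypothetical */
-   #include <stdio.h>
-   #include <math.h>
- 
-   int main(void)
-   {
-       double ref_time[]  = { 10.0, 40.0, 5.0 };     /* reference machine  */
-       double test_time[] = {  4.0, 25.0, 1.0 };     /* machine under test */
-       double flops[]     = { 1.0e9, 4.0e9, 5.0e8 }; /* assumed work       */
-       int n = 3, i;
- 
-       /* [FL,WA]: geometric mean of the normalized ratios (speedups) */
-       double gm = 1.0;
-       for (i = 0; i < n; i++)
-           gm *= ref_time[i] / test_time[i];
-       gm = pow(gm, 1.0 / (double)n);
- 
-       /* [SM]: harmonic mean of per-test rates; weighting each test by
-          its flop count reduces to total work divided by total time  */
-       double inv = 0.0, work = 0.0, time = 0.0;
-       for (i = 0; i < n; i++) {
-           inv  += test_time[i] / flops[i];          /* 1 / rate_i */
-           work += flops[i];
-           time += test_time[i];
-       }
- 
-       printf("geometric mean speedup     : %.3f\n", gm);
-       printf("uniform harmonic mean rate : %.3e flop/s\n", n / inv);
-       printf("work-weighted (total) rate : %.3e flop/s\n", work / time);
-       return 0;
-   }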
-
- The articles rest on two distinct assumptions:
- 1. Results normalized to a specific machine [FL,WA]
- 2. Work is measured by floating point operations [SM]
-
- Some benchmarks fit assumption (1) above. Some benchmarks fit
- assumption (2) above. Some benchmarks do not fit either assumption.
-
- Not all benchmark tests, especially those with a broad range of
- performance characteristics, have realistic machines to normalize
- against. For example, a VAX 11/780, which is used for normalization of
- the original SPEC benchmarks, is not appropriate for normalizing
- performance of large floating point simulations. We might ask, is the
- VAX 11/780 reasonable for calibrating RISC workstations?
-
- Not all computer "work" is measured by the number of floating point or
- integer operations. For example, the SLALOM benchmark [GU] does not
- have an accurate operation count. The authors of this benchmark do not
- count the number of floating point operations performed; rather, speed
- is measured by the number of "patches" covered in one minute of
- computation. Different algorithms have different numbers of
- operations, but as long as the same number of patches are computed in
- one minute, the speed is judged to be the same.
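- 
- The following is only a schematic of such a fixed-time measurement, in
- the spirit of SLALOM but not the actual SLALOM code; the "patch"
- routine here is an empty stand-in:
- 
-   #include <stdio.h>
-   #include <time.h>
- 
-   static void compute_one_patch(void)
-   {
-       /* stand-in for the real per-patch work */
-   }
- 
-   int main(void)
-   {
-       clock_t start = clock();
-       long patches = 0;
- 
-       /* do as much work as possible in one minute of CPU time and
-          report the amount of work done, not the operation count   */
-       while ((double)(clock() - start) / CLOCKS_PER_SEC < 60.0) {
-           compute_one_patch();
-           patches++;
-       }
-       printf("patches completed in one minute: %ld\n", patches);
-       return 0;
-   }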
-
- It is the situation, or the constraints, of a particular benchmark that
- dictates the proper summarizing statistic.  The table below lists the
- interpretation of various means.  (Note that the harmonic mean usually
- referred to is a -uniform- harmonic mean.  Smith, in his article [SM],
- emphasizes the use of weighted harmonic means.)
-
- Geometric mean: A measure of the distance in "performance
- space" from the reference machine to the
- tested machine.
-
- (Uniform) harmonic mean: The average performance if all benchmarks
- were adjusted so that each performed the
- same number of floating point operations.
-
- (Uniform) arithmetic mean: The average performance if all benchmarks
- were adjusted so that each ran for the
- same amount of time.
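- 
- A small made-up example of how the means differ: suppose two benchmarks
- run at 10 and 100 Mflop/s on the tested machine.  The arithmetic mean,
- (10 + 100)/2 = 55 Mflop/s, is the speed seen if the machine spends
- equal TIME at each rate.  The harmonic mean, 2/(1/10 + 1/100) = 18.2
- Mflop/s, is the speed seen if equal amounts of WORK (operations) are
- done at each rate.  The geometric mean, sqrt(10*100) = 31.6 Mflop/s,
- corresponds to neither workload directly, which is why it is reserved
- for normalized ratios.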
-
-
-
- Charles Grassl
- Cray Research, Inc.
- Eagan, Minnesota USA
-