Path: sparky!uunet!convex!darwin.sura.net!sgiblab!swrinde!elroy.jpl.nasa.gov!ames!agate!dog.ee.lbl.gov!porpoise!marlin!aburto
From: aburto@nosc.mil (Alfred A. Aburto)
Newsgroups: comp.benchmarks
Subject: Re: Geometric Mean or Median
Message-ID: <1992Aug30.022440.1857@nosc.mil>
Date: 30 Aug 92 02:24:40 GMT
References: <1992Aug20.160352.13856@nas.nasa.gov> <1992Aug23.114309.3643@nosc.mil> <1992Aug26.160240.20114@murdoch.acc.Virginia.EDU>
Distribution: comp.benchmarks
Organization: Naval Ocean Systems Center, San Diego
Lines: 84

In article <1992Aug26.160240.20114@murdoch.acc.Virginia.EDU> clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman) writes:
>In article <1992Aug23.114309.3643@nosc.mil> aburto@nosc.mil (Alfred A. Aburto) writes:

>>However, even the SPEC results will vary, as will any measure of
>>performance, when the underlying test parameters change. See SunWorld
>>magazine, Mar 1992, pg 48, "SPEC: Odyssey of a benchmark", where
>>geometric mean SPECmark ratings of 17.8, 20.0, 20.8, and 25.0 were
>>measured on the same machine using different compilers.

>This is given as an example of "noisiness" or "error proneness" in SPEC,
>but SPEC was intended to measure the SYSTEM, including the compilers. If
>one vendor has better compilers than another, it matters to the end users.
>Conversely, trying to benchmark the raw hardware in some way that filters
>out compiler differences would not be interesting to people who have to
>purchase the systems and use the compilers, not just the hardware.

It was given as an example of the need to show (indicate) the 'spread' in
the results. There is not just ONE result; there are numerous results,
depending upon many parameters that are difficult to control. The vendor's
system may get one result (25.0), but the user's system may get an entirely
different result. In the SunWorld article, even after a lot of trouble,
they were unable to duplicate the vendor's result (25.0). They finally
settled on 20.8 as the best they could do and left it at that. I'm saying
that unnecessary trouble might have been avoided if the vendor had instead
said something like: "this system has a rating of 21.0 +/- 4.0; you'll
achieve peak performance of approximately 25.0 by using this compiler
with these options, and 17.0 with this other compiler with these
options." Or some simple statement such as that, perhaps more
appropriately using the Maximum and Minimum results instead of the
standard deviation or RMS error. It would have avoided unnecessary
problems and been more informative overall. I don't want to hide any
information at all. I'm trying to say that we need to bring out more
information in the hope that it will avoid the type of problems
discussed in the SunWorld article. I'm not claiming to know exactly how
to rectify this situation; I'm just saying there appears to be a need to
do it.

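To make concrete the kind of statement I have in mind, here is a rough
sketch (Python is my own choice here, not anything SPEC or SunWorld
used) that summarizes the four SPECmark numbers quoted above. The
"+/-" line at the end is only my suggested reporting format:

import math

ratings = [17.8, 20.0, 20.8, 25.0]   # same machine, different compilers

n = len(ratings)
arith_mean = sum(ratings) / n
geo_mean = math.exp(sum(math.log(r) for r in ratings) / n)
std_dev = math.sqrt(sum((r - arith_mean) ** 2 for r in ratings) / (n - 1))
half_range = (max(ratings) - min(ratings)) / 2.0   # (Max - Min) / 2

print("arithmetic mean %5.1f" % arith_mean)
print("geometric mean  %5.1f" % geo_mean)
print("std deviation   %5.1f" % std_dev)
print("rating %.1f +/- %.1f (Max/Min spread)" % (arith_mean, half_range))

With these four numbers the Max/Min spread (roughly +/- 3.6 around 20.9)
tells a user more than the single 25.0 figure does.
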
Of course the 'spread' (or 'error' if you will) in performance due to
different compilers and compiler options is only one aspect of the
problem. Different types of programs produce different results, and there
is a spread in performance there too. Program size and memory usage, main
memory speed, cache type, cache size, and so on all produce a spread in
performance. The overall spread is considerable, and this is why system
testing is so difficult.

With regard to 'filtering' I was thinking of the need to 'filter' the
extreme data points. One learns about these extreme values by having
other similar _program_ results for comparison. This 'filtering' aspect
was not intended so much for a particular compiler result as for a
particular program result. If program 'A' produces an order of magnitude
'better' result than 9 other _similar_ programs for a particular system
(compiler included), then I feel a need to do something about program 'A'.
At least I'd become very interested in program 'A' and try to figure
out why it produces results so different from the other programs.
Filtering is an option when a few of the system results for program
'A' show extreme outliers compared to other program results on the
same system (compiler included as part of the system). It's just an
option, as one might not want to throw all the program 'A' results out
due to a few 'abnormal' results (caused, for example, by extreme
optimization on one out of M programs and on a few out of N systems).
It's just an option, and there are certainly many cases where one would
not want to filter at all. It depends on the data.

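To show the sort of 'option' I mean, here is a minimal sketch (Python
again, with made-up numbers purely for illustration) that only flags a
program whose ratio sits more than a factor of 10 from the median of
the programs on the same system; what to do with a flagged result is
still a judgment call:

def flag_outliers(ratios, factor=10.0):
    """ratios: program name -> performance ratio on one system."""
    values = sorted(ratios.values())
    mid = len(values) // 2
    if len(values) % 2:
        median = values[mid]
    else:
        median = (values[mid - 1] + values[mid]) / 2.0
    flagged = {}
    for prog, r in ratios.items():
        if r > median * factor or r < median / factor:
            flagged[prog] = r   # worth a closer look, not automatic removal
    return median, flagged

# Made-up ratios for one system (compiler included as part of the system):
results = {'prog_A': 250.0, 'prog_B': 22.0, 'prog_C': 19.5, 'prog_D': 24.0}
median, suspects = flag_outliers(results)
print("median %.1f, suspect results: %s" % (median, suspects))

Here prog_A would get flagged, and the first step is to figure out why
it is so different (extreme optimization of one loop, say), not to
throw it away.
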
>The only real question in my mind about SPEC's approach is allowing
>vendors to use different compiler switch settings for each individual
>benchmark, if it produces better numbers. I don't think many users can
>compile every program that many different times and run timing tests on
>each one.

This is a good point.
On the other hand, one cannot fault vendors for trying to achieve
optimum performance in each individual case, but unfortunately, as you
say, this makes it tough on users trying to figure out the best options
to use for their own particular programs.

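Just to illustrate the burden, here is a small sketch of what a user
would have to script to time even one program under a few option sets.
The compiler name, flags, and file names are placeholders, not
recommendations:

import subprocess, time

option_sets = ["-O", "-O2", "-O2 -funroll-loops"]   # placeholder flag sets
seconds = {}

for opts in option_sets:
    # rebuild the program with this option set, then time one run of it
    subprocess.run("cc %s -o bench bench.c" % opts, shell=True, check=True)
    start = time.time()
    subprocess.run("./bench", shell=True, check=True)
    seconds[opts] = time.time() - start

for opts in sorted(seconds, key=seconds.get):
    print("%-22s %6.2f s" % (opts, seconds[opts]))

Multiply that by every program a user cares about and every candidate
compiler, and it is clear why a single published number with no spread
is not much help.
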
[more to follow]

Al Aburto
aburto@marlin.nosc.mil

-------
