- Path: sparky!uunet!sun-barr!ames!uakari.primate.wisc.edu!caen!hellgate.utah.edu!dog.ee.lbl.gov!porpoise!marlin!aburto
- From: aburto@nosc.mil (Alfred A. Aburto)
- Newsgroups: comp.benchmarks
- Subject: Re: Geometric Mean or Median
- Message-ID: <1992Aug14.020025.4555@nosc.mil>
- Date: 14 Aug 92 02:00:25 GMT
- References: <1992Aug9.201016.10008@nosc.mil> <12878@inews.intel.com>
- Distribution: comp.benchmarks
- Organization: Naval Ocean Systems Center, San Diego
- Lines: 97
-
- In article <12878@inews.intel.com> jwreilly@mipos2.UUCP (Jeffrey Reilly) writes:
-
- >Suppose the optimization to program 5 in case B was a legitimate
- >optimization that applied across a class of similar applications?
- >As a vendor, end-user, etc, I would think you would like to see this
- >reflected in whatever composite measure is used. I would argue that this
- >method appears too insensitive.
-
- This is a good question too. The thing is, we want the overall measure
- of performance to be representative of the bulk of the data. We don't
- want the overall measure of performance to be biased by one or a few
- extreme outlying data points, because they are not representative of
- the bulk of the data; they are low-probability events (correct or
- incorrect). You know all this, but that's the reason. I believe the
- Median does this, but it looks like I have some explaining to do
- about the definition of the Median.
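A minimal numeric sketch of that robustness argument (the ratios below are made up for illustration, not SPEC data): one extreme result for "program 5" drags both the arithmetic and geometric means upward, while the median barely moves.

```python
import math
import statistics

# Hypothetical performance ratios for five programs; in "case B",
# program 5 gets an extreme outlying result. Values are illustrative.
case_a = [1.0, 1.1, 0.9, 1.0, 1.2]
case_b = [1.0, 1.1, 0.9, 1.0, 12.0]   # one outlier

def geometric_mean(xs):
    """Geometric mean via the log-average, as used for SPECratios."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# The median is unchanged; both means are pulled up by the outlier.
print(statistics.mean(case_a), statistics.mean(case_b))
print(geometric_mean(case_a), geometric_mean(case_b))
print(statistics.median(case_a), statistics.median(case_b))
```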
-
- The next thing we want to do is define some parameters that describe
- (indicate, or give us an idea) what is happening at the tails of the
- distribution (cumulative distribution) because, as you indicate, this
- is where the really interesting things are happening. I would propose
- using in addition to the Median the Minimum and Maximum values, or
- the 10% and 90% points of the cumulative distribution for example.
- Three numbers is about 'right', I think. If we give a lot of different
- measures of performance, then people will pick from that set the one
- or few that best fit their case, not everyone will pick the same
- measures of performance, and things can get confused. We don't want
- to prevent people from learning more, though, so in addition we
- provide (make available) all the raw data so people can go analyze
- it every which way, and this is 'good', as some interesting things
- are likely to be learned.
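The proposed three-number summary (10% point, median, 90% point of the cumulative distribution) can be sketched as follows; the sample values and the interpolation rule are illustrative assumptions, not a prescribed method.

```python
def percentile(xs, p):
    """p-th percentile (0..100) by linear interpolation on sorted data."""
    s = sorted(xs)
    k = (len(s) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

# Made-up performance ratios standing in for a validated data set.
samples = [0.8, 0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 1.2, 1.4, 2.5]

# Three numbers: the tails plus the center of the distribution.
summary = (percentile(samples, 10),
           percentile(samples, 50),
           percentile(samples, 90))
print(summary)
```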
-
- To arrive at the above in a meaningful way we'll need 'good' validated
- data sets consisting of a sufficient number of samples (program results,
- routine results, kernel results, module results, ..., whatever
- constitutes a data sample). We want to pick enough samples so at least
- the cumulative distribution is relatively smooth (no big jumps in the
- distribution). I don't know how many samples are required, but I
- think that 10 isn't enough. Maybe 25 or 50 might be enough, but we'd
- need to figure that out, and it can be done. If the fluctuation is
- too big we might consider smoothing (filtering) the data. There are
- many possibilities for handling these problems (a simple binomial
- filter, or a Savitzky-Golay filter, or ...), but we need a sufficient
- number of validated samples first.
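The simple binomial filter mentioned above can be sketched as a 1-2-1 weighted average over neighboring samples; the endpoint handling and the noisy test sequence here are my own illustrative choices.

```python
def binomial_smooth(xs):
    """One pass of a 1-2-1 binomial smoothing filter.

    Interior points become (x[i-1] + 2*x[i] + x[i+1]) / 4;
    the two endpoints are left unsmoothed.
    """
    if len(xs) < 3:
        return list(xs)
    out = [xs[0]]
    for i in range(1, len(xs) - 1):
        out.append((xs[i - 1] + 2 * xs[i] + xs[i + 1]) / 4.0)
    out.append(xs[-1])
    return out

# An artificially noisy sequence: the filter damps the oscillation.
noisy = [1.0, 3.0, 1.0, 3.0, 1.0]
print(binomial_smooth(noisy))
```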
-
- >What definition of median are you using?
-
- Here is where we've had a problem (as it finally dawned on me). The
- median is defined as the 50% point of the cumulative distribution of
- samples. This is the definition I have been using, and it is the
- preferred way of deriving the Median from a set of samples. With
- small data samples it is subject to error as are all the other
- parameters. If the distribution of samples is not highly skewed you
- will find that the Median and Arithmetic Mean are very close or
- identical. See Papoulis for example: "Probability, Random Variables,
- and Stochastic Processes".
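The point about skewness can be checked numerically; the two small samples below are made up to illustrate it. For the roughly symmetric sample the mean and median nearly coincide, while for the skewed sample the mean is pulled well above the median.

```python
import statistics

# Illustrative samples: one roughly symmetric, one right-skewed.
symmetric = [0.8, 0.9, 1.0, 1.0, 1.0, 1.1, 1.2]
skewed    = [0.9, 1.0, 1.0, 1.1, 1.2, 2.0, 6.0]

# Symmetric case: mean and median (50% point) nearly coincide.
print(statistics.mean(symmetric), statistics.median(symmetric))
# Skewed case: the long right tail drags the mean above the median.
print(statistics.mean(skewed), statistics.median(skewed))
```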
-
- >What do you mean by "reliable"?
-
- It is a measure of performance that is more representative of the
- main bulk of the data and is not easily biased by one or a few
- low-probability-of-occurrence events (data samples). The Mode might
- be even more useful, and it can be inferred from the Arithmetic Mean
- and Median. Perhaps all these problems would vanish if the sample
- size were increased so that we'd get a better picture of the true
- distribution.
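One classical rule of thumb for inferring the Mode from the Arithmetic Mean and Median (an assumption on my part about what is meant here) is Pearson's empirical relation for moderately skewed distributions, mode = 3*median - 2*mean. It is an approximation, not an exact identity.

```python
def approximate_mode(mean, median):
    """Pearson's empirical relation for moderately skewed distributions:
    mode is roughly 3*median - 2*mean. A rule of thumb, not exact."""
    return 3 * median - 2 * mean

# Illustrative values: a mildly right-skewed sample (mean > median)
# gives an inferred mode below both.
print(approximate_mode(1.2, 1.1))
# For a symmetric sample, mean == median == mode.
print(approximate_mode(1.0, 1.0))
```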
-
- >Regarding current standards, the SPEC metrics (SPECint92, SPECfp92, etc)
- >are DEFINED as being the geometric mean of the appropriate SPEC suites
- >SPECratios.
-
- Yes, and I think the Median might turn out to be more 'reliable', but
- also we need more data samples to better estimate the true sample
- distribution. Also we might want to consider the confidence of our
- estimates, particularly since the goal of deriving measures of
- performance is to compare two or more of them.
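One way to attach confidence to the median estimate (a sketch of my own, using a simple percentile bootstrap on made-up sample values) is to resample the data and look at the spread of the resulting medians.

```python
import random
import statistics

def bootstrap_median_ci(samples, n_boot=2000, alpha=0.10, seed=1):
    """Percentile-bootstrap confidence interval for the median.

    Draws n_boot resamples (with replacement), takes the median of
    each, and returns the central (1 - alpha) interval of those medians.
    """
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(n_boot)
    )
    lo = medians[int(n_boot * alpha / 2)]
    hi = medians[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Illustrative performance ratios standing in for real benchmark data.
samples = [0.9, 1.0, 1.0, 1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0]
lo, hi = bootstrap_median_ci(samples)
print(lo, statistics.median(samples), hi)
```

A wide interval here is a warning that two medians being compared may not be meaningfully different without more samples.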
-
- >What is it they say, "with statistics anything can be proved"? :-)
-
- Yes, numbers can tell lies and we must be careful.
- Statistics itself is really OK. The problems happen when we try to
- make claims from ill-formed data sets --- with samples that do not
- adequately describe the true distribution, one can certainly derive
- all sorts of wild and meaningless results (all lies).
-
- [Comments about references left out]
-
- >Jeff Reilly | "There is something fascinating about
- >Intel Corporation | science. One gets such wholesale returns
- >jwreilly@mipos2.intel.com | of conjecture out of such a trifling
- >(408) 765 - 5909 | investment of fact" - M. Twain
- >Disclaimer: All opinions are my own...
-
-
- Al Aburto
- aburto@marlin.nosc.mil
- -------
-