- Newsgroups: comp.benchmarks
- Path: sparky!uunet!convex!darwin.sura.net!aplcen.apl.jhu.edu!uakari.primate.wisc.edu!usenet.coe.montana.edu!decwrl!concert!uvaarpa!murdoch!hemlock.cs.Virginia.EDU!clc5q
- From: clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman)
- Subject: Dhrystone and SPECint Correlation
- Message-ID: <1992Sep4.210245.19914@murdoch.acc.Virginia.EDU>
- Sender: usenet@murdoch.acc.Virginia.EDU
- Organization: University of Virginia Computer Science Department
- References: <1992Aug23.114309.3643@nosc.mil> <1992Aug26.160240.20114@murdoch.acc.Virginia.EDU> <1992Aug31.002356.24988@nosc.mil>
- Distribution: comp.benchmarks
- Date: Fri, 4 Sep 1992 21:02:45 GMT
- Lines: 169
-
- In article <1992Aug31.002356.24988@nosc.mil> aburto@nosc.mil (Alfred A. Aburto) writes:
- >
- >In Article <1992Aug26.160240.20114@murdoch.acc.Virginia.EDU>
- >clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman) writes:
- >>Similar poor correlations will be obtained for two different systems
- >>with very different cache sizes. Compare the HP9000/720 to a smaller
- >>cache machine like an IBM RS/6000 or Sun SS2. For example, here are some
- >>Spring, 1991, numbers:
- >>
- >>                  SPECint89   Dhrystone 1.1 MIPS   MIPS/SPECint89
- >>                  ---------   ------------------   --------------
- >>HP 9000/720          39.0           57                 1.46
- >>DEC 5000/200         19.0           24.2               1.27
- >>IBM RS6000/550       34.5           56                 1.62
- >>
- >>If I didn't have SPECint89 numbers, but wanted to derive them from
- >>available Dhrystone MIPS numbers, the third column above would indicate
- >>that I have a tough job ahead of me.
- >
- >
- >But they ARE correlated! You can see it just by looking at the
- >SPECint89 and Dhrystone1.1 numbers. It is incorrect to use the third
- >column (above) to make any predictions or draw conclusions as it
- >consists of ratios of the raw data (program, 'benchmark', results).
- >I'll explain below.
-
- I'll take the liberty of not including the text of your explanation, although
- it was a good one, because I think we just aren't communicating here.
-
- Here is my perspective: I am trying to determine how fast various machines
- are. We are buying some workstations soon at my company, Acme Tool and Die.
- My boss doesn't see what the big deal is about all this benchmarking stuff,
- and doesn't want to get loaner machines from multiple vendors, port our
- code to each, time the results, etc. He says it would take too much time,
- as the porting of our code turns out to be nontrivial. So we are going to
- stick to standard benchmarks. Unfortunately, he didn't buy my arguments
- against trying to use a single benchmark number; he refuses to chart out
- every SPECint result for his boss when he makes the final proposal for
- what workstations to buy.
-
- Now, I have Dhrystone 1.1 MIPS numbers available for various machines. I
- have read the marketing literature, and they all assure me that only those
- compiler optimizations that were specified by Reinhold Weicker as being
- allowable for Dhrystone were used (no inlining, for example.) So I feel
- pretty good about these numbers, as Dhrystone 1.1 numbers go.
-
- I also have some SPECint92 numbers, and some SPECint89 numbers, but not
- complete lists of both for all interesting machines, and neither one of
- them for some machines.
-
- Our applications rarely use floating point data, and are not heavy on
- graphics or I/O, either. So, my boss tells me to just rank the machines
- by their Dhrystone 1.1 MIPS numbers, and he will look over the results.
- He is smart enough not to make a big deal out of one machine having 51
- MIPS while another has 49 MIPS, but he wants this MIPS list as a rough
- guide to integer performance.
-
- The $64,000 question is: Are we on reasonably safe ground to use Dhrystone 1.1
- MIPS in lieu of the SPECint92 numbers we wish we had?
-
- You have made the statement that "There is a very high correlation between
- SPECint and Dhrystone 1.1 MIPS", or something similar, several times. I see
- two possibilities here:
-
- 1) The fact that the two numbers correlate highly does not necessarily imply
- that one is a good substitute for the other if we are trying to get a
- reasonably accurate ranking of the various candidate machines.
-
- 2) The correlation DOES indicate that Dhrystone 1.1 MIPS is pretty much as
- good as SPECint92, if all you want is a single number for integer CPU
- speed (not I/O or cache constrained performance.)
-
- If you tell me #1 is the case, then your regression and correlation are of
- pedantic interest only, and I see no point in continuing to debate this
- matter any further.
-
- If you say that #2 is the case, I have a very simple disproof.
-
- Let's say that my list of machines includes the new, souped up version of
- the DECstation 5000/200, with the clock sped up by a factor of 2.35, and
- the memory and cache proportionately faster to keep up with it. I will
- assume that SPECint92 tracks SPECint89 here, because the only SPECint92
- number I have is for the 36 MHz SPARCstation 10 that I am about to use. The new DEC
- machine has 44.8 SPECint92, and 57.0 Dhrystone 1.1 MIPS. These are in
- direct 2.35 to 1 ratios to the DEC 5000/200 numbers above, so the new machine
- will not disturb your old regression and correlation at all.
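-
- A minimal sketch of that scaling argument, assuming the speed-up applies
- uniformly to both benchmarks. It uses the 2.35 factor as stated; the 44.8
- and 57.0 quoted above correspond to a factor nearer 2.36, so the printed
- values differ slightly from them. The variable names are illustrative only.

```python
# Spring 1991 figures for the DEC 5000/200, from the quoted table.
specint89 = 19.0
dhry_mips = 24.2

# Hypothetical uniform speed-up of clock, memory and cache.
speedup = 2.35

fast_specint = specint89 * speedup
fast_mips = dhry_mips * speedup

# Scaling both results by the same factor leaves their ratio unchanged,
# so the new point lies on the same line through the origin as the old one.
ratio_before = dhry_mips / specint89
ratio_after = fast_mips / fast_specint

print(f"scaled SPECint: {fast_specint:.2f}")
print(f"scaled MIPS:    {fast_mips:.2f}")
print(f"MIPS/SPECint before: {ratio_before:.2f}, after: {ratio_after:.2f}")
```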
-
- Now, on my list, I have shown my boss his choices, and one of them is the
- 36 MHz SPARCstation 10, which shows up with 86 Dhrystone 1.1 MIPS. I don't
- have the SPECint92 yet --- their marketing department is working on it.
- My boss decides that there might be some error in the MIPS values ("spread"
- as you put it), but as there is a high correlation between the SPECint92
- and the MIPS (he read this on the Internet somewhere :-) ), he isn't too
- worried that the SPECint92 numbers will be very different when they come
- out. So, he sees a 51% MIPS increase (86 versus 57) in the SPARCstation 10
- over the new DEC machine, for only 10% more cost, and figures that SPECint92
- will probably show the same 51% increase, or close to it. After all, this
- highly touted statistical correlation must have some real world value,
- right?
-
- We buy the SPARCstation 10 machines. A month later, Sun releases their
- SPECint92 numbers: 44.8, the exact same as the new DEC machine. So, we
- have:
-
- Machine:     DEC5000/200super    Sun SS-10 (36 MHz)
- --------     ----------------    ------------------
- MIPS               57                  86
- SPECint92          44.8                44.8
-
- NOTE: The DEC5000/200super numbers above are the DEC 5000/200 figures scaled
- 2.35 to 1, and so could reflect a hypothetical but reasonable speed up of that
- architecture. The Sun numbers are actual numbers from Sun.
-
- In examining the Sun machines, we find that there was almost exactly a doubling
- of SPECint92 performance from the SS-2 to the SS-10 at 36 MHz, but there was
- a tripling of the Dhrystone 1.1 MIPS. Which is the better indicator of integer
- CPU speed? I contend that Dhrystone (any version) is rapidly being obsoleted
- EVEN IN THE SINGLE NUMBER BENCHMARKING world. I gave detailed reasons in a
- previous posting that relate to superscalar instruction scheduling.
-
- The reason that MIPS/SPECint92 ratios matter, despite your previous objections,
- is that widely different ratios will create a large spread between the
- realistic integer CPU performance expectations for a machine and the Dhrystone
- 1.1 MIPS estimate of its integer CPU performance. Based on the ratios that are
- found between the SS-10 and the SS-2 on better benchmarks than Dhrystone (such
- as SPECint92), the SS-10 should have about 57 Dhrystone MIPS. That it has 86
- MIPS instead gives us a large range (from 57 to 86) within which we can expect
- to find a competing machine someday (if not already). If that competing
- machine is not superscalar in its integer functional units, it will be likely
- that its 70 or so Dhrystone 1.1 MIPS are a better indicator of its integer
- performance than the 86 MIPS are for the SS-10; and it will be likely that
- its SPECint92 will be significantly higher than the 44.8 of the SS-10. We
- will then have a pair of machines where one has an 86 to 70 edge in MIPS,
- and the other has a 58 to 45 edge in SPECint92. (I would not be surprised
- to find that this relationship already exists between the SGI Crimson and
- the SS-10 today.) In which case, I should forget Dhrystone MIPS and stick
- to SPECint92, which was my whole point in the first place. Q.E.D.
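-
- The two numerical steps in that paragraph can be sketched as follows. The
- sketch assumes an exact doubling of SPECint92 and an exact tripling of
- Dhrystone MIPS from the SS-2 to the SS-10 (the post says "almost exactly"),
- and the competitor is the post's hypothetical non-superscalar machine, not a
- measured system.

```python
# Step 1: what the SS-10's MIPS "should" be if MIPS had only doubled,
# as SPECint92 did, instead of tripling.
ss10_mips = 86.0
specint_growth = 2.0   # assumed exact; SS-2 -> SS-10 on SPECint92
mips_growth = 3.0      # assumed exact; SS-2 -> SS-10 on Dhrystone 1.1
expected_mips = ss10_mips * specint_growth / mips_growth
print(f"SS-10 MIPS implied by SPECint92 scaling: {expected_mips:.1f}")

# Step 2: rank the SS-10 against the hypothetical competitor on each metric.
# Tuples are (name, Dhrystone 1.1 MIPS, SPECint92).
machines = [
    ("Sun SS-10 (36 MHz)", 86.0, 44.8),
    ("hypothetical competitor", 70.0, 58.0),
]

def rank_by(column):
    """Return machine names ordered fastest first on the given column."""
    ordered = sorted(machines, key=lambda m: m[column], reverse=True)
    return [m[0] for m in ordered]

by_mips = rank_by(1)
by_specint = rank_by(2)

print("fastest by MIPS:     ", by_mips[0])
print("fastest by SPECint92:", by_specint[0])
print("rankings agree:", by_mips == by_specint)
```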
-
- P.S. The correct possibility between the two given above is #1. The
- correlation indicates that at the time you measured and did your regression,
- CPU architectures were reasonably similar to each other in their behavior
- on SPECint89 and Dhrystone 1.1 code. That the SS-10 demonstrates that
- this is no longer true is reason enough to abandon Dhrystone timings for
- the future.
-
- The other statistical point is that we benchmark to compare machines (two
- competing new ones, or an upgraded machine and our old machine) and so
- provide objective input to purchasing decisions and architecture
- evaluations. If you get 100 machines built with about the
- same RISC philosophy, and 1 more machine that is the only superscalar machine
- in the group, that one outlying point will not disturb your correlation
- appreciably. But, if my purchasing decision comes down to one of the
- conventional machines versus that one outlying point, we don't have the
- other 99 data points to average in and make the fitted curve almost ignore
- the outlier. What we have is a head to head comparison, and the question,
- "Should I pay attention to the Dhrystone numbers when comparing these two
- machines?" The answer is a resounding "No." SPECint92 will be a better
- point of comparison because it is composed of real codes, and should have
- more realistic characteristics with respect to superscalar scheduling than
- the Dhrystone 1.1 code. You will notice also that the little table of 3
- machines in my previous posting, included above, shows its highest ratio
- of MIPS to SPECint89 by far on a superscalar IBM RS6000. When you come
- down to an evaluation of 2 or 3 machines, the fact that one of them was
- an outlier that did not disturb the regression you mention is not very
- comforting to know. The correlation is, at that point, irrelevant.
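-
- A rough illustration of the outlier argument. The 100 "conventional"
- machines here are SYNTHETIC, generated with an assumed 1.3 MIPS per SPECint
- and a deterministic spread of up to 5%; they are not measured data. Only
- the outlier (86 MIPS, 44.8 SPECint92) comes from the SS-10 figures above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

# Synthetic conventional machines: SPECint from 10.0 to 49.6, with MIPS at
# 1.3x SPECint and a repeating deviation between -5% and +5%.
specint = [10.0 + 0.4 * i for i in range(100)]
mips = [1.3 * s * (1 + 0.025 * ((i % 5) - 2)) for i, s in enumerate(specint)]

r_conventional = pearson(mips, specint)

# Add the single superscalar outlier.
all_mips = mips + [86.0]
all_specint = specint + [44.8]
r_all = pearson(all_mips, all_specint)

# Use the regression over all 101 machines to predict the outlier's SPECint
# from its MIPS, then compare with the actual figure.
a, b = fit_line(all_mips, all_specint)
predicted = a + b * 86.0
error = predicted - 44.8

print(f"correlation, 100 conventional machines: {r_conventional:.3f}")
print(f"correlation, with the outlier added:    {r_all:.3f}")
print(f"SPECint predicted from 86 MIPS: {predicted:.1f} (actual 44.8)")
```

- Under these assumptions the correlation stays high after the outlier is
- added, yet the fitted line still overestimates the outlier's SPECint by a
- wide margin, which is the head-to-head problem described above.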
-
-
- --
- --------------------------------------------------------------------------
- "It is seldom that any liberty is lost all at once." David Hume
- ||| clc5q@virginia.edu (Clark L. Coleman)
-