
Re: Speedometer Results Overrated?



>>>>> "Mike" == ChessMan  <chessman@voicenet.com> writes:
In article <4e1aoj$ji8@news.voicenet.com> chessman@voicenet.com (ChessMan) writes:


    Mike> Although Speedometer makes it look like that Executor is
    Mike> running at say -- 25 mhz 68040 speed on a 90 mhz pentium -
    Mike> other CPU intensive programs may not fare as well.  I got
    Mike> MACCHESS 2.0 to run (quite well) on Executor and for the
    Mike> same settings and position, it took MACCHESS almost 3x as
    Mike> long on Executor to arrive at the same result - 124 seconds
    Mike> to 43 seconds for my Daystar accelerated 68040 40 mhz
    Mike> IIvx. Indicating a comparable speed of 14-15 mhz 68040.
    Mike> Still impressive for an emulation program, but not quite as
    Mike> impressive as some of the Speedometer results.

You're right in that the Speedometer results will not always scale,
although if you're seeing a 90 MHz Pentium performing at the speed of
a 14-15 MHz 68040, there *may* be more to it than straight CPU
emulation speed differences.
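The quoted timings can be turned into an "equivalent clock" figure if you assume run time scales inversely with clock speed. That assumption is a simplification that ignores caches, memory speed, and bus effects. Here's a small Python check; the function name is just for illustration. It lands at roughly 14 MHz, consistent with the estimate above:

```python
def equivalent_clock_mhz(ref_mhz: float, ref_seconds: float, emu_seconds: float) -> float:
    """Clock speed at which the reference CPU would match the emulator's run time.

    Assumes run time scales inversely with clock speed, which ignores cache,
    memory, and bus effects, so treat the result as a rough estimate only.
    """
    return ref_mhz * ref_seconds / emu_seconds


slowdown = 124 / 43                            # Executor time / 40 MHz 68040 time
estimate = equivalent_clock_mhz(40, 43, 124)   # 40 MHz * 43 s / 124 s
print(f"{slowdown:.2f}x slower, ~{estimate:.1f} MHz 68040")  # 2.88x slower, ~13.9 MHz 68040
```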

The way Executor works (more information is provided in
ftp://ftp.ardi.com/pub/SynPaper) is to translate frequently executed
blocks of m68k instructions into x86 instructions and then execute the
x86 instructions.  This has two side effects, either of which may be
skewing the results in this case.
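To make the translate-once, run-many idea concrete, here's a toy Python sketch of a translation cache. It is not how Syn68k is written. The two-opcode "guest" instruction set, the `translate` and `TranslationCache` names, and the use of Python closures as stand-ins for generated x86 code are all assumptions made for illustration. What it shows is that a block pays the translation cost once, and every later execution reuses the cached result.

```python
from typing import Callable, Dict, List, Tuple

# Toy guest instruction: (opcode, operand). Hypothetical; not real m68k encoding.
Instr = Tuple[str, int]
Regs = Dict[str, int]
HostCode = Callable[[Regs], None]


def translate(block: List[Instr]) -> HostCode:
    """Turn a guest block into one host function (a closure standing in for x86 code)."""
    steps: List[HostCode] = []
    for opcode, operand in block:
        if opcode == "addi":
            steps.append(lambda regs, n=operand: regs.update(d0=regs["d0"] + n))
        elif opcode == "muli":
            steps.append(lambda regs, n=operand: regs.update(d0=regs["d0"] * n))
        else:
            raise ValueError(f"toy translator cannot handle {opcode!r}")

    def host(regs: Regs) -> None:
        for step in steps:
            step(regs)

    return host


class TranslationCache:
    """Translate each guest block on first use, then reuse the cached host code."""

    def __init__(self, program: Dict[int, List[Instr]]) -> None:
        self.program = program  # guest address -> guest block
        self.cache: Dict[int, HostCode] = {}
        self.translations = 0

    def run_block(self, addr: int, regs: Regs) -> None:
        host = self.cache.get(addr)
        if host is None:  # first execution: pay the translation cost once
            host = translate(self.program[addr])
            self.cache[addr] = host
            self.translations += 1
        host(regs)


cpu = TranslationCache({0x100: [("addi", 2), ("muli", 3)]})
regs = {"d0": 1}
for _ in range(3):
    cpu.run_block(0x100, regs)
print(regs["d0"], cpu.translations)  # 105 1
```

The cost of this design is the one described next: both the guest code and its translation stay in memory.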

Because both the original m68k instructions and the translated x86
instructions have to be kept around, Executor uses more memory than an
m68k-based Mac does, which means it sometimes has to page portions of
memory out to disk to avoid running out of physical memory.  If
Executor is doing *any* paging while computing, it will slow things
down dramatically.  *If* you're running Executor without enough memory
to do the computation without paging, then trying it on another
machine with more memory may give you faster results.

In addition, Executor has to be aware of self-modifying code.  After
all, if it has recorded that a particular set of m68k instructions
maps to a set of x86 instructions and those m68k instructions are
later changed, Executor needs to throw away the translated
instructions and recompile.  I don't know how MACCHESS 2.0 works, but
if it dynamically builds board evaluators and then jumps into them,
Executor's performance will suffer for it.
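Here's a toy Python sketch of the bookkeeping self-modifying code forces on a translator. It is an illustration under stated assumptions, not Executor's code. Guest memory is modeled as one instruction per address, blocks have a fixed length, and a guest write into code drops every cached translation that covers the written address. The class name, `block_len`, and the opcodes are hypothetical. Each translation snapshots the instructions at translation time, which is exactly why a stale translation would compute the wrong answer if it weren't discarded.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]  # hypothetical toy (opcode, operand) pair
Regs = Dict[str, int]


class SelfModAwareCache:
    """Toy translation cache that drops stale translations when guest code is written."""

    def __init__(self, memory: List[Instr], block_len: int) -> None:
        self.memory = memory  # guest "memory": one instruction per address
        self.block_len = block_len
        self.cache: Dict[int, Callable[[Regs], None]] = {}
        self.translations = 0

    def _translate(self, start: int) -> Callable[[Regs], None]:
        # Snapshot the instructions now; the host code will not see later writes.
        snapshot = tuple(self.memory[start:start + self.block_len])
        self.translations += 1

        def host(regs: Regs) -> None:
            for opcode, operand in snapshot:
                if opcode == "addi":
                    regs["d0"] += operand
                elif opcode == "muli":
                    regs["d0"] *= operand
                else:
                    raise ValueError(f"unknown toy opcode {opcode!r}")

        return host

    def execute(self, start: int, regs: Regs) -> None:
        if start not in self.cache:
            self.cache[start] = self._translate(start)
        self.cache[start](regs)

    def write(self, addr: int, instr: Instr) -> None:
        """Guest store into code: update memory and invalidate overlapping blocks."""
        self.memory[addr] = instr
        stale = [s for s in self.cache if s <= addr < s + self.block_len]
        for s in stale:
            del self.cache[s]  # forces a fresh translation on next execution


cpu = SelfModAwareCache([("addi", 1), ("addi", 2)], block_len=2)
regs = {"d0": 0}
cpu.execute(0, regs)         # d0 = 3, first translation
cpu.execute(0, regs)         # d0 = 6, reuses cached code
cpu.write(1, ("addi", 10))   # the program patches its own code
cpu.execute(0, regs)         # d0 = 17, retranslated from the new instructions
print(regs["d0"], cpu.translations)  # 17 2
```

A program that rewrites its code in a tight loop pays the retranslation cost on every pass, which is the slowdown described above.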

Beyond those two limitations, which are a direct result of the
technique Executor uses, there is also a potential performance hit
associated with our particular implementation.  To keep Executor's
synthetic CPU small, there are some instructions and addressing modes
that we never translate to native code.  Instead, that m68k code is
translated to an intermediate interpretive form, the same form we use
for m68k code that isn't executed often enough to justify
recompilation into native code.
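A toy Python sketch of that split might look like this. Blocks are interpreted until they have run a threshold number of times and are then translated. Blocks containing an instruction the translator doesn't handle stay interpreted no matter how hot they get. The threshold, the `TRANSLATABLE` set, and all names here are hypothetical and do not reflect Syn68k's actual policy or code. A closure with handlers resolved once at translation time stands in for native x86 code, while the interpreter looks each handler up on every run.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]  # hypothetical toy (opcode, operand) pair
Regs = Dict[str, int]

# Assumed policy knobs, for illustration only.
HOT_THRESHOLD = 3
TRANSLATABLE = {"addi", "muli"}  # "divu" stands in for an op we never translate


def _add(regs: Regs, n: int) -> None:
    regs["d0"] += n


def _mul(regs: Regs, n: int) -> None:
    regs["d0"] *= n


def _div(regs: Regs, n: int) -> None:
    regs["d0"] //= n


HANDLERS: Dict[str, Callable[[Regs, int], None]] = {"addi": _add, "muli": _mul, "divu": _div}


def interpret(block: List[Instr], regs: Regs) -> None:
    """Slow path: look up each instruction's handler on every run."""
    for opcode, operand in block:
        HANDLERS[opcode](regs, operand)


def translate(block: List[Instr]) -> Callable[[Regs], None]:
    """Fast-path stand-in: resolve handlers once, at translation time."""
    bound = [(HANDLERS[opcode], operand) for opcode, operand in block]

    def host(regs: Regs) -> None:
        for fn, operand in bound:
            fn(regs, operand)

    return host


class TieredExecutor:
    def __init__(self, program: Dict[int, List[Instr]], threshold: int = HOT_THRESHOLD) -> None:
        self.program = program
        self.threshold = threshold
        self.counts: Dict[int, int] = {}
        self.native: Dict[int, Callable[[Regs], None]] = {}
        self.interpreted_runs = 0
        self.native_runs = 0

    def execute(self, addr: int, regs: Regs) -> None:
        block = self.program[addr]
        if addr not in self.native:
            self.counts[addr] = self.counts.get(addr, 0) + 1
            hot = self.counts[addr] >= self.threshold
            if hot and all(op in TRANSLATABLE for op, _ in block):
                self.native[addr] = translate(block)
        if addr in self.native:
            self.native_runs += 1
            self.native[addr](regs)
        else:
            self.interpreted_runs += 1
            interpret(block, regs)


ex = TieredExecutor({0: [("addi", 1)], 1: [("divu", 2)]})
hot, cold = {"d0": 0}, {"d0": 64}
for _ in range(5):
    ex.execute(0, hot)   # becomes native on its 3rd run
    ex.execute(1, cold)  # contains "divu", so it is always interpreted
print(hot["d0"], cold["d0"], ex.interpreted_runs, ex.native_runs)  # 5 2 7 3
```

If a program's inner loop happens to use an instruction in the never-translated group, it runs at interpreter speed however often it executes.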

But in general we've found that the Speedometer numbers really are
good predictors of how well Executor will do on integer
compute-intensive tests.  We've tried various Photoshop plug-ins and
have also had customers comment on the speed at which they've gotten
various PCs to run applications like NIH-Image and Stella.

Were Executor 2 not so far behind (due to general slippage, holiday
slippage, and MACWORLD Expo slippage), it would be fun to find out
specifically what's going on with MACCHESS 2.0.  But since Mat is
already 75% done on VCPU (our next-generation synthetic CPU, which is
significantly faster still), we'll probably wait and look at
MACCHESS 2.0's performance under VCPU rather than Syn68k.

Thanks for the info.  Where are those 200 MHz P6s when you need them?

    Mike> Mike B.

--Cliff
ctm@ardi.com

