1) Comparing the Sun4-110 & Sun4-200 numbers gives me reasonable confidence
that the versions of the benchmarks being used were fairly consistent.
2) These benchmarks, of course, are fairly small in code size, i.e.,
roughly 500-1000 bytes of code. Hence the large (for the time) caches on the
Suns & MIPS machines turn out to be fairly irrelevant.
3) The relative performance numbers compare a 16.7MHz R2000 system to a 10MHz RTX.
4) Given the small cache footprint of these benchmarks, consider, for example,
a 16.7MHz IDT R3041, which has (I think) 2KB of on-chip I-cache + 512 bytes
of on-chip D-cache; being an R3000 derivative, it has cache refill
characteristics more like the R3000 above than an R2000. I'd guess that
this chip would perform about the same as the R2000 above on these benchmarks;
a 20MHz one would be faster in proportion to the clock, i.e., about 20%.
Eyeballing this, it looks like if the clock rates were EQUAL, the 3041 would
have a 3X performance advantage on these benchmarks. (Of course, clock speeds
are not equal; hence a 20MHz 3041 would probably run 6X faster than a 10MHz
RTX....)
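The arithmetic behind the 3X and 6X figures can be sketched as follows. This is a back-of-the-envelope illustration only: the function and constant names are made up for the example, the 3X equal-clock figure is the eyeball estimate above, and linear scaling with clock rate is an assumption (plausible here only because the benchmarks fit in the on-chip caches).

```python
# Rough scaling sketch; names are illustrative, not from any real tool.
# Assumptions: ~3X R3041 advantage over the RTX at equal clocks (eyeballed),
# and performance scaling linearly with clock rate.

EQUAL_CLOCK_ADVANTAGE = 3.0  # R3041 vs. RTX at the same clock rate
RTX_MHZ = 10.0               # clock rate of the RTX in the comparison


def speedup_over_rtx(r3041_mhz, rtx_mhz=RTX_MHZ, per_clock=EQUAL_CLOCK_ADVANTAGE):
    """Estimated R3041/RTX performance ratio under linear clock scaling."""
    return per_clock * (r3041_mhz / rtx_mhz)


for mhz in (16.7, 20.0):
    ratio = speedup_over_rtx(mhz)
    print(f"{mhz:4.1f}MHz R3041 vs {RTX_MHZ:.0f}MHz RTX: ~{ratio:.1f}X")
```

Under these assumptions the 16.7MHz part comes out at roughly 5X and the 20MHz part at 6X; different benchmarks or memory systems would change the per-clock factor.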
These are full 32-bit chips, yet here are the 3041 prices (in large quantities):
16.7MHz: $15
20MHz: $19
And, if you really need bigger caches, MMUs, etc, you can work your way up thru
5-6 versions with different combinations.
Now, this data says nothing about other attributes of the chips, including
the ability (or not) to run FORTH programs, context-switching, etc. On the other
hand, it pretty clearly says that this particular stack architecture, for running
these small C codes, is significantly outperformed by a $15 chip that shares
the same instruction set used up through project supercomputers & mainframe-like
systems, and that must pay the price in die space and cost for being 32 bits,
rather than 16 bits. It does NOT say there is no role for 16-bit,
stack-oriented micro-controllers: if they do the job, they OUGHT to be cheaper
and smaller.
HOWEVER, at least some chip families (such as MIPS) already have something like 12-15 different implementations.
PLEASE STOP COMPARING a 16-bit microcontroller to the high-end implementations
of such families, which have large caches, aggressive pipelines, MMUs, serious
floating point, and 32-bit or even 64-bit architectures, which are required to
run a wide variety of operating systems and languages well, and which are
required to run at speeds that make simple memory systems impossible. (As the
cited paper says, "The RTX does not use cache memories, but instead uses
system memory that is fast enough to guarantee single-cycle memory access."
That's *exactly* the right thing to do if you are speed-constrained anyway ...
and if we could design systems that had single-cycle access to main memory
at 150MHz, we'd be *really* happy, but we can't...)
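To put numbers on the single-cycle point: memory has to respond within one clock period, and that period shrinks as the clock rate rises. The sketch below is pure unit conversion (the helper name is illustrative). It ignores address setup, decode, and bus overheads, which would only shrink the real budget further.

```python
# Time budget for a single-cycle memory access at a given clock rate.
# Unit conversion only: period in ns = 1000 / frequency in MHz.


def cycle_time_ns(clock_mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


for mhz in (10.0, 16.7, 20.0, 150.0):
    print(f"{mhz:6.1f}MHz -> {cycle_time_ns(mhz):6.2f}ns per cycle")
```

At 10MHz the memory has 100ns to respond; at 150MHz it has under 7ns, which is why the fast parts need caches.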
REASONABLE COMPARISONS compare against the $15 chips, not against the $500-1000 chips.... and I suspect the answers that come out are:
a) In the embedded control world, there is room for a wide variety of
designs tuned for different things.
b) If code-space is *the* limiting problem, stack machines are good, or
maybe hybrid things like Hobbits.
c) If a (4, 8, 16)-bit chip does the job, it ought to be cheaper
than using a 32-bit one; this is an orthogonal issue to stack versus
general register, of course.
d) FORTH chips will run FORTH effectively; the data above doesn't
suggest competitiveness on C code...
e) When RISC is played for low-cost, it can get pretty low cost
and still keep good performance. [It would be nice to have ARM
numbers, for example, but I don't have them handy].
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com
DDD: 415-390-3090
USPS: Silicon Graphics 7U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311