NetNews Usenet Archive 1992 #26

Text File | 1992-11-08 | 7.5 KB | 140 lines

Newsgroups: comp.arch
Path: sparky!uunet!ukma!darwin.sura.net!sgiblab!sgigate!odin!mash.wpd.sgi.com!mash
From: mash@mash.wpd.sgi.com (John R. Mashey)
Subject: Re: What's wrong with stack machines? [$15 RISC versus RTX...]
Message-ID: <1992Nov6.200453.26691@odin.corp.sgi.com>
Sender: news@odin.corp.sgi.com (Net News)
Nntp-Posting-Host: mash.wpd.sgi.com
Organization: Silicon Graphics, Inc.
References: <17189@mindlink.bc.ca>
Date: Fri, 6 Nov 1992 20:04:53 GMT
Lines: 127

In article <17189@mindlink.bc.ca>, Nick_Janow@mindlink.bc.ca (Nick Janow) writes:
|> Stack machines tend to be smaller and simpler than the typical processors
|> coming out these days (486, Alpha, SPARC, etc), without compromising speed.
|> Looking at it the other way, for the same amount of silicon and engineering
|> effort, the stack machine would probably be a lot faster.

Maybe; but comparing an embedded-control implementation with these is apples-to-oranges...

|> Code density does affect speed in many cases. If you only have 8Kbytes of
|> high-speed RAM, wouldn't you like to be able to stick much of the program in
|> that space? Stack machines are optimized for code re-use.

Sure, but caches work, and in *fast* machines, I-cache bandwidth is much less of an issue than in dealing with D-cache misses...

|> A processor with multiple stacks is probably easier to design for parallel
|> operations than one with a large array of random access registers. The RTX,

Why?

|> for example, allows simultaneous access (one clock cycle) to both stacks,
|> memory, and two I/O ports. Future processors might add a floating point
|> stack, a string stack, a logic (IF, WHILE, etc) stack, a locals stack, or any
|> other specialized stack that was appropriate. Memory access could be split
|> into data and instruction. Parallel processing ports could be added. Note
|> that present stack processors offer high performance without pipelining or
|> memory caching.
|>
|> Processor design has mainly focused on optimizing for non-stack languages;
|> there's a lot to explore in stack processor hardware and software design.

SO, if this is all true, why haven't stack machines taken over everything?
Let's look at some *data*:

There are many sweeping assertions above; one must be very careful to avoid apples-oranges comparisons that end up in sweeping claims.

In the useful article "Performance of the Harris RTX 2000 Stack Architecture versus the Sun 4 SPARC and Sun 3 M68020 Architectures", by Keown, Koopman, Collins, in Computer Architecture News, 20, 3 (June 1992), there is some good information, including the proper qualifier near the beginning:

"The only stack architecture available to us for benchmarking is a 16-bit embedded controller which is not suitable for executing Unix or workstation applications. Therefore, we chose to use the Stanford Integer benchmarks instead of the more comprehensive, but unsuitable for RTX execution, SPEC benchmarks."

The conclusions reached in the paper (RTX is OK in comparison with SUN-4-110 & 68020) are probably OK ... however, the 110 was not the most efficient implementation of SPARC, and some other RISCs were certainly more efficient than the early SPARC implementations. Hence:

Included are actual times for various CPUs [RTX @ 10MHz, SPARC @ 14.28MHz, MC68020 @ 16.7MHz] for the Stanford Integer benchmarks.
To these I'll add corresponding times for 16.7MHz Sun-4s, 16.7MHz R2000s (MIPS RC2030), & 25MHz R3000s (MIPS M/2000), from 1990's MIPS Performance Brief Issue 3.9:

Time in seconds:

System           Bubble  Intmm   Perm    Queen   Quick   Tower
RTX, 10MHz       0.76    0.47    0.44    0.34    0.41    0.52
SPARC, 14MHz     0.22    0.42    0.15    0.12    0.19    0.25    Sun4-110
SPARC, 16.7MHz   0.12    0.15    0.11    0.09    0.12    0.17    Sun4-200
R2000, 16.7MHz   0.083   0.079   0.093   0.067   0.069   0.102   MIPS RC2030
R3000, 25MHz     0.054   0.052   0.065   0.043   0.046   0.069   MIPS M/2000
REL PERF         9.2X    5.9X    4.7X    5.1X    5.9X    5.1X    R2000 > RTX

Notes:
1) I have reasonable confidence that the versions of the benchmarks being used were reasonably consistent, by comparing the Sun4-110 & Sun4-200 numbers.
2) These benchmarks, of course, are fairly small in code size, i.e., like 500-1000 bytes of code. Hence the large (for the time) caches on the Suns & MIPS machines turn out to be fairly irrelevant.
3) The relative performance numbers compare the 16.7MHz R2000 system to the 10MHz RTX.
4) Given the cache sizes used by these benchmarks, consider, for example, a 16.7MHz IDT R3041, which has (I think) 2KB of on-chip I-cache + 512 bytes of on-chip D-cache; being an R3000-derivative, it has cache refill characteristics more like the R3000 above than an R2000. I'd guess that this chip would perform about the same as the R2000 above on these benchmarks; a 20MHz one would be about that much faster. Eyeballing this, it looks like if the clock rates were EQUAL, the 3041 would have a 3X performance advantage on these benchmarks. (Of course, clock speeds are not equal; hence a 20MHz 3041 would probably run 6X faster than a 10MHz RTX....)

Despite being full 32-bit chips, here are the 3041 prices (in large quantities):
16.7MHz: $15
20MHz:   $19
And, if you really need bigger caches, MMUs, etc, you can work your way up thru 5-6 versions with different combinations.

Now, this data says nothing about other attributes of the chips, including ability or not to run FORTH programs, context-switching, etc.
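As a rough cross-check of the REL PERF row and the equal-clock estimate in the notes above, here is a minimal Python sketch. The times are copied from the table; the names (`speedups`, `rescale_clock`) are made up for illustration. It assumes performance scales linearly with clock rate and, following the guess in note 4, uses the R2000 figures as a stand-in for the R3041. Both are simplifications, not measurements.

```python
# Stanford Integer benchmark times in seconds, copied from the table in the post.
BENCHMARKS = ["Bubble", "Intmm", "Perm", "Queen", "Quick", "Tower"]
RTX_10MHZ = [0.76, 0.47, 0.44, 0.34, 0.41, 0.52]
R2000_16_7MHZ = [0.083, 0.079, 0.093, 0.067, 0.069, 0.102]


def speedups(slow_times, fast_times):
    """Per-benchmark ratio of the slower system's time to the faster one's."""
    return [slow / fast for slow, fast in zip(slow_times, fast_times)]


def rescale_clock(ratios, measured_mhz, target_mhz):
    """Rescale speedups as if the faster chip ran at target_mhz.

    ASSUMPTION: performance is linear in clock rate, which ignores
    memory-system effects and is only a back-of-the-envelope estimate.
    """
    return [ratio * target_mhz / measured_mhz for ratio in ratios]


raw = speedups(RTX_10MHZ, R2000_16_7MHZ)        # 16.7MHz R2000 vs 10MHz RTX
equal_clock = rescale_clock(raw, 16.7, 10.0)    # both imagined at 10MHz
at_20mhz = rescale_clock(raw, 16.7, 20.0)       # hypothetical 20MHz part

print(f"{'':8s}{'measured':>10s}{'equal clk':>11s}{'20MHz':>8s}")
for name, r, e, t in zip(BENCHMARKS, raw, equal_clock, at_20mhz):
    print(f"{name:8s}{r:9.1f}X{e:10.1f}X{t:7.1f}X")
```

The "measured" column should reproduce the REL PERF row when rounded to one decimal place. The other two columns are only as good as the linear-scaling assumption, so they are best read as a sanity check on the eyeballed 3X and 6X figures rather than as predictions.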
On the other hand, it pretty clearly says this particular stack architecture, for running the small C codes, is significantly outperformed by a $15 chip that shares the same instruction set used up through project supercomputers & mainframe-like systems, and that must pay the price in die space and cost for being 32 bits, rather than 16 bits.

It does NOT say there is no role for 16-bit, stack-oriented micro-controllers: if they do the job, they OUGHT to be cheaper and smaller.

HOWEVER, at least some chip families (such as MIPS) already have something like 12-15 different implementations.

PLEASE STOP COMPARING a 16-bit microcontroller to the high-end implementations of such families, which have large caches, aggressive pipelines, MMUs, serious floating point, have 32-bit or even 64-bit architectures, and are required to run a wide variety of operating systems and languages well, and which are required to run at speeds that make simple memory systems impossible. (As the cited paper says, "The RTX does not use cache memories, but instead uses system memory that is fast enough to guarantee single-cycle memory access." That's *exactly* the right thing to do if you are speed-constrained anyway ... and if we could design systems that had single-cycle access to main memory at 150MHz, we'd be *really* happy, but we can't...)

REASONABLE COMPARISONS compare against the $15 chips, not against the $500-1000 chips.... and I suspect the answers that come out are:
a) In the embedded control world, there is room for a wide variety of designs tuned for different things.
b) If code-space is *the* limiting problem, stack machines are good, or maybe hybrid things like Hobbits.
c) If a (4, 8, 16)-bit chip does the job, it ought to be cheaper than using a 32-bit one; this is an orthogonal issue to stack versus general register, of course.
d) FORTH chips will run FORTH effectively; the data above doesn't suggest competitiveness on C code...
e) When RISC is played for low-cost, it can get pretty low cost and still keep good performance. [It would be nice to have ARM numbers, for example, but I don't have them handy].

-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:    mash@sgi.com
DDD:     415-390-3090
USPS:    Silicon Graphics 7U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311