- Path: sparky!uunet!caen!destroyer!cs.ubc.ca!jonathan
- From: jonathan@geop.ubc.ca (Jonathan Thornburg)
- Newsgroups: comp.arch
- Subject: cpu vs memory speed vs need for cache memory (was: Re: RTX and SC32)
- Date: 8 Nov 1992 17:33:36 GMT
- Organization: U of BC Geophysics & Astronomy + U of Texas Physics/Relativity
- Lines: 94
- Sender: Jonathan Thornburg <jonathan@geop.ubc.ca>
- Message-ID: <1djj1gINNkt1@cs.ubc.ca>
- References: <17258@mindlink.bc.ca>
- NNTP-Posting-Host: rubis.astro.ubc.ca
- Summary: fast cpu ==> need fast memory, but *cheap* fast cpu ==> need cache
- Keywords: cpu memory speed cache
-
- In an article which I have lost the reference to, mo@bellcore.com
- (I believe that's Mike O'Dell; hi, thanks for that great Usenix paper
- on all the trouble Prisma had with the speed of light :-) ) writes [wrote]:
- | What I have not seen in the literature (which doesn't mean it doesn't
- | exist, just that i haven't seen it) is a comparison of a stack machine
- | design with the equivalent complexity budget of modern RISC machines to
- | such modern processors. By complexity budget I mean one that spends as much
- | logic on cache and memory system complexity as the fast RISC machines do.
- | (including the related trickle-down complexity like dealing with
- | out-of-order completion in the pipe, if required)
-
- In article <17258@mindlink.bc.ca> Nick_Janow@mindlink.bc.ca (Nick Janow)
- writes [replies]:
- | Why would anyone want to bog down a stack machine with all the baggage needed
- | for a non-stack machine? From what I've heard, the stack processors such as
- | the RTX and SC32 can compete with RISC chips of the same technology level
- | (not comparing 1.2 um against .5 um) without having to use caches or other
- | complex memory systems.
-
- To put it simply, any chip can use a non-cached all-fast-SRAM main
- memory. But if that memory is very big (and modern computing *demands*
- big main memories), it's going to be *very* expensive. The markets
- the RTX and SC32 are aimed at (high-end embedded controller) need the
- bounded worst-case performance of a no-cache memory system, and can
- tolerate the high cost. But general-purpose computing wants average-case
- performance, and the lowest possible cost. Hence cache memories are
- ubiquitous on modern general-purpose computers. In particular,
- any modern stack-machine-based system aimed at a general-purpose market
- is going to need a fancy cache and memory system, just like a modern
- RISC.
-
- (The remainder of this article expands on this argument in more detail.
- Press 'n' now if you have seen enough.)
-
- In detail:
- ==========
-
- Ignoring the O(1) factors in code size between different
- architectures, anyone who uses modern software and/or does a lot of
- heavy-duty computing these days wants a *lot* of memory. With the
- possible exception of workstations running X :-), most memory these
- days goes for *data*, not code, so instruction sets are only a minor
- perturbation. (Consider, for example, solving 3-dimensional partial
- differential equations to simulate oil reservoir depletion, automobile
- design, hydrogen bomb explosions, etc. Or crunching stochastic models
- of the world's stock markets to make your second $100M. Or rendering
- 16384*16384*24bit bitmaps for the newest Hollywood flick. Or doing a
- design verification and layout rule check of your new 5M-transistor
- CPU chip. Etc, etc, ...) Indeed, most "ordinary" workstations
- these days come with 16M from the factory, and people doing serious
- crunching frequently upgrade to 32M, 64M, or 128M.
-
- Moreover, again ignoring the issue of just what type of instruction
- set the CPU uses, I think we all agree that fast CPUs need fast memory.
- More precisely, just about any modern processor wants an effective
- "1st level off-chip" memory access of time at most "a few" cycles,
- and with clock rates pushing 100 MHz these days for performance, that
- leaves O(10ns) or less for the off-chip memory access.
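
The arithmetic behind that budget can be checked directly. The figures below (a 100 MHz clock, "a few" cycles allowed for a first-level off-chip access) are just the article's illustrative numbers, not measurements:

```python
# Back-of-the-envelope check of the off-chip access-time budget.
# Assumed figures: 100 MHz CPU clock, 3 cycles allowed for a
# "1st level off-chip" memory access (the article's "a few" cycles).

clock_hz = 100e6                      # 100 MHz CPU clock
cycle_ns = 1e9 / clock_hz             # one clock cycle, in nanoseconds
budget_cycles = 3                     # "a few" cycles for the off-chip access
access_budget_ns = budget_cycles * cycle_ns

print(f"cycle time:    {cycle_ns:.0f} ns")          # 10 ns per cycle
print(f"access budget: {access_budget_ns:.0f} ns")  # 30 ns total
```

With only 10 ns per cycle, even a three-cycle allowance leaves O(10ns) per access, which is exactly the regime where only fast SRAM (or a cache built from it) can respond in time.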
-
- Now consider building such a (large, fast) memory system, whether for
- a stack machine or for a modern RISC (or for a "classic CISC" like a
- VAX, for that matter). To oversimplify greatly, there are two different
- ways to build a large fast memory system:
- - SRAM-based: buy a lot of fast SRAM and build main memory out of it.
- No need to worry about caches, gives predictable access times so you
- can bound worst-case performance, and relatively simple to design.
- The problem with this type of memory is that fast SRAM costs a *lot*
- of money, dissipates a *lot* of power, and tends to have a rather
- small capacity per chip. (For example, what's the biggest 12ns SRAM
- you can buy 100,000 of, for delivery in 6 weeks? How much does it
- cost? And can you buy a heat sink and cooling fan big enough to
- prevent a meltdown when you power up your blindingly fast memory
- board populated with 64MB of these chips?)
- - DRAM+cache-based: buy a lot of commodity DRAM and build main memory
- out of it, *and* buy a modicum of fast SRAM and build a cache out of
- it. If you get the cache design right, you can still get most of
- the average-case performance of an SRAM-based memory, yet keep the
- relatively low cost and power dissipation of the DRAMs. (To
- continue the same example, what's the biggest 100ns DRAM you can
- buy 100,000 of, for delivery in 6 weeks? How much does it cost?
- The answers are "probably a lot bigger than 12ns SRAM" and "certainly
- a lot less".)
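
The average-case claim for the second approach follows from the standard effective-access-time formula: hit_time + miss_rate * miss_penalty. A quick sketch, using the article's 12 ns SRAM and 100 ns DRAM figures plus an assumed 95% cache hit rate (the hit rate is my illustrative number, not the article's):

```python
# Effective access time of a DRAM+cache memory vs. a pure fast-SRAM
# memory. 12 ns SRAM and 100 ns DRAM come from the article; the 95%
# hit rate is an assumed, typical figure for illustration.

sram_ns = 12.0        # fast SRAM (cache) access time
dram_ns = 100.0       # commodity DRAM access time
hit_rate = 0.95       # fraction of accesses satisfied by the cache

# Average memory access time: hit time plus miss-rate-weighted penalty.
amat_ns = sram_ns + (1.0 - hit_rate) * dram_ns

print(f"SRAM-only  : {sram_ns:.1f} ns per access")
print(f"DRAM+cache : {amat_ns:.1f} ns per access")  # 17.0 ns
```

At a 95% hit rate the cached DRAM system averages 17 ns per access: within a factor of 1.5 of pure SRAM, at a small fraction of the cost. (Of course this is exactly the average-case-only guarantee that the embedded-controller market for the RTX and SC32 cannot accept.)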
-
- The cost difference between these two approaches is why all modern
- general-purpose computer systems that worry about performance use
- cache memories. And that would include stack machines if they were
- to compete in the general-purpose computing marketplace.
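
To put a rough shape on that cost difference: pricing out a 64MB main memory both ways, with hypothetical early-90s per-megabyte prices (all dollar figures below are assumptions chosen only to show the shape of the argument, not actual 1992 quotes):

```python
# Rough cost comparison: 64 MB of main memory built from fast SRAM
# alone vs. commodity DRAM plus a small SRAM cache. All prices are
# hypothetical illustration figures.

mem_mb = 64
sram_per_mb = 400.0    # assumed $/MB for 12 ns SRAM
dram_per_mb = 30.0     # assumed $/MB for 100 ns DRAM
cache_mb = 0.25        # a 256 KB SRAM cache

sram_only = mem_mb * sram_per_mb
dram_cache = mem_mb * dram_per_mb + cache_mb * sram_per_mb

print(f"64 MB all-SRAM    : ${sram_only:,.0f}")    # $25,600
print(f"64 MB DRAM + cache: ${dram_cache:,.0f}")   # $2,020
```

Even with generous assumptions, the all-SRAM design costs an order of magnitude more, and the gap only widens as main memories grow.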
-
- - Jonathan Thornburg
- <jonathan@geop.ubc.ca> through mid-November 92, then
- <jonathan@einstein.ph.utexas.edu> or <jonathan@hermes.chpc.utexas.edu>
- [for a few more months] UBC / {Astronomy,Physics}
- [then through Aug/92] U of Texas at Austin / Physics Dept
- / Center for Relativity
-