- Path: sparky!uunet!caen!destroyer!cs.ubc.ca!jonathan
- From: jonathan@geop.ubc.ca (Jonathan Thornburg)
- Newsgroups: comp.arch
- Subject: cpu vs memory speed vs need for cache memory (was: Re: RTX and SC32)
- Date: 8 Nov 1992 17:33:36 GMT
- Organization: U of BC Geophysics & Astronomy + U of Texas Physics/Relativity
- Lines: 94
- Sender: Jonathan Thornburg <jonathan@geop.ubc.ca>
- Message-ID: <1djj1gINNkt1@cs.ubc.ca>
- References: <17258@mindlink.bc.ca>
- NNTP-Posting-Host: rubis.astro.ubc.ca
- Summary: fast cpu ==> need fast memory, but *cheap* fast cpu ==> need cache
- Keywords: cpu memory speed cache
-
- In an article which I have lost the reference to, mo@bellcore.com
- (I believe that's Mike O'Dell; hi, thanks for that great Usenix paper
- on all the trouble Prisma had with the speed of light :-) ) writes [wrote]:
- | What I have not seen in the literature (which doesn't mean it doesn't
- | exist, just that i haven't seen it) is a comparison of a stack machine
- | design with the equivalent complexity budget of modern RISC machines to
- | such modern processors. By complexity budget I mean one that spends as much
- | logic on cache and memory system complexity as the fast RISC machines do.
- | (including the related trickle-down complexity like dealing with
- | out-of-order completion in the pipe, if required)
-
- In article <17258@mindlink.bc.ca> Nick_Janow@mindlink.bc.ca (Nick Janow)
- writes [replies]:
- | Why would anyone want to bog down a stack machine with all the baggage needed
- | for a non-stack machine? From what I've heard, the stack processors such as
- | the RTX and SC32 can compete with RISC chips of the same technology level
- | (not comparing 1.2 um against .5 um) without having to use caches or other
- | complex memory systems.
-
- To put it simply, any chip can use a non-cached all-fast-SRAM main
- memory. But if that memory is very big (and modern computing *demands*
- big main memories), it's going to be *very* expensive. The markets
- the RTX and SC32 are aimed at (high-end embedded controller) need the
- bounded worst-case performance of a no-cache memory system, and can
- tolerate the high cost. But general-purpose computing wants average-case
- performance, and the lowest possible cost. Hence cache memories are
- ubiquitous on modern general-purpose computers. In particular,
- any modern stack-machine-based system aimed at a general-purpose market
- is going to need a fancy cache and memory system, just like a modern
- RISC.
-
- (The remainder of this article expands on this argument in more detail.
- Press 'n' now if you have seen enough.)
-
- In detail:
- ==========
-
- Ignoring the O(1) factors in code size between different
- architectures, anyone who uses modern software and/or does a lot of
- heavy-duty computing these days wants a *lot* of memory. With the
- possible exception of workstations running X :-), most memory these
- days goes for *data*, not code, so instruction sets are only a minor
- perturbation. (Consider, for example, solving 3-dimensional partial
- differential equations to simulate oil reservoir depletion, automobile
- design, hydrogen bomb explosions, etc. Or crunching stochastic models
- of the world's stock markets to make your second $100M. Or rendering
- 16384*16384*24bit bitmaps for the newest Hollywood flick. Or doing a
- design verification and layout rule check of your new 5M-transistor
- CPU chip. Etc, etc, ...) Indeed, most "ordinary" workstations
- these days come with 16M from the factory, and people doing serious
- crunching frequently upgrade to 32M, 64M, or 128M.
-
- Moreover, again ignoring the issue of just what type of instruction
- set the CPU uses, I think we all agree that fast CPUs need fast memory.
- More precisely, just about any modern processor wants an effective
- "1st level off-chip" memory access of time at most "a few" cycles,
- and with clock rates pushing 100 MHz these days for performance, that
- leaves O(10ns) or less for the off-chip memory access.
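
The arithmetic behind that budget can be checked directly. The figures below (a 100 MHz clock, "a few" cycles allowed for a first-level off-chip access) are just the article's illustrative numbers, not measurements:

```python
# Back-of-the-envelope check of the off-chip access-time budget.
# Assumed figures: 100 MHz CPU clock, 3 cycles allowed for a
# "1st level off-chip" memory access (the article's "a few" cycles).

clock_hz = 100e6                      # 100 MHz CPU clock
cycle_ns = 1e9 / clock_hz             # one clock cycle, in nanoseconds
budget_cycles = 3                     # "a few" cycles for the off-chip access
access_budget_ns = budget_cycles * cycle_ns

print(f"cycle time:    {cycle_ns:.0f} ns")          # 10 ns per cycle
print(f"access budget: {access_budget_ns:.0f} ns")  # 30 ns total
```

With only 10 ns per cycle, even a three-cycle allowance leaves O(10ns) per access, which is exactly the regime where only fast SRAM (or a cache built from it) can respond in time.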
-
- Now consider building such a (large, fast) memory system, whether for
- a stack machine or for a modern RISC (or for a "classic CISC" like a
- VAX, for that matter). To oversimplify greatly, there are two different
- ways to build a large fast memory system:
- - SRAM-based: buy a lot of fast SRAM and build main memory out of it.
- No need to worry about caches, gives predictable access times so you
- can bound worst-case performance, and relatively simple to design.
- The problem with this type of memory is that fast SRAM costs a *lot*
- of money, dissipates a *lot* of power, and tends to have a rather
- small capacity per chip. (For example, what's the biggest 12ns SRAM
- you can buy 100,000 of, for delivery in 6 weeks? How much does it
- cost? And can you buy a heat sink and cooling fan big enough to
- prevent a meltdown when you power up your blindingly fast memory
- board populated with 64MB of these chips?)
- - DRAM+cache-based: buy a lot of commodity DRAM and build main memory
- out of it, *and* buy a modicum of fast SRAM and build a cache out of
- it. If you get the cache design right, you can still get most of
- the average-case performance of an SRAM-based memory, yet keep the
- relatively low cost and power dissipation of the DRAMs. (To
- continue the same example, what's the biggest 100ns DRAM you can
- buy 100,000 of, for delivery in 6 weeks? How much does it cost?
- The answers are "probably a lot bigger than 12ns SRAM" and "certainly
- a lot less".)
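
The average-case claim for the second approach follows from the standard effective-access-time formula: hit_time + miss_rate * miss_penalty. A quick sketch, using the article's 12 ns SRAM and 100 ns DRAM figures plus an assumed 95% cache hit rate (the hit rate is my illustrative number, not the article's):

```python
# Effective access time of a DRAM+cache memory vs. a pure fast-SRAM
# memory. 12 ns SRAM and 100 ns DRAM come from the article; the 95%
# hit rate is an assumed, typical figure for illustration.

sram_ns = 12.0        # fast SRAM (cache) access time
dram_ns = 100.0       # commodity DRAM access time
hit_rate = 0.95       # fraction of accesses satisfied by the cache

# Average memory access time: hit time plus miss-rate-weighted penalty.
amat_ns = sram_ns + (1.0 - hit_rate) * dram_ns

print(f"SRAM-only  : {sram_ns:.1f} ns per access")
print(f"DRAM+cache : {amat_ns:.1f} ns per access")  # 17.0 ns
```

At a 95% hit rate the cached DRAM system averages 17 ns per access: within a factor of 1.5 of pure SRAM, at a small fraction of the cost. (Of course this is exactly the average-case-only guarantee that the embedded-controller market for the RTX and SC32 cannot accept.)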
-
- The cost difference between these two approaches is why all modern
- general-purpose computer systems that worry about performance use
- cache memories. And that would include stack machines if they were
- to compete in the general-purpose computing marketplace.
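
To put a rough shape on that cost difference: pricing out a 64MB main memory both ways, with hypothetical early-90s per-megabyte prices (all dollar figures below are assumptions chosen only to show the shape of the argument, not actual 1992 quotes):

```python
# Rough cost comparison: 64 MB of main memory built from fast SRAM
# alone vs. commodity DRAM plus a small SRAM cache. All prices are
# hypothetical illustration figures.

mem_mb = 64
sram_per_mb = 400.0    # assumed $/MB for 12 ns SRAM
dram_per_mb = 30.0     # assumed $/MB for 100 ns DRAM
cache_mb = 0.25        # a 256 KB SRAM cache

sram_only = mem_mb * sram_per_mb
dram_cache = mem_mb * dram_per_mb + cache_mb * sram_per_mb

print(f"64 MB all-SRAM    : ${sram_only:,.0f}")    # $25,600
print(f"64 MB DRAM + cache: ${dram_cache:,.0f}")   # $2,020
```

Even with generous assumptions, the all-SRAM design costs an order of magnitude more, and the gap only widens as main memories grow.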
-
- - Jonathan Thornburg
- <jonathan@geop.ubc.ca> through mid-November 92, then
- <jonathan@einstein.ph.utexas.edu> or <jonathan@hermes.chpc.utexas.edu>
- [for a few more months] UBC / {Astronomy,Physics}
- [then through Aug/92] U of Texas at Austin / Physics Dept
- / Center for Relativity
-