Path: sparky!uunet!sun-barr!ames!pacbell.com!network.ucsd.edu!nic!davsmith
From: davsmith@nic.cerf.net (David Smith)
Newsgroups: comp.arch
Subject: Re: CISC Microcode (was Re: RISC Mainframe)
Message-ID: <2373@nic.cerf.net>
Date: 23 Jul 92 22:37:40 GMT
References: <Brsx7o.G69@zoo.toronto.edu> <2369@nic.cerf.net> <BruruF.2E4@zoo.toronto.edu>
Organization: CERFnet
Lines: 96

In article <BruruF.2E4@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Uh, you are very much behind the times. Few modern CPUs will block waiting
>for a memory access unless/until they actually need the data. Admittedly,
>a lot of them can't initiate another memory access meanwhile -- or at
>least, another data access -- but some can, and buses that support multiple
>simultaneous memory transactions are old news.

Sigh...The dangers of simplifying for clarity. I didn't feel like getting
into a long discussion of register scoreboarding and so forth. However,
if the CPU can't initiate another fetch, then as far as the memory system
is concerned it's hanging on the first one. That's the critical issue for
using an interleaved memory system.

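To put rough numbers on it (all of these are invented for illustration,
not taken from any particular machine), the back-of-the-envelope
version in C:

#include <stdio.h>

/*
 * Sketch: with B interleaved banks of cycle time T, peak bandwidth
 * is B words per T.  A CPU that allows only one outstanding fetch
 * gets one word per full memory latency L, no matter how many banks
 * you stack up behind it.
 */
int
main(void)
{
    double T = 160.0;   /* bank cycle time, ns (assumed) */
    double L = 200.0;   /* fetch latency seen by the CPU, ns (assumed) */
    int    B = 8;       /* number of interleaved banks (assumed) */

    printf("peak, all banks busy: %.1f Mwords/s\n", B * 1000.0 / T);
    printf("one fetch at a time:  %.1f Mwords/s\n", 1000.0 / L);
    return 0;
}

Eight banks buy you nothing if the processor sits on one reference at
a time.
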
Among current microprocessors I don't know of any that are able to
issue multiple data fetches. I know that SPARC implementations up to
and including the 66 MHz BIT ECL parts do not. (I wrote a large
portion of the simulator for the CPU, FPU, cache, interleaved memory
system and vector processor when I was at FPS, now Cray Research
Super Servers Inc. The machine currently sells as the SMP. Bitchin'
box, cool people. Go buy one :-)) The Viking may, but I doubt it can
issue more than three. The 80x86 and 680x0 chips certainly do not.
I don't think the RS/6000 can. How about MIPS, HP-PA and Alpha?

>Moving back to the global issue, it's trivially obvious that you can always
>build a memory system that a given CPU can't use fully, just by stacking up
>more and more parallelism in the memory. But next year's CPU will go a
>long way toward catching up with your memory, and parallelism is expensive.

If it's trivially obvious, why did your post state that building
high-bandwidth memory systems was so difficult? Yes, next year's CPU
may catch up with the memory system. Wouldn't you like to replace the
CPU card rather than the whole system?

>You don't normally spend lots of money just to justify a pre-established
>decision to include a DMA box. You buy memory bandwidth to meet your needs.
>My contention is that making all of it available to your CPU, and choosing
>a CPU that can use all of it, tends to leave no room for a DMA box to be
>cost-effective... unless you've got response-time constraints (my item 3)
>that the CPU can't conveniently meet.

The CPU cannot use the memory bandwidth, therefore it is useless? The
bandwidth can be used by (gasp) a vector processor, multiple CPUs, I/O
hardware, etc. Additional memory bandwidth in an interleaved system is
as easy as adding another card (up to the bus bandwidth, of course).

There are also two memory bandwidths available in a single-CPU system:
CPU-to-cache and CPU-to-main-memory. The DMA box can be copying while
the CPU is working on something in the cache. If normal operations use
all the memory bus bandwidth, then there is not enough bandwidth left
for the CPU to do bcopies at its rated speed.

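Again with invented numbers, the bandwidth budget looks something
like this:

#include <stdio.h>

/*
 * Sketch: if ordinary execution (cache misses, device traffic) is
 * already eating part of the memory bus, a bcopy run by the CPU
 * can't also get its rated speed.  All figures are made up.
 */
int
main(void)
{
    double bus    = 100.0;  /* memory bus bandwidth, MB/s (assumed) */
    double normal =  40.0;  /* miss and I/O traffic, MB/s (assumed) */
    double rated  =  80.0;  /* CPU's rated bcopy speed, MB/s (assumed) */
    double left   = bus - normal;

    printf("bcopy gets %.0f of its rated %.0f MB/s\n",
           left < rated ? left : rated, rated);
    return 0;
}
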
So, what is interleaved memory useful for? High-bandwidth operations.
High-speed I/O, e.g. HIPPI. High-speed copy operations. Vector
processing. Multiple CPUs (with correctly designed caches).

Why would you want a DMA box on board? Henry had 4 reasons:

> 1. the CPU can't move data around at full memory speed
> 2. normal execution doesn't use the full memory bandwidth
> 3. interrupt overhead is too high for timely device handling
> 4. bad hardware design cripples CPU data movement

Henry and I seem to agree that item 1 is possible. He thinks that it's
a bad idea, but it is possible. I say that item 2 is the norm, given
that you're hitting enough in the cache. Item 3, response-time
constraints, is really the reason why we build fast computers. Item 4
is something I hope people won't do.

>No, it is difficult to get full bandwidth out of an interleaved memory
>system while going through an *uninformed* cache. There is no reason
>the cache can't vary its fetch size depending on the requirement.
>Fetching too much is not an issue if you know how much you'll want
>and can communicate this to the cache. Simply telling it "normal" or
>"bulk transfer" ought to be adequate.

If you're going to look at the data afterwards this can be useful. If
you're just copying for alignment reasons prior to sending the data to
a device or somesuch, though, you're just kicking good data out of the
cache (push one megabyte through a 64K cache and you've turned over
its entire contents sixteen times). If you get the cache to do your
memory-to-memory operation for you without kicking data out of the
cache, you've just renamed the DMA unit.

I think that we'll probably see CPUs designed for multiple outstanding
fetches (it's not _that_ hard), and that will enable the CPU to do
bcopy on an interleaved memory system as quickly as it could on a
non-interleaved system. However, it is also likely that we will be
building multiple-CPU systems and therefore will have more memory
bandwidth than any single CPU can consume. If this is the case then
bcopy hardware may still have a place if applications become
single-threaded around bcopy operations. There's also a place for
bcopy hardware if you decide that the CPU can be doing useful things
in its cache while the bcopy hardware is out moving things on the
memory bus.

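For the curious, a minimal sketch of what such a copy loop might look
like in C (this assumes a CPU with register scoreboarding that will
keep the four loads in flight; it's an illustration, not code from any
real implementation):

/*
 * Unroll the copy so several independent loads are issued before
 * any of their results are needed.  With multiple outstanding
 * fetches, the four loads can be striding through four banks of an
 * interleaved memory at once.
 */
void
bcopy4(const long *src, long *dst, int nwords)
{
    int i;
    long a, b, c, d;

    for (i = 0; i + 4 <= nwords; i += 4) {
        a = src[i];         /* four independent fetches in flight */
        b = src[i + 1];
        c = src[i + 2];
        d = src[i + 3];
        dst[i]     = a;     /* first use; the CPU blocks here at worst */
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
    for (; i < nwords; i++) /* leftover words */
        dst[i] = src[i];
}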

====
David L. Smith
smithd@discos.com or davsmith@nic.cerf.net