Path: sparky!uunet!sun-barr!ames!pacbell.com!network.ucsd.edu!nic!davsmith
From: davsmith@nic.cerf.net (David Smith)
Newsgroups: comp.arch
Subject: Re: CISC Microcode (was Re: RISC Mainframe)
Message-ID: <2373@nic.cerf.net>
Date: 23 Jul 92 22:37:40 GMT
References: <Brsx7o.G69@zoo.toronto.edu> <2369@nic.cerf.net> <BruruF.2E4@zoo.toronto.edu>
Organization: CERFnet
Lines: 96

In article <BruruF.2E4@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Uh, you are very much behind the times. Few modern CPUs will block waiting
>for a memory access unless/until they actually need the data. Admittedly,
>a lot of them can't initiate another memory access meanwhile -- or at
>least, another data access -- but some can, and buses that support multiple
>simultaneous memory transactions are old news.

Sigh...The dangers of simplifying for clarity. I didn't feel like getting
into a long discussion of register scoreboarding and so forth. However,
if the CPU can't initiate another fetch, then as far as the memory system
is concerned it's hanging on the first one. That's the critical issue for
using an interleaved memory system.

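To put rough numbers on it (all of these are invented for illustration,
not taken from any particular machine), the back-of-the-envelope
version in C:

#include <stdio.h>

/*
 * Sketch: with B interleaved banks of cycle time T, peak bandwidth
 * is B words per T.  A CPU that allows only one outstanding fetch
 * gets one word per full memory latency L, no matter how many banks
 * you stack up behind it.
 */
int
main(void)
{
    double T = 160.0;   /* bank cycle time, ns (assumed) */
    double L = 200.0;   /* fetch latency seen by the CPU, ns (assumed) */
    int    B = 8;       /* number of interleaved banks (assumed) */

    printf("peak, all banks busy: %.1f Mwords/s\n", B * 1000.0 / T);
    printf("one fetch at a time:  %.1f Mwords/s\n", 1000.0 / L);
    return 0;
}

Eight banks buy you nothing if the processor sits on one reference at
a time.
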
Among current microprocessors I don't know of any that are able to
issue multiple data fetches. I know that SPARC implementations up to
and including the 66 MHz BIT ECL parts do not. (I wrote a large
portion of the simulator for the CPU, FPU, cache, interleaved memory
system and vector processor when I was at FPS, now Cray Research
Super Servers Inc. The machine currently sells as the SMP. Bitchin'
box, cool people. Go buy one :-)) The Viking may, but I doubt it can
issue more than three. The 80x86 and 680x0 chips certainly do not.
I don't think the RS/6000 can. How about MIPS, HP-PA and Alpha?

>Moving back to the global issue, it's trivially obvious that you can always
>build a memory system that a given CPU can't use fully, just by stacking up
>more and more parallelism in the memory. But next year's CPU will go a
>long way toward catching up with your memory, and parallelism is expensive.

If it's trivially obvious, why did your post state that building
high-bandwidth memory systems was so difficult? Yes, next year's CPU
may catch up with the memory system. Wouldn't you like to replace the
CPU card rather than the whole system?

>You don't normally spend lots of money just to justify a pre-established
>decision to include a DMA box. You buy memory bandwidth to meet your needs.
>My contention is that making all of it available to your CPU, and choosing
>a CPU that can use all of it, tends to leave no room for a DMA box to be
>cost-effective... unless you've got response-time constraints (my item 3)
>that the CPU can't conveniently meet.

The CPU cannot use the memory bandwidth, therefore it is useless? The
bandwidth can be used by (gasp) a vector processor, multiple CPUs, I/O
hardware, etc. Additional memory bandwidth in an interleaved system is
as easy as adding another card (up to the bus bandwidth, of course).

There are also two memory bandwidths available in a single-CPU system:
CPU-to-cache and CPU-to-main-memory. The DMA box can be copying while
the CPU is working on something in the cache. If normal operations use
all the memory bus bandwidth, then there is not enough bandwidth left
for the CPU to do bcopies at its rated speed.

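Again with invented numbers, the bandwidth budget looks something
like this:

#include <stdio.h>

/*
 * Sketch: if ordinary execution (cache misses, device traffic) is
 * already eating part of the memory bus, a bcopy run by the CPU
 * can't also get its rated speed.  All figures are made up.
 */
int
main(void)
{
    double bus    = 100.0;  /* memory bus bandwidth, MB/s (assumed) */
    double normal =  40.0;  /* miss and I/O traffic, MB/s (assumed) */
    double rated  =  80.0;  /* CPU's rated bcopy speed, MB/s (assumed) */
    double left   = bus - normal;

    printf("bcopy gets %.0f of its rated %.0f MB/s\n",
           left < rated ? left : rated, rated);
    return 0;
}
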
So, what is interleaved memory useful for? High-bandwidth operations.
High-speed I/O, e.g. HIPPI. High-speed copy operations. Vector
processing. Multiple CPUs (with correctly designed caches).

Why would you want a DMA box on board? Henry had 4 reasons:

> 1. the CPU can't move data around at full memory speed
> 2. normal execution doesn't use the full memory bandwidth
> 3. interrupt overhead is too high for timely device handling
> 4. bad hardware design cripples CPU data movement

Henry and I seem to agree that item 1 is possible. He thinks that it's
a bad idea, but it is possible. I say that item 2 is the norm, given
that you're hitting enough in the cache. Item 3, response-time
constraints, is really the reason why we build fast computers. Item 4
is something I hope people won't do.

>No, it is difficult to get full bandwidth out of an interleaved memory
>system while going through an *uninformed* cache. There is no reason
>the cache can't vary its fetch size depending on the requirement.
>Fetching too much is not an issue if you know how much you'll want
>and can communicate this to the cache. Simply telling it "normal" or
>"bulk transfer" ought to be adequate.

If you're going to look at the data afterwards this can be useful. If
you're just copying for alignment reasons prior to sending the data to
a device or somesuch, though, you're just kicking good data out of the
cache (push one megabyte through a 64K cache and you've turned over
its entire contents sixteen times). If you get the cache to do your
memory-to-memory operation for you without kicking data out of the
cache, you've just renamed the DMA unit.

I think that we'll probably see CPUs designed for multiple outstanding
fetches (it's not _that_ hard), and that will enable the CPU to do
bcopy on an interleaved memory system as quickly as it could on a
non-interleaved system. However, it is also likely that we will be
building multiple-CPU systems and therefore will have more memory
bandwidth than any single CPU can consume. If this is the case then
bcopy hardware may still have a place if applications become
single-threaded around bcopy operations. There's also a place for
bcopy hardware if you decide that the CPU can be doing useful things
in its cache while the bcopy hardware is out moving things on the
memory bus.

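For the curious, a minimal sketch of what such a copy loop might look
like in C (this assumes a CPU with register scoreboarding that will
keep the four loads in flight; it's an illustration, not code from any
real implementation):

/*
 * Unroll the copy so several independent loads are issued before
 * any of their results are needed.  With multiple outstanding
 * fetches, the four loads can be striding through four banks of an
 * interleaved memory at once.
 */
void
bcopy4(const long *src, long *dst, int nwords)
{
    int i;
    long a, b, c, d;

    for (i = 0; i + 4 <= nwords; i += 4) {
        a = src[i];         /* four independent fetches in flight */
        b = src[i + 1];
        c = src[i + 2];
        d = src[i + 3];
        dst[i]     = a;     /* first use; the CPU blocks here at worst */
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
    for (; i < nwords; i++) /* leftover words */
        dst[i] = src[i];
}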

====
David L. Smith
smithd@discos.com or davsmith@nic.cerf.net