- Xref: sparky comp.sys.intel:2664 comp.arch:11587
- Path: sparky!uunet!olivea!charnel!rat!zeus!
- From: mneideng@thidwick.acs.calpoly.edu (Mark Neidengard)
- Newsgroups: comp.sys.intel,comp.arch
- Subject: Re: Superscalar vs. multiple CPUs ?
- Message-ID: <1992Dec12.152224.168173@zeus.calpoly.edu>
- Date: 12 Dec 92 15:22:24 GMT
- References: <1992Dec7.012026.11482@athena.mit.edu> <1992Dec8.000357.26577@newsroom.utas.edu.au> <PCG.92Dec9154602@aberdb.aber.ac.uk>
- Sender: news@zeus.calpoly.edu
- Organization: Academic Computing Services, Cal Poly San Luis Obispo
- Lines: 97
-
- In article <PCG.92Dec9154602@aberdb.aber.ac.uk> pcg@aber.ac.uk (Piercarlo Grandi) writes:
- >On 8 Dec 92 00:03:57 GMT, gunther@bronte.cs.utas.edu.au (Bernard Gunther) said:
- >
- >gunther> (Jason W Solinsky) writes:
- >
- >gunther> So, are we going to see (for example) four CPUs stuffed in each
- >gunther> corner of a big die?
- >
- >Well, Intel have sort of _promised_/_planned_ such a thing for the late
- >90s. When you have several dozen million transistors on a chip, you can
- >start treating it like a micro-PCB -- even Intel processors can only get
- >so complex.
- >
- >gunther> That implies some unreasonable design choices: *four* I-caches,
- >gunther> *four* D-caches, four sets of FPUs, four sets of ALUs, etc...
- >
- >Unreasonable? And what you are going to use all those transistors for?
- >Some time ago the PC-on-a-chip seemed unreasonable too, maybe, but we now
- >have it, and palmtop PCs have appeared. Eventually we'll have the palmtop
- >touchstone delta :-).
- >
- >gunther> No, I guess looking at it like that, the idea of laying four
- >gunther> megacells down on the chip and having them communicate via
- >gunther> test-and-set semaphores from their individual caches seems
- >gunther> slightly crazy. It's not that there aren't enough transistors
- >gunther> to do it some day, but rather that those transistors aren't
- >gunther> being used very efficiently.
- >
- >Uhm?
- >
- >gunther> That's fine if chips cost zero and going off-chip extracts no
- >gunther> penalty and you don't mind competing CPUs running twice
- >gunther> as fast with the same hardware budget. :-)
- >
- >gunther> By sharing many functional units in a multithreaded CPU, it's
- >gunther> possible to achieve over 6 instruction issues per cycle with
- >gunther> 99% memory utilization (no D-cache!) and 97% FPU utilization -
- >gunther> all from a *single* I-cache and multiple instruction
- >gunther> sequencers.
- >
- >Well, certain tricks can also be used with multiple CPUs on the same
- >die. And these have an important advantage: as far as I can see, 6
- >instruction issue per cycle is virtually pointless. The *limit* of
- >superscalarity present in general purpose codes is 4, and actually we
- >are hard pressed to find many codes with superscalarity higher than 2.
- >
- >Naturally if one looks at very regular algorithms one can issue many,
- >many instructions, *in sequence*, usually. But at that point one is
- >really better off with a proper vector processor; emulating a vector
- >processor with a superscalar one is not efficient.
- >
- >On the other hand there are plenty of non vector problems for which a
- >coarser grain of parallelism (say 1-10 CPUs) is provenly effective.
- >Also, considering that future applications (a multimedia movie, where
- >you have to run in parallel sound and image expansion, and track net
- >comms and the like) will be probably quite multitasking, multiple CPUs
- >looks like being a better idea than:
- >
- >gunther> At the ISA level, however, the machine might well appear to be
- >gunther> just a collection of separate CPUs.
- >
- >Hardly -- there is some problem with multiple contexts and MMUs.
- >
- >gunther> For the cost of two CPUs it's possible to obtain the
- >gunther> performance of four. Obviously this approach has its limits,
- >gunther> and it's then when it pays to integrate multiple CPUs.
- >gunther> I've drawn this line at the point where a single I-cache and a
- >gunther> single D-cache no longer suffice.
- >
- >I'd rather say the divide is between general purpose scalar/superscalar
- >and vector...
- >
- >--
- >Piercarlo Grandi | JNET: pcg@uk.ac.aber
- >Dept of CS, University of Wales | UUCP: ...!aber-cs!pcg
- >Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk
-
- I see nothing wrong with many instruction units on one chip; all you have to
- do is either A) use them like a sort of on-chip diastolic pipeline, or B)
- have the system process scheduler schedule different processes to use
- different units. I could easily envision a chip with three FP adders and
- three FP multipliers (maybe not with today's fabrication, but in another
- year or so...) What you would have to do is increase the sophistication of
- the chip's internal pipeline controller and do a LOT of instruction
- predecoding. However, if you took a six-unit chip up to 200 MHz, you'd have
- a single-chip supercomputer; if the i860 was the "supercomputer on a chip",
- such a high-superscalability chip would be a "parallel-supercomputer on a
- chip." I could see using 1.2 gigaflops on a single workstation...=)
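
A quick sanity check of the 1.2 gigaflops figure, as a rough sketch. It
assumes each of the six FP units (three adders, three multipliers) retires
one floating-point operation per clock at 200 MHz, which is a best-case
peak and not a measured number. The function names and the 50% utilization
value are made up for illustration.

```python
def peak_flops(units: int, clock_hz: float, ops_per_unit_per_cycle: int = 1) -> float:
    """Best-case throughput: every unit retires its ops on every cycle."""
    return units * clock_hz * ops_per_unit_per_cycle


def sustained_flops(peak: float, utilization: float) -> float:
    """Scale the peak by the fraction of cycles the units are actually busy."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


# Six FP units (3 adders + 3 multipliers) at 200 MHz, as in the post.
peak = peak_flops(units=6, clock_hz=200e6)
print(f"peak: {peak / 1e9:.1f} GFLOPS")

# Hypothetical 50% utilization, only to show how far sustained can fall below peak.
print(f"at 50% utilization: {sustained_flops(peak, 0.5) / 1e9:.1f} GFLOPS")
```

The peak only holds if the pipeline controller can keep all six units fed
every cycle, which is exactly where the extra predecoding effort would go.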
-
- Mark Neidengard Author of UUSHRED, avail. on
- wuarchive.wustl.edu in /pub
- mneideng@cosmos.acs.calpoly.edu
- "I remember green suits on a black mayor
- I remember nine-millimeter child slayers
- I remember all the times that you called me an animal
- But now I'm walkin' as a Cannibal-"
- Brother J of the X-Clan
-