NetNews Usenet Archive 1992 #30


Internet Message Format | 1992-12-12 | 5.3 KB

Xref: sparky comp.sys.intel:2664 comp.arch:11587
Path: sparky!uunet!olivea!charnel!rat!zeus!
From: mneideng@thidwick.acs.calpoly.edu (Mark Neidengard)
Newsgroups: comp.sys.intel,comp.arch
Subject: Re: Superscalar vs. multiple CPUs ?
Message-ID: <1992Dec12.152224.168173@zeus.calpoly.edu>
Date: 12 Dec 92 15:22:24 GMT
References: <1992Dec7.012026.11482@athena.mit.edu> <1992Dec8.000357.26577@newsroom.utas.edu.au> <PCG.92Dec9154602@aberdb.aber.ac.uk>
Sender: news@zeus.calpoly.edu
Organization: Academic Computing Services, Cal Poly San Luis Obispo
Lines: 97

In article <PCG.92Dec9154602@aberdb.aber.ac.uk> pcg@aber.ac.uk (Piercarlo Grandi) writes:
>On 8 Dec 92 00:03:57 GMT, gunther@bronte.cs.utas.edu.au (Bernard Gunther) said:
>
>gunther> (Jason W Solinsky) writes:
>
>gunther> So, are we going to see (for example) four CPUs stuffed in each
>gunther> corner of a big die?
>
>Well, Intel have sort of _promised_/_planned_ such a thing for the late
>90s. When you have several dozen million transistors on a chip, you can
>start treating it like a micro-PCB -- even Intel processors can only get
>so much complex.
>
>gunther> That implies some unreasonable design choices: *four* I-caches,
>gunther> *four* D-caches, four sets on FPUs, four sets of ALUs, etc...
>
>Unreasonable? And what you are going to use all those transistors for?
>Time ago the PC-on-a-chip seemed unreasonable too maybe, but we now have
>ti, and palmtop PCs have appeared. Eventually we'll have the palmtop
>touchstone delta :-).
>
>gunther> No, I guess looking at it like that, the idea of laying four
>gunther> megacells down on the chip and having them communicate via
>gunther> test-and-set semaphores from their individual caches seems
>gunther> slightly crazy. It's not that there isn't enough transistors
>gunther> to do it some day, but rather that those transistors aren't
>gunther> being used very efficiently.
>
>Uhm?
>
>gunther> That's fine if chips cost zero and going off-chip extracts no
>gunther> penalty and you don't mind competing CPUs running twice times
>gunther> as fast with the same hardware budget. :-)
>
>gunther> By sharing many functional units in a multithreaded CPU, it's
>gunther> possible to achieve over 6 instruction issues per cycle with
>gunther> 99% memory utilization (no D-cache!) and 97% FPU utilization -
>gunther> all from a *single* I-cache and multiple instruction
>gunther> sequencers.
>
>Well, certain tricks can also be used with multiple CPUs on the same
>die. And these have an important advantage: as far as I can see, 6
>instruction issue per cycle is virtually pointless. The *limit* of
>superscalarity present in general purpose codes is 4, and actually we
>are hard pressed to find many codes with superscalarity higher than 2.
>
>Naturally if one looks at very regular algorithms one can issue many,
>many instructions, *in sequence*, usually. But at that point one is
>really better off with a proper vector processor; emulating a vector
>processor with a superscalar one is not efficient.
>
>On the other hand there are plenty of non vector problems for which a
>coarser grain of parallelism (say 1-10 CPUs) is provenly effective.
>Also, considering that future applications (a multimedia movie, where
>you have to run in parallel sound and image expansion, and track net
>comms and the like) will be probably quite multitasking, multiple CPUs
>looks like being a better idea than:
>
>gunther> At the ISA level, however, the machine might well appear to be
>gunther> just a collection of separate CPUs.
>
>Hardly -- there is some problem with multiple contexts and MMUs.
>
>gunther> For the cost of two CPUs it's possible to obtain the
>gunther> performance of four. Obviously this approach has its limits,
>gunther> and it's then when it pays to integrate multiple CPUs.
>gunther> I've drawn this line at the point where a single I-cache and a
>gunther> single D-cache no longer suffice.
>
>I'd rather say the divide is between general purpose scalar/superscalar
>and vector...
>
>--
>Piercarlo Grandi | JNET: pcg@uk.ac.aber
>Dept of CS, University of Wales | UUCP: ...!aber-cs!pcg
>Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk

I see nothing wrong with many instruction units on one chip; all you
have to do is either A) use them like a sort of on-chip diastolic
pipeline, or B) have the system process scheduler schedule different
processes to use different units. I could easily envision a chip with
three FP adders and three FP multipliers (maybe not with today's
fabrication, but in another year or so...). What you would have to do
is increase the sophistication of the chip's internal pipeline
controller and do a LOT of instruction predecoding. However, if you
took a six-unit chip up to 200 MHz, you'd have a single-chip
supercomputer; if the i860 was the "supercomputer on a chip", such a
high-superscalability chip would be a "parallel-supercomputer on a
chip." I could see using 1.2 gigaflops on a single workstation...=)

Mark Neidengard
Author of UUSHRED, avail. on wuarchive.wustl.edu in /pub
mneideng@cosmos.acs.calpoly.edu

"I remember green suits on a black mayor
I remember nine-millimeter child slayers
I remember all the times that you called me an animal
But now I'm walkin' as a Cannibal-"
Brother J of the X-Clan
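
The 1.2 gigaflops figure in the post appears to be a peak-rate estimate: six FP units (three adders, three multipliers) times a 200 MHz clock. A minimal sketch of that arithmetic follows, assuming each unit completes one floating-point operation per cycle; the post does not state this, and the function name peak_flops and its parameters are hypothetical, chosen only for illustration.

```python
def peak_flops(fp_units: int, clock_hz: int, ops_per_unit_per_cycle: int = 1) -> int:
    """Theoretical peak floating-point rate in operations per second.

    Assumption (not from the post): every unit retires
    ops_per_unit_per_cycle operations on every clock cycle, with no stalls.
    """
    return fp_units * clock_hz * ops_per_unit_per_cycle


# Six units (3 FP adders + 3 FP multipliers) at 200 MHz, as in the post.
six_unit_peak = peak_flops(fp_units=6, clock_hz=200_000_000)
print(six_unit_peak / 1e9, "GFLOPS peak")
```

This is an upper bound only. Sustained throughput would depend on how often the pipeline controller can actually keep all six units busy, which is the point of contention in the quoted discussion.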