NetNews Usenet Archive 1992 #30

Internet Message Format | 1992-12-16 | 5.0 KB

Xref: sparky comp.sys.intel:2733 comp.arch:11704
Path: sparky!uunet!pipex!bnr.co.uk!uknet!gdt!aber!aberfa!pcg
From: pcg@aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.sys.intel,comp.arch
Subject: Re: Superscalar vs. multiple CPUs ?
Message-ID: <PCG.92Dec13170504@aberdb.aber.ac.uk>
Date: 13 Dec 92 17:05:04 GMT
References: <WAYNE.92Dec4093422@backbone.uucp> <37595@cbmvax.commodore.com>
    <PCG.92Dec9154602@aberdb.aber.ac.uk>
    <1992Dec10.002951.23336@athena.mit.edu>
Sender: news@aber.ac.uk (USENET news service)
Reply-To: pcg@aber.ac.uk (Piercarlo Grandi)
Organization: Prifysgol Cymru, Aberystwyth
Lines: 91
In-Reply-To: solman@athena.mit.edu's message of 10 Dec 92 00:29:51 GMT
Nntp-Posting-Host: aberdb

On 10 Dec 92 00:29:51 GMT, solman@athena.mit.edu (Jason W Solinsky)
said:

Nntp-Posting-Host: m4-035-15.mit.edu

solman> (Piercarlo Grandi) writes:
|> (Bernard Gunther) said:

No, actually I (pcg) said this:

pcg> Well, certain tricks can also be used with multiple CPUs on the
pcg> same die. And these have an important advantage: as far as I can
pcg> see, 6 instruction issue per cycle is virtually pointless. The
pcg> *limit* of superscalarity present in general purpose codes is 4,
pcg> and actually we are hard pressed to find many codes with
pcg> superscalarity higher than 2.

solman> Err, I believe those are single threaded codes. If you're only
solman> dealing with one thread at a time, then you might as well not
solman> bother putting multiple CPUs on a die either.

Precisely my point: single threaded *general purpose* codes have a
limited intrinsic degree of exploitable parallelism.

pcg> Naturally if one looks at very regular algorithms one can issue
pcg> many, many instructions, *in sequence*, usually. But at that point
pcg> one is really better off with a proper vector processor; emulating
pcg> a vector processor with a superscalar one is not efficient.

solman> It also sounds to me like you are looking at a very narrow
solman> subset of programs.
solman> If this were true then modern heavily
solman> pipelined uPs would be horribly inefficient.

Indeed pipeline designs with more than a few stages of pipelining run
into huge problems, and are worth doing only if a significant
proportion of SIMD-like operation is expected. Pipeline bubbles start
to become a significant problem beyond 4 pipeline stages on general
purpose codes, even on non superscalar architectures.

solman> It seems to me, however, that most programs can take a lot of
solman> parallelism, sometimes even without multithreading.

If you have evidence of this you stand to make a fortune, and I am sure
that most computer/compiler vendors would want to hear from you. So far
the evidence suggests (to me at least) that exploitable degrees of
micro (multiple instruction issue in a cycle) and macro (multiple
instruction streams in a program) parallelism for most codes don't go
above 2-4; only codes with highly regular data reference patterns can
be micro/macro parallelized with success/ease.

gunther> [ ... on a single chip an internally superscalar/multithreaded
gunther> CPU might be a better idea than multiple independent CPUs ... ]

gunther> At the ISA level, however, the machine might well appear to be
gunther> just a collection of separate CPUs.

pcg> Hardly -- there is some problem with multiple contexts and MMUs.

solman> These can certainly be taken care of.

I may not be current on these issues, I am afraid, but as I recall
this is an open research problem, and an exceedingly hard one at that.

solman> A lot of hardware is freed up by breaking the "imaginary" lines.

Uhmmm, what I read is that almost all the space of modern 1-3 million
transistor CPU chips is taken up by caches and register files.

solman> This can then be used to re-implement the lost abilities, like
solman> context switching, but in a manner that allows them to be used
solman> by all the parts of the chip.

Again, designing a multithreaded CPU already looks a daunting task to
me, so having the CPU threads execute in different CPU and MMU contexts
(thus having many virtual interrupt vectors, register files, traps, and
instruction restart/resume on faults) looks harder still.

Maybe somebody has already done research along these lines; maybe it is
feasible, and maybe it is even cost effective. I can only remember old,
very limited, and not too successful examples.

What I observe is that designing and implementing proper CPU/MMU
context facilities is already hard enough on a single threaded CPU, and
becomes even harder in the presence of deep pipelining or superscalar
implementations. I am not aware of many modern architectures whose
design in this respect is entirely satisfactory (I read that on the
88k, one of the better designed architectures, coding trap handlers has
a lot to do with voodoo), not to mention monsters like the i860.
-- 
Piercarlo Grandi, Dept of CS, PC/UW@Aberystwyth <pcg@aber.ac.uk>
  And the Italian sang, and sang. And his desperate pleas reached the
  ears of his divine protector, the god of jokes
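
The issue-width argument in the post (that issuing more instructions per
cycle than the code's intrinsic parallelism allows buys nothing) can be
illustrated with a toy model. This is only a sketch under stated
assumptions: the 8-instruction block, the unit latency, and the names
`DEPS` and `schedule_cycles` are invented for illustration and do not
come from the thread or from any measured workload.

```python
# Hypothetical basic block: DEPS[i] lists the earlier instructions that
# instruction i must wait for. Every instruction has unit latency
# (an assumption made to keep the model small).
DEPS = [
    [],      # 0: load a
    [],      # 1: load b
    [0, 1],  # 2: c = a + b
    [2],     # 3: d = c * 2
    [],      # 4: load e
    [3, 4],  # 5: f = d - e
    [5],     # 6: compare f
    [6],     # 7: branch
]


def schedule_cycles(deps, width):
    """Greedy list scheduling: issue up to `width` ready instructions
    per cycle, oldest first. Returns the total number of cycles."""
    if width < 1:
        raise ValueError("width must be at least 1")
    issued_in = [None] * len(deps)   # cycle in which each one issued
    remaining = set(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        # Ready means every input was issued in an earlier cycle.
        ready = [
            i for i in sorted(remaining)
            if all(issued_in[d] is not None and issued_in[d] < cycle
                   for d in deps[i])
        ]
        for i in ready[:width]:
            issued_in[i] = cycle
            remaining.discard(i)
    return cycle


for width in (1, 2, 4, 6):
    cycles = schedule_cycles(DEPS, width)
    print(f"issue width {width}: {cycles} cycles, "
          f"{len(DEPS) / cycles:.2f} instructions/cycle")
```

In this toy block the longest dependency chain is six instructions long,
so widths 2, 4 and 6 all finish in 6 cycles; only the step from width 1
(8 cycles) to width 2 helps. Real codes, real latencies and branch
behaviour will of course differ.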