- Xref: sparky comp.sys.intel:2733 comp.arch:11704
- Path: sparky!uunet!pipex!bnr.co.uk!uknet!gdt!aber!aberfa!pcg
- From: pcg@aber.ac.uk (Piercarlo Grandi)
- Newsgroups: comp.sys.intel,comp.arch
- Subject: Re: Superscalar vs. multiple CPUs ?
- Message-ID: <PCG.92Dec13170504@aberdb.aber.ac.uk>
- Date: 13 Dec 92 17:05:04 GMT
- References: <WAYNE.92Dec4093422@backbone.uucp> <37595@cbmvax.commodore.com>
- <PCG.92Dec9154602@aberdb.aber.ac.uk>
- <1992Dec10.002951.23336@athena.mit.edu>
- Sender: news@aber.ac.uk (USENET news service)
- Reply-To: pcg@aber.ac.uk (Piercarlo Grandi)
- Organization: Prifysgol Cymru, Aberystwyth
- Lines: 91
- In-Reply-To: solman@athena.mit.edu's message of 10 Dec 92 00:29:51 GMT
- Nntp-Posting-Host: aberdb
-
- On 10 Dec 92 00:29:51 GMT, solman@athena.mit.edu (Jason W Solinsky) said:
- Nntp-Posting-Host: m4-035-15.mit.edu
-
- solman> (Piercarlo Grandi) writes:
- |> (Bernard Gunther) said:
-
- No, actually I (pcg) said this:
-
- pcg> Well, certain tricks can also be used with multiple CPUs on the
- pcg> same die. And these have an important advantage: as far as I can
- pcg> see, 6 instruction issue per cycle is virtually pointless. The
- pcg> *limit* of superscalarity present in general purpose codes is 4,
- pcg> and actually we are hard pressed to find many codes with
- pcg> superscalarity higher than 2.
-
- solman> Err, I believe those are single threaded codes. If you're only
- solman> dealing with one thread at a time, then you might as well not
- solman> bother putting multiple CPUs on a die either.
-
- Precisely my point: single threaded *general purpose* codes have a
- limited intrinsic degree of exploitable parallelism.
-
- pcg> Naturally if one looks at very regular algorithms one can issue
- pcg> many, many instructions, *in sequence*, usually. But at that point
- pcg> one is really better off with a proper vector processor; emulating
- pcg> a vector processor with a superscalar one is not efficient.
-
- solman> It also sounds to me like you are looking at a very narrow
- solman> subset of programs. If this were true then modern heavily
- solman> pipelined uPs would be horribly inefficient.
-
- Indeed, designs with more than a few pipeline stages run into huge
- problems, and are worth doing only if a significant proportion of
- SIMD-like operation is expected. Pipeline bubbles start to become a
- significant problem beyond 4 pipeline stages on general purpose codes,
- even on non-superscalar architectures.
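-
- To make the bubble argument concrete, here is a toy model, not a
- measurement. It assumes that each stalling instruction (say, a taken
- branch) wastes depth - 1 cycles, and that cycle time shrinks in
- proportion to pipeline depth. The function name and the 20% stall rate
- are hypothetical, chosen only for illustration.

```python
def pipeline_speedup(depth, hazard_rate):
    """Speedup over an unpipelined design under a toy stall model.

    depth:       number of pipeline stages (ideal speedup equals depth)
    hazard_rate: assumed fraction of instructions that flush the pipe
    Each flushing instruction is assumed to waste (depth - 1) cycles.
    """
    cycles_per_instruction = 1.0 + hazard_rate * (depth - 1)
    return depth / cycles_per_instruction


if __name__ == "__main__":
    for depth in (2, 4, 8, 16):
        general = pipeline_speedup(depth, 0.20)  # branchy, hypothetical rate
        regular = pipeline_speedup(depth, 0.0)   # SIMD-like, no stalls
        print(f"{depth:2d} stages: general {general:.2f}x, regular {regular:.2f}x")
```

- Under these assumptions the speedup on the branchy code can never
- exceed 1/hazard_rate, however deep the pipeline, while the stall-free
- code keeps scaling with depth.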
-
- solman> It seems to me, however, that most programs can take a lot of
- solman> parallelism, sometimes even without multithreading.
-
- If you have evidence of this you stand to make a fortune, and I am sure
- that most computer/compiler vendors would want to hear from you.
-
- So far the evidence suggests (to me at least) that exploitable degrees of
- micro (multiple instruction issue in a cycle) and macro (multiple
- instruction streams in a program) parallelism for most codes don't go
- above 2-4; only codes with highly regular data reference patterns can be
- micro/macro parallelized with success/ease.
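-
- A minimal sketch of what "degree of micro parallelism" means here,
- assuming unlimited issue width and unit latency: the best possible
- instructions per cycle is the instruction count divided by the length
- of the longest dependency chain. Both dependency graphs below are
- made up for illustration, not taken from any real code.

```python
def ideal_ilp(deps):
    """Return (instruction count, critical path length, their ratio).

    deps maps each instruction index to the earlier instructions whose
    results it needs; indices follow program order.
    """
    depth = {}
    for i in sorted(deps):
        # An instruction can issue one cycle after its latest producer.
        depth[i] = 1 + max((depth[d] for d in deps[i]), default=0)
    count = len(deps)
    critical = max(depth.values())
    return count, critical, count / critical


# Hypothetical general purpose fragment: mostly a serial chain.
general = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3], 5: [4]}
# Hypothetical regular fragment: eight independent element-wise operations.
regular = {i: [] for i in range(8)}

if __name__ == "__main__":
    print("general:", ideal_ilp(general))
    print("regular:", ideal_ilp(regular))
```

- The chained fragment cannot keep even two issue slots busy, while the
- regular one is limited only by its length, which is the vector-like
- case.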
-
-
- gunther> [ ... on a single chip an internally superscalar/multithreaded
- gunther> CPU might be a better idea than multiple independent CPUs ... ]
- gunther> At the ISA level, however, the machine might well appear to be
- gunther> just a collection of separate CPUs.
-
- pcg> Hardly -- there is some problem with multiple contexts and MMUs.
-
- solman> These can certainly be taken care of.
-
- I may not be current on these issues, I am afraid, but as I remember it
- this is an open research problem, and an exceedingly hard one at
- that.
-
- solman> A lot of hardware is freed up by breaking the "imaginary" lines.
-
- Uhmmm, what I read is that almost all the space of modern 1-3 million
- transistor CPU chips is taken up by caches and register files.
-
- solman> This can then be used to re-implement the lost abilities, like
- solman> context switching, but in a manner that allows them to be used
- solman> by all the parts of the chip.
-
- Again, my impression is that designing a multithreaded CPU is already a
- daunting task, and that having the CPU threads execute in different
- CPU and MMU contexts (thus having many virtual interrupt vectors,
- register files, traps, and instruction restart/resume on faults) looks
- harder still. Maybe somebody has already done research along these
- lines; maybe it is feasible, and maybe it is even cost effective. I can
- only remember old, very limited, and not too successful examples.
-
- What I observe is that designing and implementing proper CPU/MMU context
- facilities is already hard enough on a single threaded CPU, and becomes
- even harder in the presence of deep pipelining or superscalar
- implementations. I am not aware of many modern architectures whose
- design in this respect is entirely satisfactory (I read that on the
- 88k, one of the better designed architectures, coding trap handlers has
- a lot to do with voodoo), not to mention monsters like the i860.
- --
- Piercarlo Grandi, Dept of CS, PC/UW@Aberystwyth <pcg@aber.ac.uk>
- E l'italiano cantava, cantava. E le sue disperate invocazioni giunsero
- alle orecchie del suo divino protettore, il dio della barzelletta
-