Xref: sparky comp.arch:11705 comp.sys.intel:2734
Path: sparky!uunet!pipex!bnr.co.uk!uknet!gdt!aber!aberfa!pcg
From: pcg@aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch,comp.sys.intel
Subject: Re: Superscalar vs. multiple CPUs ?
Message-ID: <PCG.92Dec13164348@aberdb.aber.ac.uk>
Date: 13 Dec 92 16:43:48 GMT
References: <1992Dec7.012026.11482@athena.mit.edu>
	<1992Dec8.000357.26577@newsroom.utas.edu.au>
	<PCG.92Dec9154602@aberdb.aber.ac.uk>
	<1992Dec9.211737.23911@walter.cray.com> <Bz25n0.202@metaflow.com>
Sender: news@aber.ac.uk (USENET news service)
Reply-To: pcg@aber.ac.uk (Piercarlo Grandi)
Organization: Prifysgol Cymru, Aberystwyth
Lines: 47
In-Reply-To: rschnapp@metaflow.com's message of 10 Dec 92 19:18:35 GMT
Nntp-Posting-Host: aberdb

On 10 Dec 92 19:18:35 GMT, rschnapp@metaflow.com (Russ Schnapp) said:

rschnapp> (Bradley R. Carlile) writes:
Bradley> The limit of 4 or 6 may be the limit when programming a
Bradley> superscalar chip by simply letting the chip group instructions
Bradley> together. *However* one can use the technique of software
Bradley> pipelining, like we used to use on the VLIW machines of
Bradley> yesteryear: the FPS-120B (before 1976), FPS-164, and the
Bradley> FPS-264 (1985?). These machines could issue up to 10
Bradley> instructions every cycle {I wrote software for 7 years that
Bradley> used these instructions}.

rschnapp> Don't forget about out-of-order (i.e., dataflow) execution
rschnapp> techniques. You can gain plenty of additional fine-grain
rschnapp> parallelism.

Ah, there are indeed many interesting tricks to support high levels of
micro parallelism *in the implementation*. To me the really interesting
problem is not how to do it, though, but whether it is worth doing at
all, and for which classes of codes.

I have yet to find evidence that *general purpose* codes have an
intrinsic degree of micro-parallelizable operation (let's call it
superscalarity) that makes it worthwhile to have a degree of parallel
*instruction issue* within a single stream greater than 2-4.

The codes that can be easily micro-parallelized are usually *special
purpose*, in that they have a structure (e.g. FIFO reference patterns)
that is best suited to a SIMD/vector processor approach.

Otherwise *general purpose* codes have a degree of macro parallelism
(let's call it multithreading) that is best exploited with multiple
CPUs, and that can be exploited up to a degree of parallel *instruction
streams* with independent contexts not greater than 2-4 (again).

Then there are codes that can be easily macro-parallelized, and these
are again *special purpose*, in that they have a structure (e.g.
LIFO reference patterns) that is best suited to a MIMD approach.

Naturally all this depends on how you define a "code"; for example a
timesharing system might be considered either a single, highly
parallelizable code, or a collection of different codes that are each
fairly hard to parallelize.
--
Piercarlo Grandi, Dept of CS, PC/UW@Aberystwyth <pcg@aber.ac.uk>
And the Italian sang and sang. And his desperate invocations reached
the ears of his divine protector, the god of the joke