NetNews Usenet Archive 1992 #30

Internet Message Format | 1992-12-16 | 3.6 KB

Xref: sparky comp.arch:11706 comp.sys.intel:2735
Path: sparky!uunet!mcsun!uknet!gdt!aber!aberfa!pcg
From: pcg@aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch,comp.sys.intel
Subject: Re: Superscalar vs. multiple CPUs ?
Message-ID: <PCG.92Dec11162630@aberdb.aber.ac.uk>
Date: 11 Dec 92 16:26:30 GMT
References: <1992Dec7.012026.11482@athena.mit.edu> <1992Dec8.000357.26577@newsroom.utas.edu.au> <PCG.92Dec9154602@aberdb.aber.ac.uk> <1992Dec9.211737.23911@walter.cray.com>
Sender: news@aber.ac.uk (USENET news service)
Reply-To: pcg@aber.ac.uk (Piercarlo Grandi)
Organization: Prifysgol Cymru, Aberystwyth
Lines: 57
In-Reply-To: bradc@ferris.cray.com's message of 10 Dec 92 03:17:36 GMT
Nntp-Posting-Host: aberdb

On 10 Dec 92 03:17:36 GMT, bradc@ferris.cray.com (Bradley R. Carlile) said:

pcg> as far as I can see, 6 instruction issue per cycle is virtually
pcg> pointless. The *limit* of superscalarity present in general purpose
                                                         ^^^^^^^^^^^^^^^
Note the "general purpose"; if the codes exhibit high regularity in data
access patterns then they are no longer "general purpose" codes, at
least in my understanding of that term, which encompasses things like
editors, databases, compilers, word processors, spreadsheets, ...

pcg> codes is 4, and actually we are hard pressed to find many codes
pcg> with superscalarity higher than 2.

bradc> The limit of 4 or 6 may be the limit programming a superscalar
bradc> chip like by simply letting the chip group instructions
bradc> together. *However* if one uses the technique of software
bradc> pipelining like we used to use on the VLIW machines of yesteryear
bradc> FPS-120B (before 1976), FPS-164, and the FPS-264 (1985?).

Ah yes, well known, for codes well suited to SIMD style computing. But
frankly the jury is still out on software pipelining even in that case.
It covers, like LIW/VLIW, a gray area between [super]scalar and vector,
one perhaps suited to short vector lengths.
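The software pipelining Carlile describes can be made concrete with a small schedule. This is a minimal sketch in Python, not the FPS or PGI compilers' method. It assumes an initiation interval of one cycle, exactly one cycle per stage, and enough functional units that no two stages conflict; the four stage names are hypothetical. Stage s of iteration i runs in cycle i + s. Once a short prologue has filled the pipeline, every cycle issues one stage from each of several overlapping iterations, which is how a LIW/VLIW machine keeps many units busy from a single loop.

```python
def modulo_schedule(n_iters, stages):
    """Overlap loop iterations with an initiation interval of one cycle.

    Returns a list indexed by cycle; each entry lists the (iteration, stage)
    pairs issued in that cycle. Assumes every stage takes exactly one cycle
    and that there are enough functional units to run all stages at once.
    """
    if n_iters == 0:
        return []
    n_stages = len(stages)
    schedule = []
    for cycle in range(n_iters + n_stages - 1):
        issued = []
        for s, stage in enumerate(stages):
            i = cycle - s  # iteration whose stage s falls in this cycle
            if 0 <= i < n_iters:
                issued.append((i, stage))
        schedule.append(issued)
    return schedule


if __name__ == "__main__":
    # Hypothetical stage names for a y[i] = a * x[i] + y[i] style loop body.
    body = ["load", "mul", "add", "store"]
    for cycle, issued in enumerate(modulo_schedule(6, body)):
        print(cycle, issued)
```

For six iterations of a four-stage body this gives nine cycles. The first three cycles are a prologue, the next three are the steady-state kernel, where each cycle issues four operations from four different iterations, and the last three are an epilogue. A wider issue rate therefore comes from more stages or more operations per stage, not from any extra parallelism inside one iteration.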
bradc> These machines could issue up to 10 instructions every cycle {I
bradc> wrote software for 7 years that used these instructions}.

But I still think that a proper vector processor is, except in
particular cases, a better bet overall, especially because memory queues
fit more easily into a vector architecture than into anything else, and
SIMD-style codes are great bandwidth eaters. And one can design vector
architectures that perform well even for the short vector lengths for
which a LIW/VLIW seems designed.

Also, if you can put software pipelining to good use and issue 10
instructions per cycle, this means you have memory transactions per
cycle of the same order of magnitude, and you need register files of
the same order of magnitude in depth, and so on. This looks like a call
for vector instructions, vector registers, and vector memory access.

bradc> In addition, compilers can automatically perform software
bradc> pipelining. At FPS we had several. A "modern" example of a
bradc> compiler includes Portland Group Inc.'s i860 compiler.

Fascinating compilers, but the story of the i860 (which is often
considered a sort of LIW architecture) seems to support my impression:
it would have been easier, even for the limited degree of parallelism
available in the i860, to have proper vector-style instructions. The
instructions that do exploit the multiple functional units of the i860
really amount to such, only with loads of complications and hazards,
and difficulties in keeping the pipes fed.

Now, the i860 is a particularly awkward quasi-LIW design, and it should
not be taken as a straw man, but still...

-- 
Piercarlo Grandi                   | JNET: pcg@uk.ac.aber
Dept of CS, University of Wales    | UUCP: ...!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk
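Grandi's argument that a high software-pipelined issue rate implies comparable memory traffic and register depth can be put as rough arithmetic. This is a minimal sketch under stated assumptions, not a model of any FPS or i860 machine. It assumes a y[i] = a * x[i] + y[i] style loop body (two loads, a multiply, an add and a store per iteration) and made-up value lifetimes in cycles. The one standard fact used is that, under modulo scheduling, a value live for L cycles with initiation interval II has ceil(L / II) copies in flight, each needing its own register. The function name `steady_state_demand` is hypothetical.

```python
from math import ceil


def steady_state_demand(ops_per_iter, ii, lifetimes):
    """Per-cycle resource demand of a software-pipelined loop in steady state.

    ops_per_iter: mapping from operation kind to count per loop iteration.
    ii: initiation interval, i.e. cycles between starts of successive iterations.
    lifetimes: cycles each value stays live, from its producer's issue to its
        last use. A value live for L cycles has ceil(L / ii) copies in flight,
        and each copy needs its own register.
    Returns (operations issued per cycle, memory references per cycle,
    registers needed for the overlapping copies).
    """
    issue_rate = sum(ops_per_iter.values()) / ii
    memory_rate = (ops_per_iter.get("load", 0) + ops_per_iter.get("store", 0)) / ii
    registers = sum(ceil(life / ii) for life in lifetimes)
    return issue_rate, memory_rate, registers


if __name__ == "__main__":
    saxpy_ops = {"load": 2, "mul": 1, "add": 1, "store": 1}
    # Hypothetical lifetimes in cycles: x value 3, y value 6, product 3, sum 3.
    for ii in (1, 2):
        issue, mem, regs = steady_state_demand(saxpy_ops, ii, [3, 6, 3, 3])
        print(f"II={ii}: {issue} ops/cycle, {mem} memory ops/cycle, {regs} registers")
```

With II=1 this loop issues 5 operations per cycle, 3 of them memory references, and needs 15 registers just for the overlapping copies. Halving the issue rate (II=2) roughly halves the memory traffic and cuts the registers to 9. Both quantities scale with the issue rate, as the post argues, and those are exactly the resources that vector registers and vector memory access are designed to supply.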