- Xref: sparky comp.arch:11706 comp.sys.intel:2735
- Path: sparky!uunet!mcsun!uknet!gdt!aber!aberfa!pcg
- From: pcg@aber.ac.uk (Piercarlo Grandi)
- Newsgroups: comp.arch,comp.sys.intel
- Subject: Re: Superscalar vs. multiple CPUs ?
- Message-ID: <PCG.92Dec11162630@aberdb.aber.ac.uk>
- Date: 11 Dec 92 16:26:30 GMT
- References: <1992Dec7.012026.11482@athena.mit.edu>
- <1992Dec8.000357.26577@newsroom.utas.edu.au>
- <PCG.92Dec9154602@aberdb.aber.ac.uk>
- <1992Dec9.211737.23911@walter.cray.com>
- Sender: news@aber.ac.uk (USENET news service)
- Reply-To: pcg@aber.ac.uk (Piercarlo Grandi)
- Organization: Prifysgol Cymru, Aberystwyth
- Lines: 57
- In-Reply-To: bradc@ferris.cray.com's message of 10 Dec 92 03:17:36 GMT
- Nntp-Posting-Host: aberdb
-
- On 10 Dec 92 03:17:36 GMT, bradc@ferris.cray.com (Bradley R. Carlile) said:
-
- pcg> as far as I can see, 6 instruction issue per cycle is virtually
- pcg> pointless. The *limit* of superscalarity present in general purpose
- ^^^^^^^^^^^^^^^
-
- Note the "general purpose"; if the codes exhibit high regularity in data
- access patterns then they are no longer "general purpose" codes, at
- least in my understanding of that term, which encompasses things like
- editors, databases, compilers, word processors, spreadsheets, ...
-
- pcg> codes is 4, and actually we are hard pressed to find many codes
- pcg> with superscalarity higher than 2.
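To make the "superscalarity" of a code concrete, here is a minimal C sketch that measures the idealized parallelism of a dependence graph as the number of operations divided by the length of the critical path. The names `Op`, `critical_path` and `ideal_ilp` are hypothetical, and the model assumes unit latency and unlimited functional units, so it gives an upper bound rather than what any real chip achieves. A pointer-chasing chain, typical of the "general purpose" codes listed above, scores 1.0 however wide the machine is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of a dependence graph: each operation names up
   to two earlier operations it depends on (-1 = none). Operations are
   listed in program order, so predecessors always have smaller indices. */
typedef struct {
    int pred[2];
} Op;

/* Longest dependence chain, in steps, assuming every operation takes
   exactly one cycle. `depth` is caller-supplied scratch of n entries. */
int critical_path(const Op *ops, size_t n, int *depth)
{
    int longest = 0;
    for (size_t i = 0; i < n; i++) {
        int d = 1;                        /* the operation itself */
        for (int k = 0; k < 2; k++) {
            int p = ops[i].pred[k];
            assert(p < (int)i);           /* predecessors come earlier */
            if (p >= 0 && depth[p] + 1 > d)
                d = depth[p] + 1;         /* must wait for predecessor p */
        }
        depth[i] = d;
        if (d > longest)
            longest = d;
    }
    return longest;
}

/* Idealized available parallelism: operations / critical path length,
   i.e. the average issue width an unlimited machine could sustain. */
double ideal_ilp(const Op *ops, size_t n, int *depth)
{
    return n ? (double)n / critical_path(ops, n, depth) : 0.0;
}
```

For example, four loads that each depend on the previous one give a critical path of 4 and an ILP of 1.0, while summing four independent loads with a three-add tree gives 7 operations over a critical path of 3.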
-
- bradc> The limit of 4 or 6 may be the limit when programming a superscalar
- bradc> chip by simply letting the chip group instructions
- bradc> together. *However* if one uses the technique of software
- bradc> pipelining like we used to use on the VLIW machines of yesteryear
- bradc> FPS-120B (before 1976), FPS-164, and the FPS-264 (1985?).
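Software pipelining overlaps the stages of successive loop iterations. The minimal hand-written C sketch below applies it to a DAXPY-style loop to show the prologue/kernel/epilogue shape: the loads for iteration i+1 sit in the same loop body as the multiply-add and store of iteration i. The function names are hypothetical, and a real compiler would do this on machine instructions rather than C source. The `restrict` qualifiers record the assumption that `x` and `y` do not overlap, which is what makes it legal to load ahead of the previous store.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: each iteration loads, multiply-adds, then stores, so a
   simple in-order machine waits on the loads before the FP operation. */
void daxpy_plain(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Software-pipelined version (illustrative only). Assumes x and y do
   not overlap, hence restrict: the loads of y[i+1] happen before the
   store to y[i]. */
void daxpy_pipelined(size_t n, double a,
                     const double *restrict x, double *restrict y)
{
    if (n == 0)
        return;

    /* Prologue: stage 1 (loads) of iteration 0. */
    double xi = x[0], yi = y[0];

    /* Kernel: stage 2 of iteration i overlaps stage 1 of iteration i+1. */
    for (size_t i = 0; i + 1 < n; i++) {
        double xn = x[i + 1], yn = y[i + 1];  /* stage 1, iteration i+1 */
        y[i] = a * xi + yi;                   /* stage 2, iteration i   */
        xi = xn;
        yi = yn;
    }

    /* Epilogue: stage 2 of the last iteration. */
    y[n - 1] = a * xi + yi;
}
```

Both functions compute the same result; the pipelined one only changes which operations are adjacent, so that a machine able to issue a load and an FP operation in the same cycle can keep both units busy.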
-
- Ah yes, well known, for codes well suited to SIMD-style computing. But
- frankly the jury is still out on software pipelining even in that case.
- Like LIW/VLIW, it covers a gray area between [super]scalar and vector,
- one perhaps suited to short vector lengths.
-
- bradc> These machines could issue up to 10 instructions every cycle {I
- bradc> wrote software for 7 years that used these instructions}.
-
- But I still think that, except in particular cases, a proper vector
- processor is overall a better bet, especially because memory queues fit
- more easily into a vector architecture than into anything else, and
- SIMD-style codes are great bandwidth eaters. And one can design vector
- architectures that perform well even for the short vector lengths for
- which LIW/VLIW seems designed.
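How a vector machine copes with arbitrary vector lengths, including short ones, can be sketched with strip-mining. The sketch assumes a hypothetical maximum vector length `MVL` of 64 (the size of a Cray-1 vector register) and uses a plain C loop, `vector_op`, as a stand-in for one vector instruction sequence. It shows only the control structure; it does not model the start-up cost that actually decides short-vector performance.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* assumed maximum vector length (elements per vector register) */

/* Stand-in for one vector instruction sequence on vl <= MVL elements;
   on real hardware vl would be loaded into a vector-length register. */
static void vector_op(size_t vl, double a, const double *x, double *y)
{
    assert(vl <= MVL);
    for (size_t j = 0; j < vl; j++)
        y[j] = a * x[j] + y[j];
}

/* Strip-mining: split an n-element loop into strips of at most MVL.
   The first strip takes the odd-sized remainder; the rest are full.
   Returns the number of strips, i.e. vector sequences issued. */
size_t daxpy_strip_mined(size_t n, double a, const double *x, double *y)
{
    size_t strips = 0;
    size_t i = 0;
    size_t vl = n % MVL;          /* remainder strip first */
    if (vl == 0)
        vl = MVL;
    while (i < n) {
        vector_op(vl, a, x + i, y + i);
        i += vl;
        vl = MVL;                 /* all later strips are full length */
        strips++;
    }
    return strips;
}
```

A 100-element loop becomes a 36-element strip followed by a 64-element one, and any loop shorter than `MVL` is a single strip, so short vectors need no separate code path.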
-
- Also, if you can put software pipelining to good use and issue 10
- instructions per cycle, you also have memory transactions per cycle of
- the same order of magnitude, and you need register files of comparable
- depth, and so on. This looks like a call for vector instructions,
- vector registers, vector memory access.
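The scaling argument is essentially Little's law: the number of values in flight equals the issue rate times the latency, and each in-flight value ties up a destination register. A rough C sketch follows; the figures in the usage line (3 loads per cycle, a 12-cycle memory latency, 2 FP operations per cycle with a 3-cycle latency) are hypothetical and do not come from the post.

```c
#include <assert.h>

/* Little's law: values in flight = issue rate (per cycle) * latency (cycles). */
unsigned in_flight(unsigned per_cycle, unsigned latency_cycles)
{
    return per_cycle * latency_cycles;
}

/* Rough lower bound on registers a steady-state software-pipelined loop
   keeps busy, assuming each issued load or FP operation needs its own
   destination register from issue until its result is produced. */
unsigned min_live_registers(unsigned loads_per_cycle, unsigned mem_latency,
                            unsigned fp_per_cycle, unsigned fp_latency)
{
    return in_flight(loads_per_cycle, mem_latency)
         + in_flight(fp_per_cycle, fp_latency);
}
```

With those figures, `min_live_registers(3, 12, 2, 3)` gives 36 + 6 = 42 registers busy at once, more than a 32-entry register file holds, and that is before counting loop-invariant values such as addresses or the scalar multiplier.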
-
- bradc> In addition, compilers can automatically perform software
- bradc> pipelining. At FPS we had several. A "modern" example is
- bradc> Portland Group Inc.'s i860 compiler.
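The post does not say which scheduling method the FPS or Portland Group compilers used; modulo scheduling is named here only as one well-known way compilers software-pipeline loops. Its starting point, the resource-constrained minimum initiation interval (ResMII), is easy to sketch in C: for each resource class, divide the uses per iteration by the number of units, round up, and take the maximum. The machine in the usage line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Resource-constrained minimum initiation interval (ResMII): the fewest
   cycles between starts of successive iterations that the functional
   units allow. uses[r] = operations of class r per iteration,
   units[r] = number of units of class r (must be nonzero). */
unsigned res_mii(const unsigned *uses, const unsigned *units, size_t nres)
{
    unsigned mii = 1;
    for (size_t r = 0; r < nres; r++) {
        assert(units[r] > 0);
        unsigned ii = (uses[r] + units[r] - 1) / units[r];  /* ceiling division */
        if (ii > mii)
            mii = ii;
    }
    return mii;
}
```

For a DAXPY iteration with 3 memory operations (two loads, one store), one multiply and one add, on a machine with one memory port, one multiplier and one adder, `res_mii` returns 3: the memory port, not the FP units, bounds the throughput. A second memory port lowers it to 2.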
-
- Fascinating compilers, but the story of the i860 (which is often
- considered a sort of LIW architecture) seems to support my impression:
- even for the limited degree of parallelism available in the i860, it
- would have been easier to have proper vector-style instructions. The
- instructions that do exploit the i860's multiple functional units
- really amount to such, only with loads of complications and hazards,
- and with difficulties in keeping the pipes fed. Now, the i860 is a
- particularly awkward quasi-LIW design, and it should not be taken as a
- straw man, but still...
- --
- Piercarlo Grandi | JNET: pcg@uk.ac.aber
- Dept of CS, University of Wales | UUCP: ...!aber-cs!pcg
- Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk
-