NetNews Usenet Archive 1992 #16

Text File | 1992-07-21 | 3.9 KB | 91 lines

Newsgroups: comp.sys.transputer
Path: sparky!uunet!inmos!inmos.co.uk!gimli!roger
From: roger@gimli.inmos.co.uk (Roger Shepherd)
Subject: Re: Block moves (MOVE, MOVE2D) and the cache on the T9K
Message-ID: <1992Jul21.114753.6065@inmos.co.uk>
Sender: roger@gimli (Roger Shepherd)
Organization: INMOS Limited, Bristol, UK
References: <JAN.92Jul18223618@pallas.neuroinformatik.ruhr-uni-bochum.de>
Date: Tue, 21 Jul 1992 11:47:53 GMT
Lines: 79

In article <JAN.92Jul18223618@pallas.neuroinformatik.ruhr-uni-bochum.de>,
jan@pallas.neuroinformatik.ruhr-uni-bochum.de (Jan Vorbrueggen) writes:
>> There was some discussion on comp.arch recently on the appropriateness
>> of putting block moves into the instruction set of a processor.
>> I've always thought that the transputer did it right, especially
>> by providing MOVE2D, which comes in handy in many places.
>>
>> Of course, the implementation is no problem on the current chips,
>> as they don't have a cache. But how does this work on the T9K?

It is implemented on the T9000 in a straightforward manner; no
optimisation of the operation is performed to exploit/use the cache.

>> Will my MOVE2D, scattering the 16384 bytes of a sequentially
>> stored array into the proper places of a two dimensional one, i.e.
>>
>>    [16384] BYTE a :
>>    [16384][40] BYTE x :
>>    SEQ i=0 FOR 16384
>>      x [i][3] := a[i]
>>
>> implemented with a single MOVE2D, fill all of the cache with all those
>> unnecessary bytes?

Which are the unnecessary bytes? The T9000 has 16 kbytes of cache.
However, it does not cache 16k individual bytes; it caches 4 (banks) x
256 (lines) x 16 bytes, and each line is 16-byte (4-word) aligned. Your
example is worse than you suspect; you actually perform writes to 16k
distinct lines - 16 times the number of lines in the cache.

At the end of the execution of your example the cache will probably
contain some of the a array and some of the x array. It is unlikely that
any of the initial state of the cache will remain.
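The line count in the reply above can be checked with a short sketch. It is written in Python rather than occam purely for convenience, and it assumes that both arrays start on a 16-byte boundary (the base addresses `A_BASE` and `X_BASE` are hypothetical; the post does not give them). The two arrays are counted separately, so it does not matter that both bases are zero here.

```python
# Cache geometry as described in the post.
LINE_BYTES = 16
BANKS = 4
LINES_PER_BANK = 256
CACHE_LINES = BANKS * LINES_PER_BANK  # 1024 lines, 16 KB in total

# Shape of the occam example: x[i][3] := a[i] for i = 0..16383.
N = 16384
ROW_BYTES = 40   # second dimension of x
COLUMN = 3       # byte written within each row

# Assumption (not from the post): both arrays are 16-byte aligned.
A_BASE = 0
X_BASE = 0


def lines_touched(addresses):
    """Count the distinct 16-byte-aligned lines covering the byte addresses."""
    return len({addr // LINE_BYTES for addr in addresses})


# Sequential reads through a, strided writes into x.
read_lines = lines_touched(A_BASE + i for i in range(N))
write_lines = lines_touched(X_BASE + i * ROW_BYTES + COLUMN for i in range(N))

print("lines in cache:       ", CACHE_LINES)
print("lines read from a:    ", read_lines)
print("lines written in x:   ", write_lines)
print("written / cache lines:", write_lines / CACHE_LINES)
```

Because the 40-byte row stride is larger than a 16-byte line, every write lands in a line of its own, which is where the factor of 16 comes from.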
If your example were more moderate, writing to 1024 bytes of cache, then
you would find that about 34% of the original cache content would
remain. This shows one difference between the behaviour of a random
replacement cache (as in the T9000) and an LRU or FIFO replacement
cache, where the whole cache would have been destroyed by even this
smaller example.

>> Or can the CPU tell the cache controller in this
>> and similar cases "please don't cache these"?

No. But if the x array were something like a frame store, then you might
have chosen to mark that bank of store as uncached, in which case the
byte writes would be made to the external memory, and the cache would
not be affected. (See my next comment.)

>> Hmm, actually this doesn't
>> seem quite possible in this case, as the destination words must be read
>> to be able to insert the bytes...

The T9000 does have byte write signals. If the external memory is
uncached then these are used to perform part-word writes, rather than
using a read-modify-write. 64-bit wide external memory can only be used
cached - in this case some read-modify-write operation has to take
place.

>> well, at least some optimizations
>> on the microcode level are theoretically possible...and thrashing the
>> cache could certainly be avoided when doing a gather.

The read accesses in your example use the cache very well. You have
sequential access through the array a. Each cache miss will cause 16
bytes of data to be fetched - all of which will be used in subsequent
iterations of the copy.

>>
>> Can anybody provide an answer to this arcane question :) ?
>>

I hope I have.

-- 
Roger Shepherd, INMOS Ltd   JANET:    roger@uk.co.inmos
1000 Aztec West             UUCP:     ukc!inmos!roger or uunet!inmos-c!roger
Almondsbury                 INTERNET: roger@inmos.com
+44 454 616616              ROW:      roger@inmos.com OR roger@inmos.co.uk
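The figure of about 34% quoted in the reply for the 1024-byte case is consistent with a very simple model of random replacement, sketched below in Python. The post does not say how the figure was derived, so everything here is an assumption: each miss is taken to evict one line chosen uniformly at random from all 1024, every one of the 1024 strided writes is taken to miss, and the sequential reads are taken to add one miss per 16 bytes. The function name `surviving_fraction` is hypothetical.

```python
def surviving_fraction(cache_lines, misses):
    """Expected fraction of the original lines still present after `misses`
    evictions, each picking a victim line uniformly at random."""
    return (1.0 - 1.0 / cache_lines) ** misses


CACHE_LINES = 4 * 256        # 4 banks x 256 lines, from the post
WRITE_MISSES = 1024          # assumed: one miss per strided byte write
READ_MISSES = 1024 // 16     # assumed: one miss per 16 bytes read sequentially

writes_only = surviving_fraction(CACHE_LINES, WRITE_MISSES)
with_reads = surviving_fraction(CACHE_LINES, WRITE_MISSES + READ_MISSES)

print(f"original content left, writes only:      {writes_only:.3f}")
print(f"original content left, writes and reads: {with_reads:.3f}")
```

Under these assumptions the second value comes out at about 0.345, close to the quoted 34%. By contrast, 1088 misses exceed the 1024 lines available, so a cache that always evicts the oldest line would have kept none of its original contents, which matches the comparison with LRU and FIFO made in the reply.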