- Newsgroups: comp.sys.transputer
- Path: sparky!uunet!inmos!inmos.co.uk!gimli!roger
- From: roger@gimli.inmos.co.uk (Roger Shepherd)
- Subject: Re: Block moves (MOVE, MOVE2D) and the cache on the T9K
- Message-ID: <1992Jul21.114753.6065@inmos.co.uk>
- Sender: roger@gimli (Roger Shepherd)
- Organization: INMOS Limited, Bristol, UK
- References: <JAN.92Jul18223618@pallas.neuroinformatik.ruhr-uni-bochum.de>
- Date: Tue, 21 Jul 1992 11:47:53 GMT
- Lines: 79
-
- In article <JAN.92Jul18223618@pallas.neuroinformatik.ruhr-uni-bochum.de>,
- jan@pallas.neuroinformatik.ruhr-uni-bochum.de (Jan Vorbrueggen) writes:
-
- >> There was some discussion on comp.arch recently on the appropriateness
- >> of putting block moves into the instruction set of a processor.
- >> I've always thought that the transputer did it right, especially
- >> by providing MOVE2D, which comes in handy in many places.
- >>
- >> Of course, the implementation is no problem on the current chips,
- >> as they don't have a cache. But how does this work on the T9K?
-
- It is implemented on the T9000 in a straightforward manner; no
- optimisation of the operation is performed to exploit/use the cache.
-
- >> Will my MOVE2D, scattering the 16384 bytes of a sequentially
- >> stored array into the proper places of a two dimensional one, i.e.
- >>
- >>   [16384]BYTE a :
- >>   [16384][40]BYTE x :
- >>   SEQ i = 0 FOR 16384
- >>     x[i][3] := a[i]
- >>
- >> implemented with a single MOVE2D, fill all of the cache with all those
- >> unnecessary bytes?
-
- Which are the unnecessary bytes?
-
- The T9000 has 16 kbytes of cache. However, it does not cache 16k arbitrary
- bytes; it caches 4 (banks) x 256 (lines) x 16 bytes, and each line is 16-byte
- (4-word) aligned. Your example is worse than you suspect; you actually
- perform writes to 16k distinct lines - 16 times the number of lines in the cache.
- At the end of the execution of your example the cache will probably contain
- some of the a array and some of the x array. It is unlikely that any of the
- initial state of the cache will remain.
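
The line count above can be checked with a few lines of Python. This is only a sketch: it assumes x starts at byte address 0, and the names (distinct_lines, ROW_BYTES and so on) are made up for illustration. The line size, bank count and array shape are the ones given in this thread.

```python
LINE_BYTES = 16                  # cache line size
BANKS, LINES_PER_BANK = 4, 256   # cache organisation
ROW_BYTES = 40                   # second dimension of x
COLUMN = 3                       # the writes go to x[i][3]
N = 16384                        # number of bytes moved


def distinct_lines(n, stride, offset, line_bytes=LINE_BYTES):
    """Count the distinct aligned lines hit by n byte writes at offset + i*stride.

    Assumes the destination array starts at byte address 0 (an assumption
    made for this sketch only).
    """
    return len({(offset + i * stride) // line_bytes for i in range(n)})


cache_lines = BANKS * LINES_PER_BANK
written_lines = distinct_lines(N, ROW_BYTES, COLUMN)
print(cache_lines, written_lines, written_lines // cache_lines)
```

Because the 40-byte stride is larger than a line, no two writes can share a line, so the count does not depend on where x actually starts.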
-
- If your example were more moderate, writing to only 1024 bytes of cache, then
- you would find that about 34% of the original cache content would remain.
- This shows one difference between the behaviour of a random replacement cache
- (as in the T9000) and that of an LRU or FIFO replacement cache, where the
- whole cache would have been destroyed even by this smaller example.
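
One way to arrive at a figure close to 34% is sketched below. The model is an assumption, not something stated in this thread: each miss evicts a uniformly random line within its bank, misses are spread evenly over the 4 banks, and the 1024 byte writes are accompanied by 1024/16 = 64 line fetches for the source array. The function name surviving_fraction is hypothetical.

```python
def surviving_fraction(misses, banks=4, lines_per_bank=256):
    """Expected fraction of the original lines still cached after `misses`
    misses, assuming uniform random replacement within each bank and an
    even spread of misses across banks (both assumptions of this sketch)."""
    per_bank = misses / banks
    # Each miss spares any given line in its bank with probability 1 - 1/256.
    return (1 - 1 / lines_per_bank) ** per_bank


write_misses = 1024          # one distinct destination line per byte written
read_misses = 1024 // 16     # sequential source bytes, 16 per line
frac = surviving_fraction(write_misses + read_misses)
print(f"{frac:.3f}")
```

Under these assumptions roughly a third of the original content survives, in line with the figure quoted above. With LRU or FIFO replacement, more than 256 misses per bank would leave none of it.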
-
- >> Or can the CPU tell the cache controller in this
- >> and similar cases "please don't cache these"?
-
- No. But if the x array were something like a frame store, then you might
- have chosen to mark that bank of store as uncached, in which case the
- byte writes would be made to the external memory, and the cache would
- not be affected. (See my next comment).
-
- >> Hmm, actually this doesn't
- >> seem quite possible in this case, as the destination words must be read
- >> to be able to insert the bytes...
-
- The T9000 does have byte write signals. If the external memory is
- uncached then these are used to perform part-word writes, rather than
- using a read-modify-write. 64-bit wide external memory can only be
- used cached - in this case some read-modify-write operation has to take
- place.
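
For comparison, this is what a read-modify-write amounts to when there are no byte write signals. It is a toy model only: memory is a Python list of 32-bit words, the byte lanes are numbered little-endian, and the function name is invented for this sketch.

```python
def rmw_byte_write(memory, byte_addr, value):
    """Insert one byte into a word-organised memory by read-modify-write.

    `memory` is a list of 32-bit words; lane numbering is little-endian
    (an assumption of this sketch). Returns the number of memory accesses.
    """
    index, lane = divmod(byte_addr, 4)
    shift = 8 * lane
    word = memory[index]                                           # read
    word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)   # modify
    memory[index] = word & 0xFFFFFFFF                              # write
    return 2  # one read plus one write


memory = [0x11223344]
accesses = rmw_byte_write(memory, 1, 0xAB)
print(hex(memory[0]), accesses)
```

With byte write signals the memory merges the byte itself, so the processor issues a single write and never has to read the destination word.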
-
- >> well, at least some optimizations
- >> on the microcode level are theoretically possible...and thrashing the
- >> cache could certainly be avoided when doing a gather.
-
- The read accesses in your example use the cache very well. You have
- sequential access through the array a. Each cache miss will cause
- 16 bytes of data to be fetched - all of which will be used in subsequent
- iterations of the copy.
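
The read side can be put in numbers with the small sketch below. It assumes a starts on a line boundary and that a fetched line is not evicted again before its 16 bytes have been used. The helper name is hypothetical.

```python
def sequential_read_misses(n_bytes, line_bytes=16, base=0):
    """Number of line fetches needed to read n_bytes sequentially from `base`,
    assuming no fetched line is evicted before it has been fully used."""
    first = base // line_bytes
    last = (base + n_bytes - 1) // line_bytes
    return last - first + 1


misses = sequential_read_misses(16384)
hit_rate = 1 - misses / 16384
print(misses, hit_rate)
```

So the source array costs one miss per 16 bytes read, against one miss per byte written on the destination side.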
-
- >>
- >> Can anybody provide an answer to this arcane question :) ?
- >>
-
- I hope I have.
-
- --
- Roger Shepherd, INMOS Ltd JANET: roger@uk.co.inmos
- 1000 Aztec West UUCP: ukc!inmos!roger or uunet!inmos-c!roger
- Almondsbury INTERNET: roger@inmos.com
- +44 454 616616 ROW: roger@inmos.com OR roger@inmos.co.uk
-