- Path: sparky!uunet!charon.amdahl.com!pacbell.com!decwrl!spool.mu.edu!darwin.sura.net!jvnc.net!yale.edu!qt.cs.utexas.edu!cs.utexas.edu!sun-barr!news2me.EBay.Sun.COM!exodus.Eng.Sun.COM!flayout.Eng.Sun.COM!tremblay
- From: tremblay@flayout.Eng.Sun.COM (Marc Tremblay)
- Newsgroups: comp.arch
- Subject: Re: Interleaving the caches like memory?
- Date: 5 Nov 1992 17:30:14 GMT
- Organization: Sun Microsystems, Mt. View, Ca.
- Lines: 28
- Message-ID: <lfimh6INN671@exodus.Eng.Sun.COM>
- References: <1992Nov3.131105.21763@klaava.Helsinki.FI>
- NNTP-Posting-Host: flayout
- Keywords: memory cache, associative memory, searching, sorting
-
- In article <1992Nov3.131105.21763@klaava.Helsinki.FI> veijalai@klaava.Helsinki.FI (Tony Veijalainen) writes:
- >Memory on high speed microcomputers is divided into multiple regions
- >according to <address> mod <interleave factor>. Is it sensible to have
- >them cached with separate cache controllers and maybe raise the frequency
- >of accessing sequential locations over the physical limits of sequential
- >access of one cache chip? A second possibility would be having the cache
- >spy on branch instructions like modern CPUs and have parallel pre-cached
- >instruction paths instead of instruction pipelines like the CPU. This
- >sounds sensible to me.
-
- One can certainly have multiple banks for the on-chip caches as well as
- the off-chip caches. For instance, a superscalar processor can support two loads
- per cycle by implementing the data cache as two or more banks; this is cheaper
- than dual-porting the whole cache. For general-purpose code, collisions do
- occur between the load addresses, but for a reasonable number of banks the
- collision percentage is low enough to justify the use of banks.
-
- One issue that comes up for deeply pipelined machines is that for addresses
- computed as the sum of two values, it is not known "up front" (before register-file
- access) whether the addresses will collide. Generating stalls a few cycles deep into
- the pipeline is undesirable since it increases the CPI (versus scheduling up front).
- If the caches are interleaved on some low-order bits, then a short addition
- done after the register-file access is enough. Note that even if the cache index
- requires translated bits (e.g. when page size * associativity < cache size), this
- still works, since the low-order bits are left alone by translation.
-
- - Marc Tremblay.
- Sun Microsystems.
-