- Path: sparky!uunet!charon.amdahl.com!pacbell.com!decwrl!spool.mu.edu!darwin.sura.net!jvnc.net!yale.edu!qt.cs.utexas.edu!cs.utexas.edu!sun-barr!news2me.EBay.Sun.COM!exodus.Eng.Sun.COM!flayout.Eng.Sun.COM!tremblay
- From: tremblay@flayout.Eng.Sun.COM (Marc Tremblay)
- Newsgroups: comp.arch
- Subject: Re: Interleaving the caches like memory?
- Date: 5 Nov 1992 17:30:14 GMT
- Organization: Sun Microsystems, Mt. View, Ca.
- Lines: 28
- Message-ID: <lfimh6INN671@exodus.Eng.Sun.COM>
- References: <1992Nov3.131105.21763@klaava.Helsinki.FI>
- NNTP-Posting-Host: flayout
- Keywords: memory cache, associative memory, searching, sorting
-
- In article <1992Nov3.131105.21763@klaava.Helsinki.FI> veijalai@klaava.Helsinki.FI (Tony Veijalainen) writes:
- >Memory on high speed microcomputers is divided into multiple regions
- >according to <address> mod <interleave factor>. Is it sensible to have
- >them cached with separate cache controllers and maybe raise the frequency
- >of accessing sequential locations over the physical limits of sequential
- >access of one cache chip? A second possibility would be having the cache
- >spy on branch instructions like modern CPUs and have parallel pre-cached
- >instruction paths instead of instruction pipelines like the CPU. This
- >sounds sensible to me.
-
- One can certainly have multiple banks for the on-chip caches as well as
- the off-chip caches. For instance, a superscalar processor can support two loads
- per cycle by implementing the data cache as two or more banks; this is cheaper
- than dual-porting the whole cache. For general-purpose code, collisions do
- occur between the load addresses, but for a reasonable number of banks the
- collision percentage is low enough to justify the use of banks.
-
- One issue that comes up for deeply pipelined machines is that for addresses
- computed as the sum of two values, it is not known "up front" (before register-file
- access) whether the addresses will collide. Generating stalls a few cycles deep into
- the pipeline is undesirable since it increases the CPI (versus scheduling up front).
- If the caches are interleaved on some low-order bits, then a short addition
- done after the register-file access is enough. Note that even if the cache index
- requires translated bits (e.g. when page size * associativity < cache size), this
- still works, since the low-order bits are left alone by translation.
-
- - Marc Tremblay.
- Sun Microsystems.
-