Path: sparky!uunet!wupost!darwin.sura.net!mips!swrinde!cs.utexas.edu!loyola!ross.COM!sparhawk!mitch
From: mitch@sparhawk.com (Mitch Alsup)
Newsgroups: comp.arch
Subject: Caches and Hashing
Summary: HP Hashing direct mapped caches probably wins
Message-ID: <1992Aug18.182506.23744@ross.COM>
Date: 18 Aug 92 18:25:06 GMT
Sender: news@ross.COM
Distribution: usa
Organization: Mitch Alsup Consulting
Lines: 84
Originator: mitch@sparhawk
Nntp-Posting-Host: sparhawk

john@netnews.jhuapl.edu (John Hayes) writes:

> There was an interesting idea in a recent article in Microprocessor
> Report. The article was describing HP's PA-RISC. Here is what caught
> my eye:
>       "The caches are direct mapped to eliminate the performance
>       impact of multiplexers that would be required with set-
>       associativity. To reduce the cache miss rates, the addresses
>       are hashed before they drive the cache SRAMS."*

> This raises several questions in my mind:
> 1) How much does hashing improve the miss rate of a direct-mapped
> cache?

As usual, your mileage may vary with application, O/S, and cache size.
With 65KByte caches, many applications see only a 5% miss rate
(and with a 10 cycle miss penalty they achieve only 50% of machine
peak performance). With 1MByte caches one could expect a ~1%
miss rate and therefore greater than 90% of machine peak performance.

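As a back-of-envelope check on figures like these, here is a simple stall-cycle model (my own sketch, not from the article); the accesses-per-instruction count is an assumption, and the rounded figures in the text imply somewhat different memory-access mixes:

```python
# Toy execution-time model for a blocking cache (an assumption for
# illustration, not HP's data): each instruction takes 1 base cycle
# plus the stall cycles caused by cache misses.
def peak_fraction(miss_rate, miss_penalty, accesses_per_instr):
    """Fraction of peak performance: 1 cycle/instr vs. 1 + stalls."""
    stall = accesses_per_instr * miss_rate * miss_penalty
    return 1.0 / (1.0 + stall)

# With 2 cache accesses per instruction, a 5% miss rate and a 10 cycle
# penalty cost half of peak performance:
print(peak_fraction(0.05, 10, 2))            # -> 0.5
# With 1 access per instruction, a 1% miss rate leaves >90% of peak:
print(round(peak_fraction(0.01, 10, 1), 2))  # -> 0.91
```

The model ignores write buffers and overlap of misses with execution, so it is pessimistic for real machines; it is only meant to show how miss rate and miss penalty trade against peak performance.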
Excellent hashing can be expected to achieve a 40% improvement at most
(Moyers rule of SQRT(2)-1 for performance gains). This could allow
the 65KB cache to achieve a 3% miss rate and 70% peak performance,
or the 1MB cache to achieve 94% machine performance: a considerable
gain in the first case and a useful one in the second. The 65KB cache
PROBABLY would benefit LESS than the larger cache because it is
operating close to its intrinsic COMPULSORY miss rate (it is smaller),
while the 1MB cache could benefit more IF the WORKING SETS of the
applications fit in the cache. Also note that the difference in
miss rate between a direct mapped configuration and a set associative
configuration is considerably smaller for the larger cache than for
the smaller cache.

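To make question (1) concrete, here is a toy simulation of a direct-mapped cache indexed with and without hashing. The XOR-fold hash is one common choice and purely my assumption (the article does not disclose HP's hash), and the trace is deliberately chosen so the plain mapping thrashes:

```python
# Toy direct-mapped cache model. The hash (XOR-folding tag bits into
# the index) is an assumption for illustration, not HP's actual hash.
LINE_BYTES  = 64
CACHE_LINES = 256                 # 256 lines of 64 bytes = 16 KB

def plain_index(line):
    return line % CACHE_LINES

def hashed_index(line):
    return (line ^ (line >> 8)) % CACHE_LINES   # fold tag bits into index

def count_misses(index_fn, trace):
    stored = [None] * CACHE_LINES               # full line number per set
    misses = 0
    for addr in trace:
        line = addr // LINE_BYTES
        idx = index_fn(line)
        if stored[idx] != line:                 # tag mismatch: miss + fill
            stored[idx] = line
            misses += 1
    return misses

# Two 8 KB buffers a multiple of the cache size apart, so their plain
# indices collide exactly; touch them alternately, twice over:
A, B = 0x000000, 0x200000
trace = [base + k * LINE_BYTES for k in range(128) for base in (A, B)] * 2

print(count_misses(plain_index, trace))    # 512: every access misses
print(count_misses(hashed_index, trace))   # 256: only compulsory misses
```

Hashing wins exactly when conflict misses dominate, which matches the working-set argument above: it cannot help with compulsory or capacity misses, only with unlucky placement.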
> 2) The argument for direct mapped caches is that they have a better
> effective access time than set-associative caches because direct
> mapped caches do not need multiplexers to select the data. Would
> the cost of hashing eliminate this advantage? In other words:
> would a direct mapped cache with hashing or a set-associative
> cache (with associated multiplexers) give the best performance?

If one measures performance in nanoseconds per datum delivered,
it is quite possible that hashed accesses in direct mapped caches
will perform better than set associative caches. In HP's case this
is reinforced by their use of commodity SRAMs external to the
processor: they cannot, therefore, dink with the decoder to the RAM
arrays. Hashing is also cheap for internal VLSI RAM arrays, where
the decoder can be set up to dink with the addresses (hash) at very
little additional delay (100s of picoseconds). Conversely, for
external SRAMs the output buffers can be set up to dink with the
addresses driven to the SRAMs and incur very little additional
delay (.5 ns).

On the other hand, if the TAG section is designed so that it has faster
access times than the Data section of a VLSI-like cache, the tag access
and set comparison can overlap the data access. With this arrangement,
the set selection can be driven as the data is sensed, and contributes
negligibly to the access time (the 88200 uses this trick). This requires
that the tag section be about 4 gate delays faster than the data section.
Careful SPICE modeling is REQUIRED; your mileage may vary. This trick
does not work with commodity SRAM chips because the access time for the
tags is commensurate with the access time for the data. In any event,
the gain from set associativity diminishes with large external
caches (greater than 65KBytes).

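The overlap argument can be put in rough numbers. Every figure below is an illustrative assumption (units: gate delays), not 88200 SPICE data:

```python
# Back-of-envelope critical paths for a set associative cache access.
# All numbers are assumptions for illustration (units: gate delays).
TAG_ACCESS  = 10   # tag array access
COMPARE     = 4    # tag compare / set-select generation
DATA_ACCESS = 14   # data array access (tag section ~4 gate delays faster)
MUX         = 2    # output multiplexer

# Serial: wait for the data, then compare tags, then steer the mux.
serial = DATA_ACCESS + COMPARE + MUX
# Overlapped (the trick above): tag compare finishes as the data is
# sensed, so only the mux is added past the slower of the two paths.
overlap = max(TAG_ACCESS + COMPARE, DATA_ACCESS) + MUX

print(serial, overlap)   # 20 16
```

With these numbers TAG_ACCESS + COMPARE exactly equals DATA_ACCESS, which is the "tag section about 4 gate delays faster" condition: the set selection then contributes nothing but the mux delay.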
> 3) What hash function does HP use?

Hopefully, one which places user code and user data in different cache
locations than supervisor code and data. If they also attempt to place
I/O pages and data base pages separately from these others, then I can
see considerable gain from hashing. On the other hand, why cannot the
O/S and Linker conspire to place the pages in physical and logical
locations so as to minimize these problems up front?

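The O/S half of that conspiracy is usually called page coloring. A minimal sketch, assuming a 64KB direct-mapped physically-indexed cache and 4KB pages (so 16 page colors); none of the names or numbers below come from the article:

```python
from collections import deque

PAGE_BYTES = 4096
COLORS     = 16            # 64 KB cache / 4 KB page = 16 page colors

class ColoredAllocator:
    """Toy physical-page allocator: keep a free list per color and hand
    out a page whose low frame bits match the virtual address's, so
    virtual and physical cache indices agree and avoidable conflicts
    are eliminated up front."""
    def __init__(self, num_phys_pages):
        self.free = [deque() for _ in range(COLORS)]
        for p in range(num_phys_pages):
            self.free[p % COLORS].append(p)

    def alloc(self, vaddr):
        color = (vaddr // PAGE_BYTES) % COLORS
        # Prefer the matching color; fall back to any non-empty list.
        for c in (color,) + tuple(range(COLORS)):
            if self.free[c]:
                return self.free[c].popleft()
        return None                              # out of physical pages

pool = ColoredAllocator(num_phys_pages=64)
ppage = pool.alloc(0x12345000)                   # virtual color 5
print(ppage % COLORS)                            # -> 5
```

The linker's half is analogous: lay out code and data so that hot virtual pages spread across colors instead of piling onto one cache index.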
> I would be interested to hear from anyone who knows the answer
> to these questions from papers or first-hand knowledge. I will
> summarize to the net.

> * Case, Brian, "HP Reveals Superscalar PA-RISC Implementation",
>   Microprocessor Report, Vol. 6, No. 4, March 25, 1992, p. 17.

-------------------------------------------------------------------------------------
Mitch Alsup                Currently at: mitch@ross.com      An Independent Consultant
(512)-263-5086 Evenings
Disclaimer: I speak for myself.
-------------------------------------------------------------------------------------