Path: sparky!uunet!dtix!darwin.sura.net!mips!sdd.hp.com!hpscdc!hplextra!hpcc05!aspen!morris
From: morris@aspen.NSA.HP.COM (Dale Morris)
Newsgroups: comp.arch
Subject: Re: Caches and Hashing
Message-ID: <1360024@aspen.NSA.HP.COM>
Date: 19 Aug 92 17:33:47 GMT
References: <Bt5DKp.z0@netnews.jhuapl.edu>
Organization: HP Networked Systems Architecture - Cupertino, CA
Lines: 109


John Hayes writes:

> There was an interesting idea in a recent article in Microprocessor
> Report. The article was describing HP's PA-RISC. Here is what caught
> my eye:
>    "The caches are direct mapped to eliminate the performance
>    impact of multiplexers that would be required with set-
>    associativity. To reduce the cache miss rates, the addresses
>    are hashed before they drive the cache SRAMs."*
>
> This raises several questions in my mind:
> 1) How much does hashing improve the miss rate of a direct-mapped
> cache?

Let me describe the rationale behind this. The approaches people take
to improving system performance can often be deeply intertwingled, so
pardon me if some of this might appear "obvious". Design context is
important in evaluating something like this.

One of the working assumptions of caches is that, although there is
temporal and spatial locality in the stream of memory references, on
the whole, these references tend to cover the cache's index space
somewhat uniformly. This means that for a cache that is not fully
associative, each of the sets (or each of the lines in a direct-mapped
cache) has approximately the same utilization.
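
(To make "index space" concrete, here's a minimal C sketch of how a
direct-mapped cache derives its index. The line size and cache size
are made up for illustration; they aren't any particular PA machine's
geometry.)

    #include <stdint.h>

    #define LINE_BYTES 32u             /* hypothetical line size      */
    #define NUM_LINES  (1u << 14)      /* hypothetical 512KB of cache */

    /* Each address maps to exactly one line, so the miss rate hinges
       on how uniformly the reference stream covers this index range. */
    static uint32_t cache_index(uint32_t addr)
    {
        return (addr / LINE_BYTES) % NUM_LINES;
    }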

This is usually a good assumption for small caches and for caches
which are physically indexed. The behavior of small caches can be
modelled accurately based on the reference patterns of single
programs, since a small cache generally cannot hold the entire context
of even one program. Physically indexed caches benefit from the
pseudo-random mapping function between virtual and physical addresses
that tends to smear references across the cache.

HP's machines typically have a large cache system, and we use
virtually-indexed caches (as many RISC companies do, to decrease the
latency on loads). With larger caches, one begins to see larger-scale
patterns in memory reference, and utilization of the cache begins to
become non-uniform. The cache develops "hot spots". Hashing the
address tends to smear references across the cache. The resulting
improvement in performance varies tremendously with application, OS
policy and even system configuration (like what mix of applications
are running at any given time). Benchmarks like SPEC are affected
very little, while large-scale transaction processing sees a
substantial improvement. I can't give any specific examples, but
suffice it to say that the difference between different hashes is
sufficient to warrant a good deal of work in choosing the specifics of
the hash.


> 2) The argument for direct mapped caches is that they have a better
> effective access time than set-associative caches because direct
> mapped caches do not need multiplexers to select the data. Would
> the cost of hashing eliminate this advantage? In other words:
> would a direct mapped cache with hashing or a set-associative
> cache (with associated multiplexers) give the best performance?

There is an additional benefit of direct-mapped caches over
set-associative ones. Generally, pipelines are optimized for cache
hits, because there is usually a little time to do some backing up on
cache misses while you're waiting for the data. So, with a
direct-mapped cache, when the data pops out, the processor runs with
it and does not wait for the tag compare. The compare is done in
parallel with subsequent processing. Removing the tag match from the
basic load-use cycle adds a lot to performance. With a set-associative
cache this can't be done unless you do some fairly costly functional
unit replication. This problem can be worked around by building
"extra fast" tags, but this approach tends to limit your ability to
scale up frequency, and as Mitch Alsup pointed out, this does not
work with SRAMs.
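
(As a rough software model of the difference -- purely illustrative;
the names and structure here are mine, not a description of the actual
hardware: in the direct-mapped case the data can be handed to the
consumer while the tag compare resolves on the side, whereas with two
ways the data select itself has to wait on the compares.)

    #include <stdint.h>

    struct line { uint32_t tag; uint32_t data; };

    /* Direct-mapped: forward the data immediately; the compare runs
       in parallel and only squashes dependent work on a mismatch.   */
    static uint32_t load_dm(const struct line *cache, uint32_t idx,
                            uint32_t tag, int *hit)
    {
        *hit = (cache[idx].tag == tag);
        return cache[idx].data;    /* used before *hit is examined   */
    }

    /* Two-way set-associative: the data mux cannot select until both
       tag compares finish, so the compare sits on the load-use path. */
    static uint32_t load_2way(const struct line *way0,
                              const struct line *way1, uint32_t idx,
                              uint32_t tag, int *hit)
    {
        int h0 = (way0[idx].tag == tag);
        int h1 = (way1[idx].tag == tag);
        *hit = h0 | h1;
        return h0 ? way0[idx].data : way1[idx].data;
    }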

Additionally, the hashing is combined into the circuits which
"power up" the cache index, adding only a fractional gate delay to
the path.

As to whether direct-mapped or set-associative is best, this depends
a lot on many things (where have you heard that before? :-). Set
associativity improves hit rate, but can increase latency (as I
described). Additionally, large caches cannot be implemented on-chip
(I don't think of 32KB as large :-), and off-chip set-associative
caches are expensive (HP builds those too, but only for our larger
business-class PA machines).


> 3) What hash function does HP use?

Although the user-visible virtual address space in current PA-RISC
products is 32 bits, PA machines actually implement a larger virtual
address space (48 bits in the Snakes workstations). This large global
address space is used by the OS in many ways to improve performance.
The cache index hash is an xor of bits from the lower 32 address bits
with bits from the upper portion (which we call the "space"). The OS
randomly assigns spaces to user processes, effectively smearing their
references across the cache.
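
(In rough C terms, the idea is something like the sketch below. The
index width and bit positions are made up for illustration; the real
bit selection is exactly the part that got all that tuning work.)

    #include <stdint.h>

    #define INDEX_BITS 14u
    #define INDEX_MASK ((1u << INDEX_BITS) - 1u)

    /* Illustrative only: xor an index-sized slice of the 32-bit
       offset with a slice of the space ID, so two processes touching
       the same offsets land on different cache lines.               */
    static uint32_t hashed_index(uint32_t space, uint32_t offset)
    {
        uint32_t raw = (offset >> 5) & INDEX_MASK; /* line index bits */
        return raw ^ (space & INDEX_MASK);
    }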

The hash does cause good separation in the caches between:

    one user's code and data and another user's code and data
    a user's code and data and the OS's code and data
    database data and I/O buffer data


> I would be interested to hear from anyone who knows the answer
> to these questions from papers or first-hand knowledge.

I hope this answers your questions.

------------------------------------------------------------------------------
Dale Morris       | Now is the time, and now is the record...
morris@nsa.hp.com | of the time.
------------------------------------------------------------------------------