Path: sparky!uunet!dtix!darwin.sura.net!mips!sdd.hp.com!hpscdc!hplextra!hpcc05!aspen!morris
From: morris@aspen.NSA.HP.COM (Dale Morris)
Newsgroups: comp.arch
Subject: Re: Caches and Hashing
Message-ID: <1360024@aspen.NSA.HP.COM>
Date: 19 Aug 92 17:33:47 GMT
References: <Bt5DKp.z0@netnews.jhuapl.edu>
Organization: HP Networked Systems Architecture - Cupertino, CA
Lines: 109


John Hayes writes:

> There was an interesting idea in a recent article in Microprocessor
> Report. The article was describing HP's PA-RISC. Here is what caught
> my eye:
>    "The caches are direct mapped to eliminate the performance
>    impact of multiplexers that would be required with set-
>    associativity. To reduce the cache miss rates, the addresses
>    are hashed before they drive the cache SRAMs."*
>
> This raises several questions in my mind:
> 1) How much does hashing improve the miss rate of a direct-mapped
> cache?

Let me describe the rationale behind this. The approaches people take
to improving system performance can often be deeply intertwingled, so
pardon me if some of this might appear "obvious". Design context is
important in evaluating something like this.

One of the working assumptions of caches is that, although there is
temporal and spatial locality in the stream of memory references, on
the whole, these references tend to cover the cache's index space
somewhat uniformly. This means that for a cache that is not fully
associative, each of the sets (or each of the lines in a direct-mapped
cache) has approximately the same utilization.
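
(To make "index space" concrete, here's a minimal C sketch of how a
direct-mapped cache derives its index. The line size and cache size
are made up for illustration; they aren't any particular PA machine's
geometry.)

    #include <stdint.h>

    #define LINE_BYTES 32u             /* hypothetical line size      */
    #define NUM_LINES  (1u << 14)      /* hypothetical 512KB of cache */

    /* Each address maps to exactly one line, so the miss rate hinges
       on how uniformly the reference stream covers this index range. */
    static uint32_t cache_index(uint32_t addr)
    {
        return (addr / LINE_BYTES) % NUM_LINES;
    }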

This is usually a good assumption for small caches and for caches
which are physically indexed. The behavior of small caches can be
modelled accurately based on the reference patterns of single
programs, since a small cache generally cannot hold the entire context
of even one program. Physically indexed caches benefit from the
pseudo-random mapping function between virtual and physical addresses
that tends to smear references across the cache.

HP's machines typically have a large cache system, and we use
virtually-indexed caches (as many RISC companies do, to decrease the
latency on loads). With larger caches, one begins to see larger-scale
patterns in memory reference, and utilization of the cache begins to
become non-uniform. The cache develops "hot spots". Hashing the
address tends to smear references across the cache. The resulting
improvement in performance varies tremendously with application, OS
policy and even system configuration (like what mix of applications
are running at any given time). Benchmarks like SPEC are affected
very little, while large-scale transaction processing sees a
substantial improvement. I can't give any specific examples, but
suffice it to say that the difference between different hashes is
sufficient to warrant a good deal of work in choosing the specifics of
the hash.


> 2) The argument for direct mapped caches is that they have a better
> effective access time than set-associative caches because direct
> mapped caches do not need multiplexers to select the data. Would
> the cost of hashing eliminate this advantage? In other words:
> would a direct mapped cache with hashing or a set-associative
> cache (with associated multiplexers) give the best performance?

There is an additional benefit of direct-mapped caches over
set-associative ones. Generally, pipelines are optimized for cache
hits, because there is usually a little time to do some backing up on
cache misses while you're waiting for the data. So, with a
direct-mapped cache, when the data pops out, the processor runs with
it and does not wait for the tag compare. The compare is done in
parallel with subsequent processing. Removing the tag match from the
basic load-use cycle adds a lot to performance. With a set-associative
cache this can't be done unless you do some fairly costly functional
unit replication. This problem can be worked around by building
"extra fast" tags, but this approach tends to limit your ability to
scale up frequency, and as Mitch Alsup pointed out, this does not
work with SRAMs.
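
(As a rough software model of the difference -- purely illustrative;
the names and structure here are mine, not a description of the actual
hardware: in the direct-mapped case the data can be handed to the
consumer while the tag compare resolves on the side, whereas with two
ways the data select itself has to wait on the compares.)

    #include <stdint.h>

    struct line { uint32_t tag; uint32_t data; };

    /* Direct-mapped: forward the data immediately; the compare runs
       in parallel and only squashes dependent work on a mismatch.   */
    static uint32_t load_dm(const struct line *cache, uint32_t idx,
                            uint32_t tag, int *hit)
    {
        *hit = (cache[idx].tag == tag);
        return cache[idx].data;    /* used before *hit is examined   */
    }

    /* Two-way set-associative: the data mux cannot select until both
       tag compares finish, so the compare sits on the load-use path. */
    static uint32_t load_2way(const struct line *way0,
                              const struct line *way1, uint32_t idx,
                              uint32_t tag, int *hit)
    {
        int h0 = (way0[idx].tag == tag);
        int h1 = (way1[idx].tag == tag);
        *hit = h0 | h1;
        return h0 ? way0[idx].data : way1[idx].data;
    }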

Additionally, the hashing is combined into the circuits which
"power up" the cache index, adding only a fractional gate delay to
the path.

As to whether direct-mapped or set-associative is best, this depends
a lot on many things (where have you heard that before? :-). Set
associativity improves hit rate, but can increase latency (as I
described). Additionally, large caches cannot be implemented on-chip
(I don't think of 32KB as large :-), and off-chip set-associative
caches are expensive (HP builds those too, but only for our larger
business-class PA machines).


> 3) What hash function does HP use?

Although the user-visible virtual address space in current PA-RISC
products is 32 bits, PA machines actually implement a larger virtual
address space (48 bits in the Snakes workstations). This large global
address space is used by the OS in many ways to improve performance.
The cache index hash is an xor of bits from the lower 32 address bits
with bits from the upper portion (which we call the "space"). The OS
randomly assigns spaces to user processes, effectively smearing their
references across the cache.
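
(In rough C terms, the idea is something like the sketch below. The
index width and bit positions are made up for illustration; the real
bit selection is exactly the part that got all that tuning work.)

    #include <stdint.h>

    #define INDEX_BITS 14u
    #define INDEX_MASK ((1u << INDEX_BITS) - 1u)

    /* Illustrative only: xor an index-sized slice of the 32-bit
       offset with a slice of the space ID, so two processes touching
       the same offsets land on different cache lines.               */
    static uint32_t hashed_index(uint32_t space, uint32_t offset)
    {
        uint32_t raw = (offset >> 5) & INDEX_MASK; /* line index bits */
        return raw ^ (space & INDEX_MASK);
    }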

The hash does cause good separation in the caches between:

    one user's code and data and another user's code and data
    a user's code and data and the OS's code and data
    database data and I/O buffer data


> I would be interested to hear from anyone who knows the answer
> to these questions from papers or first-hand knowledge.

I hope this answers your questions.

------------------------------------------------------------------------------
Dale Morris       | Now is the time, and now is the record...
morris@nsa.hp.com | of the time.
------------------------------------------------------------------------------