- Newsgroups: comp.arch
- Path: sparky!uunet!usc!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!princeton!moo!awolfe
- From: awolfe@moo.Princeton.EDU (Andrew Wolfe)
- Subject: Re: FP-number cache? Unclocked VLSI design.
- Message-ID: <1993Jan5.170446.15655@Princeton.EDU>
- Originator: news@nimaster
- Sender: news@Princeton.EDU (USENET News System)
- Nntp-Posting-Host: moo.princeton.edu
- Organization: Princeton University
- References: <1993Jan5.085415.19676@klaava.Helsinki.FI>
- Date: Tue, 5 Jan 1993 17:04:46 GMT
- Lines: 70
-
- In article <1993Jan5.085415.19676@klaava.Helsinki.FI>, veijalai@klaava.Helsinki.FI (Tony Veijalainen) writes:
- ...
- |>
- |> On the other hand, the FPUs that appear more and more in modern big
- |> CPUs are conceptually quite far from the other functional units. I
- |> suspect that FP arithmetic tends to cluster quite heavily, and because
- |> of traditional efficiency thinking and fixed-point arithmetic in
- |> business applications, large parts of programs are integer-only (not
- |> many FP operations in interrupt code, for example :-).
- |>
- |> So has somebody researched the havoc FP instructions wreak on a general
- |> data cache? Is there a possible advantage to having a separate FP cache
- |> (with a separate bus to the FPU register file) and an integer cache, with
- |> the benefits outweighing the cost on the CPU (such as shrinking the
- |> general cache size -- is this over-specialization?).
- |>
- |> --
- |> Tony Veijalainen e-Mail: Tony.Veijalainen@helsinki.fi (preferred)
- |> (finger veijalai@plootu.helsinki.fi for more information)
- |>
-
-
- My students and I performed this experiment last year. I thought that it
- was a great idea. In retrospect - maybe it wasn't.
-
- The idea was this:
- -------------------
-
- Since FP and integer data are essentially different classes of information
- and use different parts of the CPU (just like instructions and data), should
- we provide two caches and two paths to cache? Would this double memory
- throughput in FP-intensive code?
-
- We modified Mike Johnson's superscalar processor simulator to support two
- data caches - one for ints and one for FP. We also increased the internal
- L/S units and the L/S busses to support the extra BW required. If FP and
- integer data existed on the same cache line, a snooping mechanism would
- maintain coherency.
-
- We found that the speedup was no more than 2-3% higher than using a single
- cache of combined size. In non-FP codes, the larger single cache was better.
-
-
- Explanation:
- -----------
-
- We believe that the explanation is that during FP intensive operations, most
- programs use only a few addressing integers and keep these in registers.
- Therefore, while FP and Integer BW are both important - they are not usually
- used at the same time. Some programs with pointer-based data structures may
- respond differently. (We used SPEC programs).
-
- I still have hope for multiple caches - but not using FP/Int classifications.
-
-
- Want more info? Princeton Univ. Tech. Rept. CE-A92-2 can be requested from
- me.
-
-
- (Philosophical note. This experiment also raised the issue of whether or
- not the architecture community will accept the publication of negative
- results. We could not locate any recent papers in major conferences that
- did not report positive results.)
-
- --
- --------------------------------------
- Andrew Wolfe
- Assistant Professor
- Department of Electrical Engineering
- Princeton University
-