- Newsgroups: comp.arch
- Path: sparky!uunet!usc!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!princeton!moo!awolfe
- From: awolfe@moo.Princeton.EDU (Andrew Wolfe)
- Subject: Re: FP-number cache? Unclocked VLSI design.
- Message-ID: <1993Jan5.170446.15655@Princeton.EDU>
- Originator: news@nimaster
- Sender: news@Princeton.EDU (USENET News System)
- Nntp-Posting-Host: moo.princeton.edu
- Organization: Princeton University
- References: <1993Jan5.085415.19676@klaava.Helsinki.FI>
- Date: Tue, 5 Jan 1993 17:04:46 GMT
- Lines: 70
-
- In article <1993Jan5.085415.19676@klaava.Helsinki.FI>, veijalai@klaava.Helsinki.FI (Tony Veijalainen) writes:
- ...
- |>
- |> On the other hand, the FPUs that appear more and more in modern big
- |> CPUs are conceptually quite far from the other functional units. I
- |> suspect that FP arithmetic tends to cluster quite heavily, and because
- |> of traditional efficiency thinking and fixed-point arithmetic in
- |> business applications, large parts of programs are integer-only (not
- |> many FP operations in interrupt code, for example :-).
- |>
- |> So has somebody researched the havoc FP instructions wreak on a general
- |> data cache? Is there a possible advantage to having a separate FP cache
- |> (with a separate bus to the FPU register file) and an integer cache, with
- |> the benefits outweighing the cost on the CPU (such as shrinking the
- |> general cache size -- is this over-specialization?).
- |>
- |> --
- |> Tony Veijalainen e-Mail: Tony.Veijalainen@helsinki.fi (preferred)
- |> (finger veijalai@plootu.helsinki.fi for more information)
- |>
-
-
- My students and I performed this experiment last year. I thought that it
- was a great idea. In retrospect - maybe it wasn't.
-
- The idea was this:
- -------------------
-
- Since FP and integer data are essentially different classes of information
- and use different parts of the CPU (just like instructions and data), should
- we provide two caches and two paths to cache? Would this double memory
- throughput in FP-intensive code?
-
- We modified Mike Johnson's superscalar processor simulator to support two
- data caches - one for ints and one for FP. We also increased the internal
- L/S units and the L/S busses to support the extra BW required. If FP and
- integer data existed on the same cache line, a snooping mechanism would
- maintain coherency.
-
- We found that the speedup was no more than 2-3% higher than using a single
- cache of combined size. In non-FP codes, the larger single cache was better.
-
-
- Explanation:
- -----------
-
- We believe that the explanation is that during FP intensive operations, most
- programs use only a few addressing integers and keep these in registers.
- Therefore, while FP and Integer BW are both important - they are not usually
- used at the same time. Some programs with pointer-based data structures may
- respond differently. (We used SPEC programs).
-
- I still have hope for multiple caches - but not using FP/Int classifications.
-
-
- Want more info? Princeton Univ. Tech. Rept. CE-A92-2 can be requested from
- me.
-
-
- (Philosophical note. This experiment also raised the issue of whether or
- not the architecture community will accept the publication of negative
- results. We could not locate any recent papers in major conferences that
- did not report positive results.)
-
- --
- --------------------------------------
- Andrew Wolfe
- Assistant Professor
- Department of Electrical Engineering
- Princeton University
-