- Xref: sparky comp.sys.dec:6719 comp.dsp:2935
- Path: sparky!uunet!spool.mu.edu!uwm.edu!rutgers!sgigate!odin!pipo.paris.sgi.com!jpp
- From: jpp@pipo.paris.sgi.com (Jean-Pierre Panziera - SGI PARIS)
- Newsgroups: comp.sys.dec,comp.dsp
- Subject: Re: Alpha fft performance
- Message-ID: <1993Jan7.121038.4845@odin.corp.sgi.com>
- Date: 7 Jan 93 12:10:38 GMT
- References: <1992Dec31.164221.27734@aplcen.apl.jhu.edu> <1993Jan4.154245.13258@crl.dec.com>
- Sender: news@odin.corp.sgi.com (Net News)
- Reply-To: jpp@sgi.com
- Organization: Silicon Graphics, Parallel Team
- Lines: 56
- Nntp-Posting-Host: pipo.paris.sgi.com
-
- In article <1993Jan4.154245.13258@crl.dec.com>, payne@crl.dec.com
- (Andrew Payne) writes:
- ......
- |>
- |> For a 1024 point, complex, single precision FFT (i.e. just fits in the on-chip
- |> 8K D cache), we have a measured time of 96,000 cycles. For the various Alpha
- |> systems, this translates to:
- |>
- |> 133 MHz clock 722 microseconds
- |> 150 MHz 640 microseconds
- |> 200 MHz 480 microseconds
- |>
- |> The algorithm is a radix-4 algorithm, and is basically just a C translation
- |> of the FORTRAN code in "DFT/FFT and Convolution Algorithms" by Burrus and
- |> Parks (with a few tweaks, of course). It was compiled with GCC 2.3 and
- |> the execution time was measured with the Alpha's process cycle counter.
- |>
- ......
- |> --
- |> Andrew C. Payne
- |> DEC Cambridge Research Lab
-
- I'd like to ask a few questions:
-
- |> For a 1024 point, complex, single precision FFT (i.e. just fits in the on-chip
- |> 8K D cache),
-
- An array of 1024 single-precision complex numbers does indeed occupy 8 Kbytes.
- However, to compute an FFT you also need a table of sines and cosines of the
- same size (another 8 Kbytes). The total space required for this FFT is then
- at least 8 + 8 = 16 Kbytes, so the claim "just fits in the on-chip 8K D cache"
- seems questionable.
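
To make the arithmetic above explicit, here is a minimal sketch in C. The function names are hypothetical and not taken from the code under discussion, and it assumes a full-size twiddle table (one sine/cosine pair per point), which is the assumption made in the paragraph above; an implementation with a smaller table would need less.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by n complex points, where each real component
 * (real part or imaginary part) takes elem_size bytes. */
size_t complex_array_bytes(size_t n, size_t elem_size)
{
    return n * 2 * elem_size;
}

/* Working set of the FFT under the assumption stated in the text:
 * the data array plus a sine/cosine table of the same size. */
size_t fft_working_set_bytes(size_t n, size_t elem_size)
{
    size_t data     = complex_array_bytes(n, elem_size);
    size_t twiddles = complex_array_bytes(n, elem_size); /* assumed full-size table */
    return data + twiddles;
}

/* Returns 1 if the working set fits in a cache of cache_bytes, else 0.
 * This ignores associativity and line-conflict effects. */
int fits_in_cache(size_t working_set, size_t cache_bytes)
{
    assert(cache_bytes > 0);
    return working_set <= cache_bytes;
}
```

With n = 1024 and 4-byte floats, `complex_array_bytes` gives 8192 and `fft_working_set_bytes` gives 16384, which is twice the 8 Kbyte D cache.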
-
-
- |> the execution time was measured with the Alpha's process cycle counter.
-
- I am not familiar with the Alpha's "process cycle counter". Is it a simulator?
- Does this tool take possible cache misses into account?
- If it is a simulation, how do real benchmarks compare with it?
-
- |> The algorithm is a radix-4 algorithm, and is basically just a C translation
- |> of the FORTRAN code in "DFT/FFT and Convolution Algorithms" by Burrus and
- |> Parks (with a few tweaks, of course).
-
- Are the results of your transform in natural order, or are they "bit reversed"?
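
For readers unfamiliar with the term, the sketch below shows what a "bit reversed" ordering means and how it can be undone. It illustrates the radix-2 case only; an in-place radix-4 transform may instead leave its output scrambled by base-4 digits, and nothing here is taken from the code being discussed. The function names are hypothetical.

```c
#include <assert.h>

/* Reverse the low `bits` bits of index i.
 * Example: with bits = 3, index 3 (011) maps to 6 (110). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u); /* move the lowest bit of i onto r */
        i >>= 1;
    }
    return r;
}

/* Reorder a complex array (separate real and imaginary parts) in place
 * from bit-reversed order to natural order, or the reverse: the
 * permutation is its own inverse. n must equal 2^bits. */
void bit_reverse_permute(float *re, float *im, unsigned n, unsigned bits)
{
    assert(n == (1u << bits));
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) { /* swap each pair only once */
            float t;
            t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
}
```

For a 1024-point transform, `bits` would be 10. The extra pass costs time on top of the transform itself, which is why it matters for a benchmark whether the reordering is included in the measured cycle count.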
-
- Thank you in advance.
-
- ---
- ___ ___/ ___ / ___ / Jean-Pierre Panziera
- / / / / / jpp@paris.sgi.com
- / ______/ ______/
- / / /
- _____/ ___/ ___/
-