Newsgroups: comp.sys.super
Path: sparky!uunet!pgroup!lfm
From: lfm@pgroup.com (Larry Meadows)
Subject: i860 performance again (was:Re: World's Most Powerful...)
Message-ID: <C1C81H.A37@pgroup.com>
Date: Sun, 24 Jan 1993 02:53:41 GMT
Distribution: inet
References: <1993Jan21.165159.10149@meiko.com> <1993Jan22.015827.26653@nas.nasa.gov> <EIJKHOUT.93Jan23164314@cupid.cs.utk.edu>
Organization: The Portland Group, Portland, OR
Lines: 28

In article <EIJKHOUT.93Jan23164314@cupid.cs.utk.edu> eijkhout@cupid.cs.utk.edu (Victor Eijkhout) writes:
>
>Assembly coded benchmarks, maybe. But the BLAS routines (Basic Linear
>Algebra Subroutines) exist on many machines in assembler, and
>precisely because they are 1/ standardized across all platforms
>2/ optimized for each, you can write real programs in Fortran or C,
>with a reasonable speed if you need the BLAS often enough.

This is a good idea, but as I pointed out, you will not get more than
11 or 12 Mflops on daxpy if you stick to BLAS-1 (unless you have some
kind of special daxpy [maybe KAI?] that assumes that vector 1 stays
in cache and vector 2 comes from main memory, or vice versa).

>
>Just interface your program to assembler *kernels*.
>
>I know of people who write distributed memory linear system
>solvers, that get over 50% performance out of *500 i860's*
>using those assembler BLAS.

These people must be using the Delta, huh? I don't know anyone else
that has 500 i860s. Are they calling BLAS-2 routines? If so, I'd
definitely believe you, as I know people who get 90%+ performance
on 80 or so shared-memory i860-XRs using similar techniques.
--
Larry Meadows The Portland Group
lfm@pgroup.com