- Xref: sparky comp.sys.super:872 comp.lang.fortran:2894
- Path: sparky!uunet!sun-barr!olivea!decwrl!pa.dec.com!e2big.mko.dec.com!quark.enet.dec.com!lionel
- From: lionel@quark.enet.dec.com (Steve Lionel)
- Newsgroups: comp.sys.super,comp.lang.fortran
- Subject: Re: Inner product / AXPY performance
- Message-ID: <1992Jul30.204909.24230@e2big.mko.dec.com>
- Date: 30 Jul 92 23:36:50 GMT
- References: <l7gi3fINNmsd@utkcs2.cs.utk.edu>
- Sender: guest@e2big.mko.dec.com (Guest (DECnet))
- Organization: Digital Equipment Corporation
- Lines: 34
-
-
- In article <l7gi3fINNmsd@utkcs2.cs.utk.edu>, eijkhout@cupid.cs.utk.edu
- (Victor Eijkhout) writes...
- >
- >I would like to get an idea of the difference in performance
- >between inner products
- >
- > do i=1,n; x = x + a(i)*b(i); end do
- >
- >and axpy operations
- >
- > do i=1,n; x(i) = x(i) + a*b(i); end do
- >
- >which both have the same number of operations, but the inner product
- >has an accumulation, which traditionally seems to be an
- >unvectorizable idea.
- >
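The two loops from the question can be written out as a small C sketch, in single precision to match SDOT/SAXPY. The function names `inner_product` and `axpy` are hypothetical. The point it shows is the dependence structure: every iteration of `inner_product` reads the `x` written by the previous iteration, while each iteration of `axpy` touches only its own element of `x`.

```c
#include <stddef.h>

/* Inner product: x = x + a(i)*b(i).
   Loop-carried dependence: iteration i needs the x from iteration i-1. */
float inner_product(size_t n, const float *a, const float *b)
{
    float x = 0.0f;
    for (size_t i = 0; i < n; i++)
        x = x + a[i] * b[i];
    return x;
}

/* AXPY: x(i) = x(i) + a*b(i).
   No dependence between iterations: all n updates are independent. */
void axpy(size_t n, float a, const float *b, float *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + a * b[i];
}
```

Both loops perform one multiply and one add per element; only the accumulation into a single scalar distinguishes them.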
-
- I tried this with VAX FORTRAN-HPO. Both forms vectorize very nicely,
- provided one compiles with the /ASSUME=NOACCURACY_SENSITIVE qualifier so
- that the compiler may apply its reduction transformation to the dot product.
- The default is ACCURACY_SENSITIVE, which disables transformations that
- could yield results different from scalar execution; reordering the terms
- of a floating-point sum is one of them, since floating-point addition is
- not associative. The dot product form does, of course, need a final
- reduction step that the "axpy" form doesn't; apart from that, the vector
- multiply-add sequences are essentially the same for the two loops.
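One way to picture a reduction transformation is the C sketch below. It illustrates the general technique only and is not a description of the code VAX FORTRAN-HPO actually generates; the lane count `VL` and the name `dot_reassociated` are hypothetical. The sum is split across `VL` independent partial sums, so the main loop is the same lane-by-lane multiply-add as AXPY, and a short final loop folds the partial sums into one scalar.

```c
#include <stddef.h>

#define VL 4 /* hypothetical vector length, chosen only for illustration */

/* Dot product after reassociation into VL partial sums. */
float dot_reassociated(size_t n, const float *a, const float *b)
{
    float partial[VL] = {0.0f}; /* one accumulator per "lane" */
    size_t i = 0;

    /* Vectorizable body: lane k accumulates elements k, k+VL, k+2*VL, ...
       This is the same multiply-add pattern as AXPY. */
    for (; i + VL <= n; i += VL)
        for (size_t k = 0; k < VL; k++)
            partial[k] = partial[k] + a[i + k] * b[i + k];

    /* Remainder when n is not a multiple of VL. */
    for (; i < n; i++)
        partial[i % VL] += a[i] * b[i];

    /* Final reduction step, which the AXPY form does not need. */
    float sum = 0.0f;
    for (size_t k = 0; k < VL; k++)
        sum += partial[k];
    return sum;
}
```

The reordering is why the result can differ from scalar execution. With a = (1e8, 1, 1, 1, -1e8, 0, 0, 0) and b all ones, a strict left-to-right loop evaluated in IEEE single precision absorbs each 1 into 1e8 and returns 0, while this version returns 3. Reassociation can just as well lose accuracy on other inputs; either way the answer is not the scalar one, which is what the ACCURACY_SENSITIVE default guards against.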
-
- Of course, one can also use the BLAS SDOT and SAXPY intrinsics, which
- VAX FORTRAN-HPO will expand and vectorize (and parallelize, if you like).
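For reference, this C sketch spells out what the two BLAS routines compute, following the argument order of the reference Fortran BLAS, `SDOT(N, SX, INCX, SY, INCY)` and `SAXPY(N, SA, SX, INCX, SY, INCY)`. The names `sdot_ref` and `saxpy_ref` are hypothetical, and unlike the real routines the sketch handles only positive strides.

```c
#include <stddef.h>

/* SDOT semantics (positive strides only): sum of sx[i*incx] * sy[i*incy]. */
float sdot_ref(int n, const float *sx, int incx, const float *sy, int incy)
{
    float sum = 0.0f;
    if (n <= 0)
        return sum; /* the reference routine returns 0 for n <= 0 */
    for (int i = 0; i < n; i++)
        sum += sx[(size_t)i * incx] * sy[(size_t)i * incy];
    return sum;
}

/* SAXPY semantics (positive strides only): sy[i*incy] += sa * sx[i*incx]. */
void saxpy_ref(int n, float sa, const float *sx, int incx, float *sy, int incy)
{
    if (n <= 0 || sa == 0.0f)
        return; /* the reference routine also returns early in these cases */
    for (int i = 0; i < n; i++)
        sy[(size_t)i * incy] += sa * sx[(size_t)i * incx];
}
```

With unit strides, the loops in the question correspond to the Fortran calls `X = X + SDOT(N, A, 1, B, 1)` and `CALL SAXPY(N, A, B, 1, X, 1)`.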
-
- Steve Lionel lionel@quark.enet.dec.com
- SDT Languages Group
- Digital Equipment Corporation
- 110 Spit Brook Road
- Nashua, NH 03062
-