- Path: sparky!uunet!idacrd!desj@ccr-p.ida.org
- From: desj@ccr-p.ida.org (David desJardins)
- Newsgroups: comp.arch
- Subject: Re: Hardware Support for Numeric Algorithms
- Message-ID: <1739@idacrd.UUCP>
- Date: 7 Nov 92 06:59:09 GMT
- References: <1992Nov4.183718.5242@newshost.lanl.gov> <2230@titccy.cc.titech.ac.jp> <1992Nov6.181826.29015@newshost.lanl.gov>
- Sender: news@idacrd.UUCP
- Followup-To: comp.programming
- Organization: IDA Center for Communications Research, Princeton
- Lines: 23
-
- J. Giles <jlg@cochiti.lanl.gov> writes:
- >>> for(i=0;i<N;i+=4)
- >>> { b0=b[i]; b1=b[i+1]; b2=b[i+2]; b3=b[i+3];
- >>> c0=c[i]; c1=c[i+1]; c2=c[i+2]; c3=c[i+3];
- >>> a0=a[i]; a1=a[i+1]; a2=a[i+2]; a3=a[i+3];
- >>> a[i]=a0+b0*c0; ...
-
- > And the code still does not parallelize. Why? An ANSI C
- > implementation must *infer* which things can be parallel and which
- > can't because the C language doesn't have explicit vector constructs.
- > There is *NO* *HAND* optimization of the above code which will remedy
- > this - without using some non-standard extension for explicitly
- > declaring whether the variables are aliased or not.
-
- You should read before you reply. The point of the above transformation
- is that there is no aliasing problem, and the compiler can know without
- any extensions that parallel execution is valid. Executing the four
- multiply-adds in parallel is valid regardless of any overlap between a,
- b, and c.
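The argument can be checked with a minimal C sketch. It assumes `double` arrays and adds a remainder loop for N not divisible by 4. The function names `madd_unrolled`, `madd_vector4`, `madd_rolled` and `run_overlapping` are hypothetical illustrations, not code from the thread. `run_overlapping` makes `b` alias `a` shifted by one element (`b[i] == a[i-1]`). Because the unrolled block loads everything before it stores anything, it gives the same result as a model of a 4-wide vector multiply-add even under that overlap. The plain rolled loop does not, because each store feeds the next iteration's load.

```c
#include <assert.h>
#include <string.h>

/* Hand-unrolled a[i] = a[i] + b[i]*c[i], four elements per block, as in
   the quoted post. The remainder loop is an assumption added for n % 4 != 0. */
static void madd_unrolled(double *a, const double *b, const double *c, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i + 3 < n; i += 4) {
        /* Every load in the block happens before any store ... */
        double b0 = b[i], b1 = b[i+1], b2 = b[i+2], b3 = b[i+3];
        double c0 = c[i], c1 = c[i+1], c2 = c[i+2], c3 = c[i+3];
        double a0 = a[i], a1 = a[i+1], a2 = a[i+2], a3 = a[i+3];
        /* ... so the four multiply-adds read only locals and are
           independent of each other whether or not a, b, c overlap. */
        a[i]   = a0 + b0 * c0;
        a[i+1] = a1 + b1 * c1;
        a[i+2] = a2 + b2 * c2;
        a[i+3] = a3 + b3 * c3;
    }
    for (; i < n; i++)
        a[i] = a[i] + b[i] * c[i];
}

/* Model of a 4-wide vector multiply-add: load four lanes of each
   operand, compute all four lanes, then store all four lanes. */
static void madd_vector4(double *a, const double *b, const double *c, int n)
{
    int i, k;
    for (i = 0; i + 3 < n; i += 4) {
        double va[4], vb[4], vc[4];
        for (k = 0; k < 4; k++) { va[k] = a[i+k]; vb[k] = b[i+k]; vc[k] = c[i+k]; }
        for (k = 0; k < 4; k++) va[k] = va[k] + vb[k] * vc[k];
        for (k = 0; k < 4; k++) a[i+k] = va[k];
    }
    for (; i < n; i++)
        a[i] = a[i] + b[i] * c[i];
}

/* The plain rolled loop: each iteration's store precedes the next
   iteration's loads, so overlap changes what later iterations read. */
static void madd_rolled(double *a, const double *b, const double *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = a[i] + b[i] * c[i];
}

/* Hypothetical helper: runs a kernel with a = x+1 and b = x (so that
   b[i] == a[i-1]), c all ones, and copies the 8 results of a into out. */
static void run_overlapping(void (*kernel)(double *, const double *, const double *, int),
                            double out[8])
{
    double x[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    double ones[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    kernel(x + 1, x, ones, 8);
    memcpy(out, x + 1, 8 * sizeof(double));
}
```

With this overlap, the unrolled and vector versions both leave 17 in the last element of `a`. The rolled loop turns the array into running prefix sums and leaves 45. A compiler can therefore map the unrolled block directly onto 4-wide hardware. It cannot do the same for the rolled loop unless it knows the arrays are unaliased.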
-
- Followups to someplace that isn't comp.arch.
-
- David desJardins
-