- Path: sparky!uunet!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!uwm.edu!spool.mu.edu!snorkelwacker.mit.edu!ai-lab!life.ai.mit.edu!tmb
- From: tmb@arolla.idiap.ch (Thomas M. Breuel)
- Newsgroups: comp.arch
- Subject: Re: Hardware Support for Numeric Algorithms
- Followup-To: comp.arch
- Date: 7 Nov 92 09:01:14
- Organization: IDIAP (Institut Dalle Molle d'Intelligence Artificielle
- Perceptive)
- Lines: 101
- Message-ID: <TMB.92Nov7090114@arolla.idiap.ch>
- References: <BwIwEB.J1A@mentor.cc.purdue.edu> <BwJ1rB.pz@rice.edu>
- <1992Oct22.164414.12708@newshost.lanl.gov> <Bx78zu.395@rice.edu>
- <1992Nov4.183718.5242@newshost.lanl.gov> <2230@titccy.cc.titech.ac.jp>
- Reply-To: tmb@idiap.ch
- NNTP-Posting-Host: arolla.idiap.ch
- In-reply-to: mohta@necom830.cc.titech.ac.jp's message of 6 Nov 92 12:31:52 GMT
-
- mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes
- >>In article <Bx78zu.395@rice.edu>, preston@dawn.cs.rice.edu (Preston Briggs) writes:
- >>Using C as an assembler is doable, in that you can express (almost)
- >>every optimization in the source code. The result is machine dependent,
- >>ugly, unmaintainable, and I don't recommend it to anyone, but doesn't
- >>really require any additional optimization -- very similar to assembly
- >>language programming.
- >
- >Wrong.
- >
- >>The rationale document for ANSI C explicitly
- >>recognizes the optimization penalty inherent in pointers and suggests
- >>that remedies to this be a priority in future versions of the standard.
- >>Yes, if you avoid procedure calls (at least, all those whose arguments
- >>are pointers),
- >
- >Automatic optimization by compilers has little to do with the above
- >mentioned *HAND* optimization.
- >
- >You can transform the following program
- >
- > f(a,b,c)
- > double *a,*b,*c;
- >
- > for(i=0;i<4*N;i++)
- > { a[i]+=b[i]*c[i];
- > ...
- >
- >to
- >
- > for(i=0;i<N;i+=4)
- > { b0=b[i]; b1=b[i+1]; b2=b[i+2]; b3=b[i+3];
- > c0=c[i]; c1=c[i+1]; c2=c[i+2]; c3=c[i+3];
- > a0=a[i]; a1=a[i+1]; a2=a[i+2]; a3=a[i+3];
- > a[i]=a0+b0*c0; ...
- >
- >by hand, knowing that the area for a, b and c does not overlap.
-
- Thank you: you are providing an excellent example of why
- Preston is right:
-
- (1) The optimization you just performed is machine specific: the optimal
- amount of loop unrolling depends on the precise characteristics of
- your hardware and may even differ from workstation to workstation of
- the same type.
-
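- (2) You just introduced two serious bugs. For one, the unrolled loop
- stops at N instead of 4*N, so it only processes the first N of the
- 4*N elements.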
-
- (3) Finally, you cheated. Usually, loop limits don't work out that
- nicely (even if you get them right): when the trip count is not a
- multiple of the unrolling factor, you need special-case code (e.g.,
- Duff's device) to handle the leftover iterations.
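The C sketch below (not part of the original thread) shows what point (3) means in practice. The function names muladd_simple, muladd_unroll4 and muladd_duff are hypothetical, the unrolling factor of 4 is arbitrary rather than tuned for any machine, and both unrolled versions assume, like the quoted code, that a, b and c do not overlap. The first unrolled version uses a plain cleanup loop for the 0 to 3 leftover elements; the second uses Duff's device to fold them into the unrolled body.

```c
#include <stddef.h>

/* Reference version: a[i] += b[i] * c[i] for every i in [0, n). */
void muladd_simple(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i] * c[i];
}

/* Hand-unrolled by 4 (hypothetical factor), assuming a, b, c do not overlap.
   The main loop covers the largest multiple of 4 not exceeding n;
   the cleanup loop handles the remaining 0-3 elements. */
void muladd_unroll4(double *a, const double *b, const double *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double b0 = b[i], b1 = b[i + 1], b2 = b[i + 2], b3 = b[i + 3];
        double c0 = c[i], c1 = c[i + 1], c2 = c[i + 2], c3 = c[i + 3];
        a[i]     += b0 * c0;
        a[i + 1] += b1 * c1;
        a[i + 2] += b2 * c2;
        a[i + 3] += b3 * c3;
    }
    for (; i < n; i++)          /* special-case tail: n % 4 elements */
        a[i] += b[i] * c[i];
}

/* Same computation using Duff's device: the switch jumps into the middle
   of the unrolled do-while body, so the first pass handles the n % 4
   leftover elements and each later pass handles 4. */
void muladd_duff(double *a, const double *b, const double *c, size_t n)
{
    if (n == 0)
        return;
    size_t passes = (n + 3) / 4;
    switch (n % 4) {
    case 0: do { *a++ += *b++ * *c++;
    case 3:      *a++ += *b++ * *c++;
    case 2:      *a++ += *b++ * *c++;
    case 1:      *a++ += *b++ * *c++;
            } while (--passes > 0);
    }
}
```

Both unrolled functions should give the same results as muladd_simple for any n, including values that are not multiples of 4. Whether either one is actually faster depends on the compiler and the hardware, which is point (1).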
-
- There are a few cases where hand optimizations like these are
- justified in C (e.g., the X11 blitter code). But don't fool yourself
- and others into believing that they are "easy" or "portable".
-
- Some problems with C pointers are fixable with a "noalias"
- declaration, so the situation may get slightly better. But,
- ultimately, C is a poorly designed language for truly aggressive
- optimization and high-performance computation, even if you are willing
- to do a lot of work by hand.
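For readers coming to this later: "noalias" never made it into ANSI C, but C99 added the restrict type qualifier, which lets the programmer state this kind of no-overlap promise. A minimal sketch, with a hypothetical function name: the loop keeps its simple form, and the compiler, rather than the programmer, is left free to unroll or vectorize it for the target machine.

```c
#include <stddef.h>

/* The restrict qualifiers promise the compiler that, inside this function,
   the three arrays are only accessed through these pointers (no overlap).
   Given that promise, the compiler may reorder, unroll or vectorize the
   loop itself instead of the programmer hard-coding an unrolling factor. */
void muladd_restrict(double *restrict a, const double *restrict b,
                     const double *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i] * c[i];
}
```

Calling such a function with overlapping arrays, where one of them is written, is undefined behavior, so checking the no-overlap assumption remains the programmer's job.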
-
- Thomas.
-
- PS: I still use C/C++ for much of my work, because usually
- pragmatic issues like interfacing with existing libraries are
- more important to me than getting the best optimizations.