NetNews Usenet Archive 1992 #27

Newsgroups: comp.lang.c
Path: sparky!uunet!utcsri!helios.physics.utoronto.ca!alchemy.chem.utoronto.ca!mroussel
From: mroussel@alchemy.chem.utoronto.ca (Marc Roussel)
Subject: Allowed Fortran and C optimizations
Message-ID: <1992Nov16.153307.4632@alchemy.chem.utoronto.ca>
Sender: mroussel@alchemy.chem.utoronto.ca (Marc Roussel)
Organization: Department of Chemistry, University of Toronto
Date: Mon, 16 Nov 1992 15:33:07 GMT
Lines: 112

A few days ago, I suggested that I might post some example code showing that better performance can sometimes be obtained from Fortran than from C unless one is willing to hand-tune the C code. Before I show my example, I would like to point out that I am not claiming that Fortran is better than C for everything, only that Fortran sometimes optimizes better than equivalent C code.

This example comes from a talk by Bob Montgomery of HP at the HP Interworks conference (New Orleans, August 24-27, 1992). All the numbers reported below are for the HP 720.

The benchmark is a routine called vec_mult_add, which multiplies a vector v by a scalar a and adds a vector u, returning the result in vector r. There is one small restriction: the routine must be Fortran-callable, i.e., everything has to be passed by reference. Here is the Fortran code:

      subroutine vec_mult_add(u,v,cnt,a,r)
      integer cnt
      real u(cnt),v(cnt),r(cnt),a
* Comment: This was obviously written by someone who is used to
* very old Fortran compilers. The following "if" is redundant
* with the loop indices. I suspect that the optimizer will know
* that, however.
      if (cnt.le.0) return
      do ii=1,cnt
        r(ii) = a*v(ii) + u(ii)
      end do
      end

On an HP 720, compiled with the -O flag (which does not invoke the HP preprocessor but allows all normal optimizations), this snippet, driven by an appropriate wrapper, achieves 18.50 MFLOPS. Remember that number.
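The wrapper itself is not shown. As a rough sketch of what such a driver might look like, the C code below times repeated calls of the loop and converts the elapsed time to MFLOPS. Everything here is an assumption for illustration: the names kernel_c and measure_mflops are hypothetical, the kernel is a C translation that takes cnt and a by value (so it is not Fortran-callable), and clock() measures processor time, which may not be what the original wrapper measured. Each element costs one multiply and one add, so the rate is 2 * cnt * reps / seconds / 10^6.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical C translation of the Fortran loop: r = a*v + u. */
static void kernel_c(const float *u, const float *v, int cnt, float a, float *r)
{
    int ii;
    for (ii = 0; ii < cnt; ii++)
        r[ii] = a * v[ii] + u[ii];
}

/* Calling through a volatile function pointer keeps the optimizer from
   inlining the kernel and deleting the repeated, otherwise unused calls. */
static void (*volatile kernel)(const float *, const float *, int, float, float *) = kernel_c;

/* Returns the measured MFLOPS rate, or -1.0 if the arguments are invalid,
   memory runs out, or the run was too short for clock() to resolve. */
double measure_mflops(int cnt, int reps)
{
    float *u, *v, *r;
    clock_t start, stop;
    double seconds;
    int i, rep;

    if (cnt <= 0 || reps <= 0)
        return -1.0;
    u = malloc((size_t)cnt * sizeof *u);
    v = malloc((size_t)cnt * sizeof *v);
    r = malloc((size_t)cnt * sizeof *r);
    if (u == NULL || v == NULL || r == NULL) {
        free(u); free(v); free(r);
        return -1.0;
    }
    for (i = 0; i < cnt; i++) {
        u[i] = 1.0f;
        v[i] = 2.0f;
    }

    start = clock();
    for (rep = 0; rep < reps; rep++)
        kernel(u, v, cnt, 0.5f, r);
    stop = clock();

    free(u); free(v); free(r);
    if (start == (clock_t)-1 || stop == (clock_t)-1 || stop <= start)
        return -1.0;
    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    /* one multiply and one add per element per repetition */
    return 2.0 * cnt * reps / seconds / 1.0e6;
}
```

A call such as measure_mflops(1000000, 100) from a main program would report one number per run; a driver for the real Fortran routine would follow the same pattern but pass every argument by reference.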
The most obvious way to implement this routine in C (remember, it has to be Fortran-callable) is probably

void
vec_mult_add(u,v,cnt,a,r)
float *u,*v;
int *cnt;
float *a,*r;
{
    int ii;

    if (*cnt <= 0) return;
    for (ii=0; ii<*cnt; ii++) {
        /* a store through r might change *a, so the compiler must reload it */
        *r++ = (*a * *v++) + *u++;
    }
}

This version of vec_mult_add gets you either 4.73 MFLOPS or 7.09 MFLOPS, depending on whether or not you compile in ANSI mode. (ANSI mode is faster because the loop then does not contain implicit float-double conversions.) That's pretty sad.

Now if you know a little bit about C compilers and optimizers (and I know only a very little), it will be pretty clear which parts of this routine are inhibiting the optimizer. To put it simply, the fact that everything is a pointer forces the optimizer to make extremely conservative assumptions: as far as it can tell, any store through r might modify *a or *cnt, so it cannot keep them in registers across iterations. Let's make the following modifications. First, we'll make local copies of cnt and a so that the compiler doesn't have to worry about aliasing. Then, we'll use array notation for things which really are arrays:

void
vec_mult_add(u,v,cnt,a,r)
float u[],v[];
int *cnt;
float *a,r[];
{
    int ii;
    int lcnt = *cnt;    /* local copies: stores into r[] */
    float la = *a;      /* cannot change these values    */

    if (lcnt <= 0) return;
    for (ii=0; ii<lcnt; ii++) {
        r[ii] = la * v[ii] + u[ii];
    }
}

Compiling this in ANSI mode with the -O flag, we get 9.93 MFLOPS, i.e., about half the Fortran performance. Interestingly, we get the same 9.93 MFLOPS without switching to array syntax, i.e., with just the la and lcnt hacks.

HP's C compiler has another optimizer switch, the +Om1 flag, which asserts that none of the arguments of a function call are aliased. Only when we make all the modifications shown above AND turn on this flag do we get 18.50 MFLOPS. Now the +Om1 flag does what I want it to, but I wouldn't want to use it to compile a library or program written in C by someone else: since C does not force you to keep the objects your pointer parameters refer to distinct, I couldn't count on the program executing correctly.
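To see why the compiler may not simply load *a once in the pointer version, consider a call in which r overlaps the storage that a points to. The ANSI C sketch below restates the two routines with prototypes under hypothetical names (vec_mult_add_ptr and vec_mult_add_copy); it is an illustration, not code from the talk. Because the pointer version re-reads *a after every store through r, the two versions legitimately give different answers for such a call, so an optimizer cannot turn the first into the second without knowing the arguments are distinct.

```c
/* Pointer-walking version, like the first C routine in the post. */
void vec_mult_add_ptr(float *u, float *v, int *cnt, float *a, float *r)
{
    int ii;

    if (*cnt <= 0) return;
    for (ii = 0; ii < *cnt; ii++) {
        /* *a is read afresh each time: the previous store through r
           may have overwritten it */
        *r++ = (*a * *v++) + *u++;
    }
}

/* Version with local copies of *cnt and *a, like the second C routine. */
void vec_mult_add_copy(float *u, float *v, int *cnt, float *a, float *r)
{
    int ii;
    int lcnt = *cnt;
    float la = *a;   /* read once, before any store to r[] */

    if (lcnt <= 0) return;
    for (ii = 0; ii < lcnt; ii++) {
        r[ii] = la * v[ii] + u[ii];
    }
}
```

With u and v all ones, n = 3, and float buf[3] = {2, 0, 0}, the call vec_mult_add_ptr(u, v, &n, &buf[0], buf) leaves buf = {3, 4, 4}, because buf[0] (that is, *a) becomes 3 after the first store. The same call to vec_mult_add_copy on a fresh buf leaves {3, 3, 3}.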
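Since this post was written, C99 added the restrict qualifier, which expresses the same kind of no-aliasing promise as +Om1 but in the source, per pointer parameter, instead of for a whole compilation. A minimal sketch under that assumption (vec_mult_add_restrict is a hypothetical name, and whether a given compiler then matches the Fortran figure is not guaranteed):

```c
/* Requires C99 or later for the restrict qualifier.
   Calling this with overlapping arguments is undefined behavior. */
void vec_mult_add_restrict(const float *restrict u, const float *restrict v,
                           const int *restrict cnt, const float *restrict a,
                           float *restrict r)
{
    int ii;

    /* Because r is restrict-qualified, stores to r[] are promised not to
       modify *a or *cnt, so the compiler may load each of them once. */
    for (ii = 0; ii < *cnt; ii++) {
        r[ii] = *a * v[ii] + u[ii];
    }
}
```

The design point is scope: the promise is attached to this one function's parameters, so other code compiled alongside it keeps C's ordinary aliasing rules, which avoids the whole-program risk of +Om1 described above.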
That means that in many scientific codes whose guts are made up mostly of simple loops like the one shown above, I can expect C to perform about half as well as Fortran. In my case, that's not a big deal. For other people, whose codes run for days, it is.

Note that you can push this argument to the absurd: HP's hand-coded assembler version of vec_mult_add cranks out over 28 MFLOPS on a 720. Assembler is not a good choice for most scientific programming because it takes too much programmer time and effort. In the case of C and Fortran, however, since both languages are roughly equally easy to learn and code in, no such argument can be made. Furthermore, if I'm allowed to use special compiler flags, HP's Fortran preprocessor (invoked with the +OP flag) gives Fortran an unfair edge, since it should be able to recognize the loop as a vec_mult_add and call the assembler routine, thus producing 28 MFLOPS performance in the Fortran program. It would of course be completely unfair to compare the two languages on this basis.

What it comes down to is this. If you write Fortran programs, you are so constrained that the compiler knows exactly what you mean, right off the bat: Fortran forbids a subroutine from modifying an argument that overlaps another argument, so the compiler may assume that r does not overlap u, v, or a. If you write C, you have to have ways to tell the compiler what you mean. Expressiveness, in this case, comes at a performance price.

Whether C or Fortran is the better language for your number-crunching application will depend on whether or not you need C's features. A lot of scientific work requires no data structures more sophisticated than static arrays. In those cases, Fortran is probably the better choice. I wouldn't want to create binary trees in Fortran anymore, however.

Marc R. Roussel
mroussel@alchemy.chem.utoronto.ca