- Xref: sparky comp.os.os2.programmer:3705 comp.lang.fortran:2795
- Path: sparky!uunet!haven.umd.edu!darwin.sura.net!wupost!usc!cs.utexas.edu!rutgers!modus!gear!cadlab!martelli
- From: martelli@cadlab.sublink.org (Alex Martelli)
- Newsgroups: comp.os.os2.programmer,comp.lang.fortran
- Subject: Re: 32 Bit Fortran Compiler ??
- Message-ID: <1992Jul21.140038.27900@cadlab.sublink.org>
- Date: 21 Jul 92 14:00:38 GMT
- References: <hjkirch.710065189@morpheus> <ignacij.710085875@meishan.animal.uiuc.edu> <1992Jul06.073701.19620@cadlab.sublink.org> <ignacij.710603357@meishan.animal.uiuc.edu> <1992Jul15.155450.22340@cadlab.sublink.org> <ignacij.711385250@meishan.animal.uiuc.edu>
- Organization: CAD.LAB S.p.A., Bologna, Italia
- Lines: 176
-
- ignacij@meishan.animal.uiuc.edu (Ignacy Misztal) writes:
- ...
- :I agree that mixing types between the program and the subroutine should
- :be avoided. What happens, if it brings a 4 times improvement in speed?
- :Forget the standard if the code will run 1 week instead of 4, and it
- :runs on almost every compiler. The 4 time improvement example is below,
- :and it works with f77 compiler.
- :c this is a slow program with f77 compilers
- :      real x(1000)
- :      read(1)n,(x(i),i=1,n)
- :      end
- :
- :c this is a fast program with f77 compilers
- :      real x(1000)
- :      call get(n,x)
- :      end
- :      subroutine get(n,x)
- :      character x*4000
- :      read(1)n,x(1:4*n)
- :      end
- :
- :I know that the inability of the f77 compilers to fold implicit loops
-
- I have tried to reproduce your results and failed utterly. First of
- all, the runtimes of the above programs are of course in the noise,
- so I amplified them a bit by repeating the read 100 times. As p1.f I have:
- c this is an (allegedly) slow program with f77 compilers
-       real x(1000)
-       open(unit=1,file='pdat',form='unformatted')
-       do 10,j=1,100
-         rewind(1)
-         read(1)n,(x(i),i=1,n)
-    10 continue
-       end
-
- As p2.f I have:
- c this is an (allegedly) fast program with f77 compilers
-       real x(1000)
-       open(unit=1,file='pdat',form='unformatted')
-       do 10,j=1,100
-         rewind(1)
-         call get(n,x)
-    10 continue
-       end
-       subroutine get(n,x)
-       character x*4000
-       read(1)n,x(1:4*n)
-       end
-
- I compile these with "f77 -o p1 p1.f" and "f77 -o p2 p2.f" respectively,
- on the HP9000/400 workstation I'm on at present.
- I build a datafile for them to read with the following program, p3.f:
- c create file to try things out
-       real x(1000)
-       open(unit=1,file='pdat',form='unformatted')
-       n=567
-       do 10,i=1,n
-    10 x(i)=i/float(n)
-       write(1)n,(x(i),i=1,n)
-       end
-
- compiled of course with "f77 -o p3 p3.f". Then I perform:
- p3 ; p2 ; p1 # avoid caching effects
- for i in 1 2 3 4 5
- do
-     time p1
- done >p1.tim 2>&1
- for i in 1 2 3 4 5
- do
-     time p2
- done >p2.tim 2>&1
-
- and here is p1.tim:
-
- real 0m0.20s
- user 0m0.04s
- sys 0m0.14s
-
- real 0m0.18s
- user 0m0.02s
- sys 0m0.12s
-
- real 0m0.20s
- user 0m0.02s
- sys 0m0.16s
-
- real 0m0.18s
- user 0m0.02s
- sys 0m0.12s
-
- real 0m0.18s
- user 0m0.04s
- sys 0m0.14s
-
- and here is p2.tim:
-
- real 0m0.20s
- user 0m0.02s
- sys 0m0.16s
-
- real 0m0.20s
- user 0m0.02s
- sys 0m0.16s
-
- real 0m0.18s
- user 0m0.00s
- sys 0m0.16s
-
- real 0m0.18s
- user 0m0.06s
- sys 0m0.10s
-
- real 0m0.20s
- user 0m0.04s
- sys 0m0.16s
-
-
- I fail to notice ANY statistically significant difference between elapsed,
- usermode OR sysmode times of these two programs, MUCH LESS the alleged
- "FOUR TIMES" improvement!
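
As a quick check on the numbers above, here is a small C sketch that averages the five elapsed ("real") times from each file. The helper names `mean` and `report` are made up for illustration; the samples are copied from p1.tim and p2.tim as listed.

```c
#include <stdio.h>

/* Arithmetic mean of n samples. */
double mean(const double *t, int n)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += t[i];
    return sum / n;
}

/* "real" (elapsed) seconds copied from p1.tim and p2.tim. */
static const double p1_real[5] = {0.20, 0.18, 0.20, 0.18, 0.18};
static const double p2_real[5] = {0.20, 0.20, 0.18, 0.18, 0.20};

/* Print both means and their ratio (hypothetical helper). */
void report(void)
{
    double m1 = mean(p1_real, 5);
    double m2 = mean(p2_real, 5);
    printf("p1 mean %.3f s, p2 mean %.3f s, p1/p2 = %.2f\n", m1, m2, m1 / m2);
}
```

The means come out to 0.188 s for p1 and 0.192 s for p2, a gap well inside the 0.18 to 0.20 s spread of either run, and nowhere near a factor of four.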
-
- I *assume* that if I took the time and trouble to run these on our Apollos,
- VAXen, IBM 6150s, DOS boxes, RS6000s, Sony NEWS boxes, DECsystems, and so
- on ad nauseam, I WOULD find situations in which either of these programs
- would surprisingly come out much faster than the other one. ON THE OTHER
- HAND, I *KNOW FOR SURE* that I would find situations in which the second,
- standard-violating program would UNsurprisingly crash and burn, while the
- first, allegedly "slow on f77" one (which runs PERFECTLY FINE when
- compiled by f77 on this here 68030 box) WILL run on all of these boxes.
-
- :in the read statement is a weakness that will be corrected soon (in
- :a couple of years?). But now, there is no alternative. I know that
- :some people will suggest breaking the read statement into 2, etc., at a
- :considerable cost. However, my program has to run fast now.
-
- Try separating out the I/O; on most of the boxes I have around here,
- doing the I/O *IN A C LANGUAGE SUBROUTINE* called from your Fortran
- program (sigh!) will, depending on various factors, accelerate your
- I/O-bound programs by 20%-80%. Sad, but true. This will certainly
- not hold everywhere, so you'll have to have at hand a Fortran routine
- to link instead of the C one for the "exceptional" (HA!) boxes where
- the Fortran runtime I/O library IS as well-tuned as it SHOULD be.
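
A minimal sketch of what such a C routine might look like. Everything here is an assumption for illustration: the file name `pdat.raw`, the raw layout (one `int` count followed by that many `float`s, with none of the record markers an f77 unformatted file carries), the routine names, and above all the calling convention. A trailing underscore and pass-by-reference arguments are common among Unix f77 compilers but by no means universal, so check your compiler's manual. No speedup is claimed for this particular sketch.

```c
#include <stdio.h>

/* Hypothetical raw layout: one int n, then n floats, no record markers. */

/* Write n floats to path. Returns 0 on success, -1 on failure. */
int wrraw(const char *path, const float *x, int n)
{
    FILE *fp = fopen(path, "wb");
    int ok;
    if (fp == NULL)
        return -1;
    ok = fwrite(&n, sizeof n, 1, fp) == 1 &&
         fwrite(x, sizeof *x, (size_t)n, fp) == (size_t)n;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the count into *n and up to nmax floats into x.
   Returns 0 on success, -1 on any error or if the data will not fit. */
int rdraw(const char *path, float *x, int nmax, int *n)
{
    FILE *fp = fopen(path, "rb");
    int ok;
    if (fp == NULL)
        return -1;
    ok = fread(n, sizeof *n, 1, fp) == 1 &&
         *n >= 0 && *n <= nmax &&
         fread(x, sizeof *x, (size_t)*n, fp) == (size_t)*n;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Fortran-callable wrapper, ASSUMING the trailing-underscore,
   pass-by-reference convention: call rdpdat(n,x,nmax,ierr) */
void rdpdat_(int *n, float *x, int *nmax, int *ierr)
{
    *ierr = rdraw("pdat.raw", x, *nmax, n);
}
```

Under those assumptions the Fortran side would read `call rdpdat(n,x,1000,ierr)`, and a plain Fortran routine with the same name and arguments can be linked in instead on boxes where the C version buys nothing.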
-
- :By the way, how can I equivalence a section of an integer vector
- :(position known at run time only), to a character string?
-
- You equivalence the *total* integer vector to a LONG character string,
- or to an array of longish character strings. There are of course no
- guarantees in the standard about how long you CAN make a character
- string, or an array of them, OR an array of integers for that matter
- (e.g. the infamous 64K-byte limits on old Fortran PC compilers), so
- you may run up against one or more such limitations.
-
- Example:
-       integer*4 iarra(1000)
-       character chst*4000
-       equivalence (iarra,chst)
-       call iwantints(iarra(40),60)
-       call iwantchar(chst(161:400))
-       end
-       subroutine iwantints(ints,n)
-       integer*4 n,ints(n)
- c whatever
-       end
-       subroutine iwantchar(chars)
-       character*(*) chars
- c whatever else
-       end
-
- [SORRY for the NON-standard '*4' on 'integer', but after getting burned
- BADLY by old d*mned standard-VIOLATING 16-bit compilers where "integer"
- meant TWO bytes, I guess I'm psychologically scarred for life on this!-].
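
For the "position known at run time only" part of the question, the substring bounds follow from the integer index by simple arithmetic. A small sketch of that arithmetic in C, assuming 4-byte integers and 1-byte characters as the example above does (the standard guarantees neither); the function name is hypothetical.

```c
#define BYTES_PER_INT 4   /* assumption: integer*4, one byte per character */

/* For integer*4 iarra(...) equivalenced to character chst*(...),
   elements iarra(first) .. iarra(first+count-1) overlay chst(lo:hi).
   All indices are 1-based, as in Fortran. */
void int_section_to_chars(int first, int count, int *lo, int *hi)
{
    *lo = BYTES_PER_INT * (first - 1) + 1;
    *hi = BYTES_PER_INT * (first - 1 + count);
}
```

With first=40 and count=60 this gives chst(157:396); the chst(161:400) passed to iwantchar in the example overlays iarra(41) through iarra(100). In Fortran the same two expressions can be evaluated into integer variables and used directly as the substring bounds.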
-
- --
- Email: martelli@cadlab.sublink.org Phone: ++39 (51) 6130360
- CAD.LAB s.p.a., v. Ronzani 7/29, Casalecchio, Italia Fax: ++39 (51) 6130294
-