Xref: sparky comp.arch:10449 comp.lang.misc:3521
Newsgroups: comp.arch,comp.lang.misc
Path: sparky!uunet!spool.mu.edu!uwm.edu!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!sample.eng.ohio-state.edu!purdue!mentor.cc.purdue.edu!hrubin
From: hrubin@mentor.cc.purdue.edu (Herman Rubin)
Subject: Re: Hardware Support for Numeric Algorithms
Message-ID: <BxAJMA.DKy@mentor.cc.purdue.edu>
Keywords: n
Organization: Purdue University Statistics Department
References: <1992Oct29.153514.22927@yrloc.ipsa.reuter.COM> <1992Nov5.202412.7266@linus.mitre.org> <1992Nov6.075459.13348@coe.montana.edu>
Date: Fri, 6 Nov 1992 10:54:09 GMT
Lines: 57

In article <1992Nov6.075459.13348@coe.montana.edu> osycs@giac1.oscs.montana.edu (Craig Spannring) writes:

.............................

>Large scale computing demands that you avoid the unreadable
>micro-optimizations that Herman is suggesting. If one of your little
>attempts at optimizing the code ends up producing incorrect results,
>then you've just wasted months of research time. Meanwhile your
>colleagues whose programs produce correct results are getting
>published and getting grants. How much research and computing time are
>you wasting with buggy programs?

As I see it, every one of the "micro-optimizations" I used is something
which is totally natural to someone who has not been brainwashed out of
gotos. The code I posted originally was buggy due to an attempt to avoid
gotos in situations for which I was aware of alternatives which would not
slow down the code.

>The second problem with these 'optimizations' is that they aren't.
>Herman seems to be most concerned with saving a few clock cycles per
>iteration. The only significant performance gains will be found by
>using an algorithm that uses fewer iterations. Example- Even a highly
>optimized bubble sort will run slower than an unoptimized quick sort
>for the large data sets you are referring to.

You are partially correct here. I can get an improvement in the code,
one which produces different results but uses the same average number
of resources of each type, by changing the algorithm to start out

	start:	g = ...;
		n += (g-1);
		g = ...;
		if (g==1) goto label1;

and have each situation here have the g value 1 less. This now
reduces the average number of uses of start: per output of n and m
by a factor of 2, to 1.54. Other "micro-optimizations", which some
of my correspondents found, can produce speedups. But they use some
code "duplication" (really duplication, but with some initialization
removed) and at least as many gotos. By using vectorization tricks,
which instead of producing a fixed number of outputs produce a
variable number, and by using vector insertions, this type of
program can be sped up significantly.

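To make the control structure concrete, here is a minimal,
self-contained C sketch of the restructured loop head. Everything
named below is invented for illustration: next_g() stands in for
the elided g = ...; draws, the per-g cases of the real algorithm
are collapsed into a single loop back to start:, m is omitted, and
the 1.54 figure depends on the real distribution of g, which this
toy does not reproduce.

	#include <stdio.h>
	#include <stdlib.h>

	/* Hypothetical stand-in for the elided g = ...; draws: a
	   geometric count of fair-coin trials, P(g = k) = 2^-k.
	   Only the control structure below is the point. */
	static long next_g(void)
	{
	    long g = 1;
	    while (rand() & 1)
	        g++;
	    return g;
	}

	int main(void)
	{
	    long n = 0, g, outputs = 0;

	    while (outputs < 10) {
	start:
	        g = next_g();
	        n += (g - 1);            /* fold the first draw into n */
	        g = next_g();
	        if (g == 1) goto label1; /* the exit case              */
	        goto start;              /* other g values loop back   */
	label1:
	        printf("n = %ld\n", n);
	        outputs++;
	        n = 0;
	    }
	    return 0;
	}
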
Additional performance improvements can be obtained by instead
heading the higher g's off and using a fall-through to label1,
as 77% of the loop exits go that way. This is the situation both
in this modified code and in the original. It can be done by
exporting most of the instructions into another block; would
your optimizing compilers find this? Certainly not without
being given the 77% information above in some manner.

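Here, equally hypothetically, is a sketch of that layout: the
common case falls through to label1 with no taken branch, and the
rare-case instructions are exported into a block placed off the
main path. The 77% frequency is wired in artificially; nothing
here is the real algorithm.

	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
	    long n = 0, g, outputs = 0;

	    while (outputs < 10) {
	start:
	        g = (rand() % 100 < 77) ? 1 : 2; /* fake draw: 77% ones    */
	        if (g != 1) goto higher_g;       /* taken ~23% of the time */
	label1:                                  /* common case falls thru  */
	        printf("n = %ld\n", n);
	        outputs++;
	        n = 0;
	        continue;

	higher_g:                                /* exported rare-case block */
	        n += (g - 1);
	        goto start;
	    }
	    return 0;
	}

The g == 1 path executes straight-line; a compiler could only pick
this arrangement if it were given the branch frequencies.
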
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
Phone: (317) 494-6054
hrubin@snap.stat.purdue.edu (Internet, bitnet)
{purdue,pur-ee}!snap.stat!hrubin (UUCP)