Xref: sparky comp.arch:10449 comp.lang.misc:3521
Newsgroups: comp.arch,comp.lang.misc
Path: sparky!uunet!spool.mu.edu!uwm.edu!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!sample.eng.ohio-state.edu!purdue!mentor.cc.purdue.edu!hrubin
From: hrubin@mentor.cc.purdue.edu (Herman Rubin)
Subject: Re: Hardware Support for Numeric Algorithms
Message-ID: <BxAJMA.DKy@mentor.cc.purdue.edu>
Keywords: n
Organization: Purdue University Statistics Department
References: <1992Oct29.153514.22927@yrloc.ipsa.reuter.COM> <1992Nov5.202412.7266@linus.mitre.org> <1992Nov6.075459.13348@coe.montana.edu>
Date: Fri, 6 Nov 1992 10:54:09 GMT
Lines: 57

In article <1992Nov6.075459.13348@coe.montana.edu> osycs@giac1.oscs.montana.edu (Craig Spannring) writes:

.............................

>Large scale computing demands that you avoid the unreadable
>micro-optimizations that Herman is suggesting. If one of your little
>attempts at optimizing the code ends up producing incorrect results,
>then you've just wasted months of research time. Meanwhile your
>colleagues whose programs produce correct results are getting
>published and getting grants. How much research and computing time are
>you wasting with buggy programs?

As I see it, every one of the "micro-optimizations" I used is something
which is totally natural to someone who has not been brainwashed out of
gotos. The code I posted originally was buggy due to an attempt to avoid
gotos in situations for which I was aware of alternatives which would not
slow down the code.

>The second problem with these 'optimizations' is that they aren't.
>Herman seems to be most concerned with saving a few clock cycles per
>iteration. The only significant performance gains will be found by
>using an algorithm that uses fewer iterations. Example- Even a highly
>optimized bubble sort will run slower than an unoptimized quick sort
>for the large data sets you are referring to.

You are partially correct here. I can get an improvement in the code,
one which produces different results but uses the same average number
of resources of each type, by changing the algorithm to start out

	start:	g = ...;
		n += (g-1);
		g = ...;
		if (g==1) goto label1;

and have each situation here have the g value 1 less. This now
reduces the average number of uses of start: per output of n and m
by a factor of 2, to 1.54. Other "micro-optimizations", which some
of my correspondents found, can produce speedups. But they use some
code "duplication" (really duplication, but with some initialization
removed) and at least as many gotos. By using vectorization tricks,
which instead of producing a fixed number of outputs produce a
variable number, and by using vector insertions, this type of
program can be sped up significantly.

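To make the control structure concrete, here is a minimal,
self-contained C sketch of the restructured loop head. Everything
named below is invented for illustration: next_g() stands in for
the elided g = ...; draws, the per-g cases of the real algorithm
are collapsed into a single loop back to start:, m is omitted, and
the 1.54 figure depends on the real distribution of g, which this
toy does not reproduce.

	#include <stdio.h>
	#include <stdlib.h>

	/* Hypothetical stand-in for the elided g = ...; draws: a
	   geometric count of fair-coin trials, P(g = k) = 2^-k.
	   Only the control structure below is the point. */
	static long next_g(void)
	{
	    long g = 1;
	    while (rand() & 1)
	        g++;
	    return g;
	}

	int main(void)
	{
	    long n = 0, g, outputs = 0;

	    while (outputs < 10) {
	start:
	        g = next_g();
	        n += (g - 1);            /* fold the first draw into n */
	        g = next_g();
	        if (g == 1) goto label1; /* the exit case              */
	        goto start;              /* other g values loop back   */
	label1:
	        printf("n = %ld\n", n);
	        outputs++;
	        n = 0;
	    }
	    return 0;
	}
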
Additional performance improvements can be obtained by instead
heading the higher g's off and using a fall-through to label1,
as 77% of the loop exits go that way. This is the situation both
in this modified code and in the original. It can be done by
exporting most of the instructions into another block; would
your optimizing compilers find this? Certainly not without
being given the 77% information above in some manner.

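Here, equally hypothetically, is a sketch of that layout: the
common case falls through to label1 with no taken branch, and the
rare-case instructions are exported into a block placed off the
main path. The 77% frequency is wired in artificially; nothing
here is the real algorithm.

	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
	    long n = 0, g, outputs = 0;

	    while (outputs < 10) {
	start:
	        g = (rand() % 100 < 77) ? 1 : 2; /* fake draw: 77% ones    */
	        if (g != 1) goto higher_g;       /* taken ~23% of the time */
	label1:                                  /* common case falls thru  */
	        printf("n = %ld\n", n);
	        outputs++;
	        n = 0;
	        continue;

	higher_g:                                /* exported rare-case block */
	        n += (g - 1);
	        goto start;
	    }
	    return 0;
	}

The g == 1 path executes straight-line; a compiler could only pick
this arrangement if it were given the branch frequencies.
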
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
Phone: (317) 494-6054
hrubin@snap.stat.purdue.edu (Internet, bitnet)
{purdue,pur-ee}!snap.stat!hrubin (UUCP)