NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / lang / c / 16707 < prev next >

Wrap

Internet Message Format | 1992-11-17 | 7.2 KB

Path: sparky!uunet!ontek!mikey From: mikey@ontek.com (euphausia superba) Newsgroups: comp.lang.c Subject: Re: qucik, merge, shell and bubble sort Message-ID: <2188@ontek.com> Date: 17 Nov 92 22:33:48 GMT References: <1e1cnvINNt2t@usenet.INS.CWRU.Edu> Organization: Ontek Corporation -- Laguna Hills, California Lines: 159 In comp.lang.c, cc015@cleveland.Freenet.Edu (Vito T. Kwan) writes: | | I want to take a look at these four sorting method in C, does anyone | know where can I get (ftp) them? I have implementations of quick, merge, and bubble sort; send email to me if you can't find them through other channels. Many people have posted timings for sort routines over the years; I was dissatisfied with the level of detail, so I grabbed a few sort routines and did some hacking and measured some of the not-so-obvious aspects of how sort algorithms perform under extreme conditions. This is scientific only in the sense that the numbers I got were repeatable to one or two significant digits. The columns in the following tables have these meanings: compares: Number of callbacks to the comparison function. On the few occasions I've gprof'd a sort operation, most of the time is spent in the comparison function, so I consider this the most interesting column. stack: Amount of stack space used (roughly). Obviously, the way I measured this was non-portable, heap: Amount of heap space used as reported by mallinfo.uordblks user: User time as reported by times(tms.tms_utime) This is supposed to be accurate to 1/60th of a second. system: System time as reported by times(tms.tms_stime) This is also supposed to be accurate to 1/60th of a second. The rows describe the input array and the behavior of the comparison function. random: The input data is generated by rand(), and the comparison function reports truthfully. ascend: The data being sorted is already in the correct order. This is a useful case for sorting indexes which are pretty much already in order; bubble sort handles this case well, but results of bubble sort are excluded here, mostly because bubble sort is silly. descend: The input data is exactly in reverse order. fib asc: The data is random, but the comparison function always says that its first argument is less than its second argument. fib asc: The data is random, but the comparison function always says that its first argument is greater than its second argument. suprise: The comparison function returns a random value. Now, some of these are really obscure situations that just *shouldn't* happen with real data and well-written comparison functions. But I've found that some of the most time-consuming sorts are done at the request of the user, and are ones in which the user can specify several possibly interrelated keys. I worried about what would happen if the user managed to find a combination that would result in something other than a complete ordering. My comments are interspersed with the results. banzai# test_sort 5000 HZ = 60 element size = 8 number of elements = 5000 *** qsort *** data compares stack heap user system random: 66554 768 0 3.68 0.00 ascend: 57727 960 0 3.00 0.00 descend: 57728 960 0 3.03 0.00 fib asc: 57727 960 0 2.78 0.05 fib desc: ^C surprise: 61797 864 0 3.75 0.00 [This qsort is the one supplied with SunOS 4.1. It performs well under most circumstances but goes into an infinite loop when the comparison always returns 1. The infinite loop seems to be of such a character that the comparison function is being called with ever decreasing addresses and so presumably could cause segmentation faults under some circumstances. It's usage of stack space and heap space are perfectly fine for modern CPUs.] *** merge_sort *** data compares stack heap user system random: 55240 1152 40004 2.98 0.02 ascend: 32004 1152 40004 1.92 0.03 descend: 29804 1152 40004 1.72 0.03 fib asc: 32004 1152 40004 1.73 0.00 fib desc: 29804 1152 40004 1.63 0.00 surprise: 44434 1152 40004 2.75 0.00 [Based on other tests, the stack space is O(log(n)). The heap space used here is questionable; a more talented programmer might be able implement it without using up O(n) space. A less talented programmer would be tempted to place it on the stack, but that's a marginal improvement. In terms of number of comparisons, merge_sort's worst case is better than qsort's best case and with one exception the same can be said of execution speed.] *** merge_sort(maybe_sorted) *** data compares stack heap user system random: 56248 1152 40004 3.05 0.10 ascend: 7831 1152 40004 0.48 0.03 descend: 30827 1152 40004 1.77 0.05 fib asc: 33027 1152 40004 1.87 0.02 fib desc: 6023 1152 40004 0.37 0.00 surprise: 34723 1152 40004 2.22 0.00 [I modified merge_sort slightly to perform some simple checking to see if the array (or subsections of the array) are already in order. It adds about 2% to the number of comparisons in the "normal" case, but for sorting an index which is "almost" sorted already this cuts the time down dramatically.] *** quick_sort *** data compares stack heap user system random: 71409 1536 12 4.82 0.02 ascend: 51822 960 12 2.88 0.00 descend: 54924 1248 12 3.80 0.00 fib asc: 12497500 479808 12 601.07 1.42 fib desc: 12497500 0 12 597.93 1.15 surprise: 57198 672 12 3.97 0.00 [The quick sort is adapted from the one found on page 87 of K&R 2; they disclaim it as sub-optimal, for good reason it seems. I haven't had the time to rewrite it though I did fix the infinite loop problem, at a marginal cost in speed. The quadratic behavior in the case of a deceptive comparison routine should probably not be held against it.] -- The actual results are not true for all time and all machines. The execution time for these algorithms when using non-brain-damaged comparison routine are all within a factor of 2. Sorting a bigger array brings out the differences more clearly, but at least between qsort and merge sort the improvement is rather minor. I planned to add a feature for changing the size of the input array automatically and analyzing the change in time and space usage to directly calculate O(whatever) for each algorithm, but I'm a bit busy at the moment. (Posting to usenet to be precise.) I'll also be adding heapsort and shellsort algorithms; if anyone has a qsort-like drop-in replacement for either or both of those I'd appreciate a copy. der krill