Usenet 1994 January

home *** CD-ROM | disk | FTP | other *** search

/ Usenet 1994 January / usenetsourcesnewsgroupsinfomagicjanuary1994.iso / sources / unix / volume25 / pty4 / part08 / radixsort.c < prev

Wrap

C/C++ Source or Header | 1992-02-18 | 19.6 KB | 546 lines

/* radixsort.c, radixsort.h: linear-per-byte in-memory string sort Daniel J. Bernstein, brnstnd@nyu.edu; Keith Bostic, bostic@ucbvax.berkeley.edu. No dependencies. Requires malloc, free, bcopy, bzero. Can use -DUCHAR_MAX. 10/5/91 DJB: Made c static, added ctr. 7/23/91 DJB: Baseline. radixsort/DJB 3.0. See bottom for BSD copyright. No known patent problems. Documentation in radixsort.3, portions based on BSD documentation. History: I discovered adaptive distribution sort in 1987. (I haven't published a paper on it---it's too trivial for that---though I did mention it in a letter to Knuth.) It grew out of a suggestion quoted in Knuth's section on quicksort et al., volume 3, page 128: ``John McCarthy has suggested setting K \approx (u + v)/2, if all keys are known to lie between u and v.'' What's the biggest problem with MSD radix sort? You can waste lots of time considering ranges of keys that don't even exist! In distribution sort, however, you usually start by finding the minimum and maximum keys. Here McCarthy's suggestion pops in. To sort a pile of floating-point numbers, find their minimum and maximum, divide the interval evenly into n subintervals, and split the numbers into piles by interval just as in MSD radix sort. Just as in quicksort, stop splitting when a pile can be sorted efficiently by a small-scale method. That's adaptive distribution sort, and it turns out to be hellishly fast for sorting floating-point data. The credit really belongs with McCarthy---I only generalized from 2 to n. Adaptive distribution sort can be applied to strings, too: find the first character where two strings in the pile differ, and distribute on that character. There's no fine line between adaptive distribution sort and MSD radix sort, but in any case you get a big speed boost from sorting small piles by a small-scale method. See especially Knuth, exercise 5.2.5-10. Computer scientists should note that this method is linear in the number of bytes being sorted. Sometime in 1989, as I recall, I saw a notice that someone had discovered an o(n log n) method of sorting n integers. The method depended on all sorts of weid bit manipulations and was utterly impractical. As the integers had to be short anyway, MSD radix sort worked in O(n) time. My guess is that most computer scientists don't learn about MSD radix sort (and hence don't know that sorting is linear-per-byte) because it's widely seen as having too big a constant factor to be practical. This radixsort() is a constructive proof that the opposite is true. I ended up sending this code to Berkeley. Keith Bostic cleaned it up, fixed a few bugs, added a shell sort for the small case, and did several helpful optimizations. radixsort() will be part of BSD 4.4. I took back his version and modified it to what you see below. Among other things, I ported it from ANSI C back to the rest of the world, cleaned up some of the comments, added a proof that part of the method actually works, added the radixsort5(), radixsort7(), and radixsort3() variants, restored the original indentation, fixed an overly conservative estimate of the necessary stack size, and plugged a memory leak. */ #define blob unsigned char /* technically, shouldn't be typedefed */ /* KB says: * __rspartition is the cutoff point for a further partitioning instead * of a shellsort. If it changes check __rsshell_increments. Both of * these are exported, as the best values are data dependent. */ #ifndef NPARTITION #define NPARTITION 40 #endif int __rspartition = NPARTITION; int __rsshell_increments[] = { 4, 1, 0, 0, 0, 0, 0, 0 }; /* KB says: * Shellsort (diminishing increment sort) from Data Structures and * Algorithms, Aho, Hopcraft and Ullman, 1983 Edition, page 290; * see also Knuth Vol. 3, page 84. The increments are selected from * formula (8), page 95. Roughly O(N^3/2). */ static void shellsort(p,index,n,tr) register blob **p; register blob *tr; register int index; register int n; { register blob ch; register blob *s1; register blob *s2; register int incr; register int *incrp; register int t1; register int t2; incrp = __rsshell_increments; while (incr = *incrp++) for (t1 = incr;t1 < n;++t1) for (t2 = t1 - incr;t2 >= 0;) { s1 = p[t2] + index; s2 = p[t2 + incr] + index; while ((ch = tr[*s1++]) == tr[*s2] && ch) ++s2; if (ch > tr[*s2]) { s1 = p[t2]; p[t2] = p[t2 + incr]; p[t2 + incr] = s1; t2 -= incr; } else break; } } /* KB says: * Stackp points to context structures, where each structure schedules a * partitioning. Radixsort exits when the stack is empty. * * If the buckets are placed on the stack randomly, the worst case is when * all the buckets but one contain (npartitions + 1) elements and the bucket * pushed on the stack last contains the rest of the elements. In this case, * stack growth is bounded by: * * limit = (nelements / (npartitions + 1)) - 1; * * This is a very large number, 52,377,648 for the maximum 32-bit signed int. * * By forcing the largest bucket to be pushed on the stack first, the worst * case is when all but two buckets each contain (npartitions + 1) elements, * with the remaining elements split equally between the first and last * buckets pushed on the stack. In this case, stack growth is bounded when: * * for (partition_cnt = 0; nelements > npartitions; ++partition_cnt) * nelements = * (nelements - (npartitions + 1) * (nbuckets - 2)) / 2; * * The bound is: * * limit = partition_cnt * (nbuckets - 1); * * This is a much smaller number, 4590 for the maximum 32-bit signed int. Note inserted by DJB: About time I proved this... Fix the number of buckets, b. Any given pile of n elements is split into m stack piles and b - m small-scale piles which immediately disappear. We ignore the case where the pile is split into only one pile of its original size---any pile will be split into smaller piles eventually. Say the stack is left with piles of sizes p_1 ... p_m, each at least P + 1, none equal to n, while x elements, for some x from 0 to m'P, are in small-scale piles. (Here P is the cutoff.) We must have p_1 + ... + p_m + x = n. Depending on the other (p,x) constraints chosen, we define s(n) as the maximum stack size for n elements. As the subpile distributions are independent, clearly s(n) = max_m max_(p,x) max {s(p_1),s(p_2) + 1,...,s(p_m) + m - 1} over all valid m and (p,x). In particular, if m > 0 then we must have n > p_1 >= P + 1, so if n <= P + 1 then the maximum is (vacuously) 0. So s(n) is monotone increasing for n <= P + 1. An easy induction shows that s(n) is in fact 0 for all n < 2P + 2. Clearly s(n) is monotone: for n >= P + 2 we choose m = 1, p_1 = n - 1, and x = 0, and we see s(n) >= s(p_1) = s(n - 1). For m = 0, the maximum always equals 0, and for m = 1, the maximum always equals a previous s(p_1), so we have s(n) = max { max_{k<n} s(k) , max_{m>=2} max_(p,x) max s(p_j) + j - 1 }. For convenience we define t(n) = max_{m>=2} max_(p,x) max s(p_j) + j - 1. Fix n. Fix m >= 2. Consider a (p,x) achieving the maximum value of max s(p_j) + j - 1. Since p_1 >= P + 1, we have p_2 <= n - P - 1. If x isn't 0, we can move an element from one of the small-scale piles to stack pile p_2, under either of the constraints in question. This increases p_2---hence does not decrease s(p_2)---without affecting the other s(p_j). Hence there is a (p,x) with smaller x also achieving the maximum. So consider a (p,0) achieving the maximum, and say max s(p_j) + j - 1 is achieved at j = i. If we exchange p_i and p_j while meeting the constraints, we must not be at a higher maximum; in particular, s(p_i) + j - 1 <= s(p_i) + i - 1, so j <= i. Restrict attention the the case without constraints, NC. The choice of j is unrestricted, so in particular m <= i and hence i = m. Thus t(n) equals max_{m>=2} s(p_m) + m - 1. Since all p_j are at least P + 1, we have p_m + (P + 1)(m - 1) <= p_m + ... + p_1 = n p_m <= n - (P + 1)(m - 1) s(p_m) <= s(n - (P + 1)(m - 1)) t(n) <= max_{m>=2} s(n - (P + 1)(m - 1)) + m - 1 (NC), and it is easy to see that for n >= (P + 1)m, we can choose p_m as n - (P + 1)(m - 1) >= P + 1 and all other p_j = P + 1, so that the bound is achieved: t(n) = max_{m>=2} s(n - (P + 1)(m - 1)) + m - 1 for n/(P + 1) >= m. (NC) For 2 <= n/(P + 1) < b, we can choose m anywhere between 2 and f = floor(n/(P + 1)) inclusive. Now s(n) = max { max s(k), max_{2<=m<f} s(n - (P + 1)(m - 1)) + m - 1 } (NC) We claim inductively that s(n) = f - 1 for any n with floor(n/(P + 1)) = f. This is true for f = 1. For larger f, s(n - (P + 1)(m - 1)) = floor((n - (P + 1)(m - 1))/(P + 1)) - 1 = f - (m - 1) - 1 = f - m, so that s(n) = max { max s(k), f - 1 } (NC) If n = (P + 1)f, then by inductive hypothesis s(k) <= f - 2, so that s(n) = f - 1 as desired. Otherwise, by (sub)induction on n, s(k) is still bounded by f - 1, so that s(n) = f - 1 always. Therefore: s(n) = floor(n/(P + 1)) - 1, n >= P + 1. (NC) Now consider the push-largest-pile-first constraint, FC. This requires that p_1 >= p_j for all j. Hence we cannot swap p_1 with p_i. However, if i is not 1 then we can swap p_j with p_i for all j > 1, hence j <= i for all j between 2 and m, hence i is m. Thus the maximum is achieved either at i = 1 or at i = m, and we have t(n) = max_{m>=2} max_(p,0) max { s(p_m) + m - 1, s(p_1) }. (FC) Take a p vector achieving the maximum of max { s(p_m) + m - 1, s(p_1) }, with m fixed. Suppose some p_j for j between 2 and m - 1 inclusive is larger than P + 1. (This is vacuous for m = 2.) Then we can subtract one from p_j and add one to p_1, preserving the constraint and not decreasing the maximum. Hence the maximum is achieved somewhere with all p_j = P + 1 for 2 <= j < m, i.e., p_1 + p_m + (P + 1)(m - 2) = n. (FC) Furthermore, p_1 >= p_m. Hence 2p_1 >= n - (P + 1)(m - 2), and p_1 can range from ceiling((n - (P + 1)(m - 2))/2) up to n - (P + 1). Similarly, 2p_m <= n - (P + 1)(m - 2), so p_m can range from P + 1 up to floor((n - (P + 1)(m - 2))/2). The global maximum of these quantities simultaneously equals the global maximum of them individually, so if all bounds can be achieved then t(n) = max_{m>=2} max { s(n - (P + 1)), s(floor((n - (P + 1)(m - 2))/2)) + m - 1 }. (FC) This can be achieved if n - (P + 1) >= ceiling((n - (P + 1)(m - 2))/2) and, equivalently, P + 1 <= floor((n - (P + 1)(m - 2))/2), since in that case we can choose both p_1 and p_m as stated. These reduce after some simple manipulation to n >= (P + 1)m, i.e., m <= floor(n/(P + 1)). For other m it is not possible to choose any valid p_i (exercise). We'd like to show inductively that for all n >= 2P + 2 we have s(n) = u(n) with u(n) = max_{m>=2} s(floor((n - (P + 1)(m - 2))/2)) + m - 1. (FC,*) To do this we need only show that the other terms of the maximum do not ``get in the way,'' i.e., that u(n) >= s(n - (P + 1)) and u(n) >= s(k) for k < n. The second half is easy: s(k) = u(k), which is at most u(n) by monotonicity of s. The first half also follows from the induction: s(n) = max_m s(floor((n - (P + 1)(m - 2))/2)) + m - 1 s(n - (P + 1)) = u(n - (P + 1)) = max_{m>=2} s(floor((n - (P + 1) - (P + 1)(m - 2))/2)) + m - 1 <= max_{m>=2} s(floor((n - (P + 1)(m - 2))/2)) + m - 1 = u(n) (FC) again by the monotonicity of s. This proves (FC,*). For small n, i.e., floor(n/(P + 1)) = f with 2 <= f <= b, we can choose m = f, so s(n) is at least s(floor((n - (P + 1)(f - 2))/2)) + f - 1. The floor term is at most P + 1, so s(n) >= f - 1. Furthermore, s in the constrained case is at most s in the unconstrained case, so s(n) = f - 1. For larger n, by similar logic, the maximum is attained at m = b, so we finally have s(n) = floor(n/(P + 1)) - 1 for n < (b + 1)(P + 1) s(n) = s(floor((n - (P + 1)(b - 2))/2)) + b - 1 otherwise (FC) As in the first case s(n) is always bounded by b - 1, we can calculate a bound on s(n) by repeatedly setting n to floor(n - (P + 1)(b - 2))/2) until it is under 2P + 2 (so that s(n) = 0), counting the number of iterations necessary, and multiplying by b - 1. And that's what we wanted to prove. */ #ifndef UCHAR_MAX /* XXX: we aren't even giving a chance for a definition! */ #define UCHAR_MAX 256 #endif #define NBUCKETS (UCHAR_MAX + 1) typedef struct { blob **bot; int index; int n; } context; #define STACKPUSH { \ stackp->bot = p; \ stackp->n = n; \ stackp->index = index; \ ++stackp; \ } #define STACKPOP { \ if (stackp == stack) \ break; \ --stackp; \ bot = stackp->bot; \ n = stackp->n; \ index = stackp->index; \ } /* KB says: * This uses a simple sort as soon as a bucket crosses a cutoff point, * rather than sorting the entire list after partitioning is finished. * This should be an advantage. Note from DJB: The original comment read ``There is no strong evidence that this is an advantage.'' Depressing. Here's what I wrote in response: Of course it's an advantage: it has to be, I coded it that way. :-) Seriously, I just coded the sort that way since I was following Knuth's description of MSD to the hilt. As you can imagine, though, doing the sort this way saves just a bit of paging of the index array. It also means that the simple sort doesn't have to worry about crossing past already-determined boundaries---for an average 2x gain. Trust me, it's an advantage. */ int radixsort7(l1,n,endchar,tab,indexstart,ralloc,rfree) blob **l1; register int n; unsigned int endchar; /* could use blob, but chars are unsafe with prototypes */ blob *tab; int indexstart; char *(*ralloc)(); void (*rfree)(); { register int i; register int index; register int t1; register int t2; register blob **l2; register blob **p; register blob **bot; register blob *tr; context *stack; context *stackp; static int c[NBUCKETS + 1]; static int *ctr[NBUCKETS + 1]; int max; blob ltab[NBUCKETS]; /* local (default) table */ if (n <= 1) return 0; /* Allocate the stack. */ t1 = (__rspartition + 1) * (NBUCKETS - 2); for (i = 0,t2 = n;t2 > __rspartition;i += NBUCKETS - 1) t2 = (t2 - t1)/2; /* could go negative! but that's okay */ if (!i) stack = stackp = 0; else if (!(stack = stackp = (context *)ralloc(i * sizeof(context)))) return -1; /* KB says: * There are two arrays, one provided by the user (l1), and the * temporary one (l2). The data is sorted to the temporary stack, * and then copied back. The speedup of using index to determine * which stack the data is on and simply swapping stacks back and * forth, thus avoiding the copy every iteration, turns out to not * be any faster than the current implementation. */ if (!(l2 = (blob **)ralloc(n * sizeof(blob *)))) { rfree((char *)stackp); return -1; } /* KB says: * tr references a table of sort weights; multiple entries may * map to the same weight; EOS char must have the lowest weight. */ if (tab) tr = tab; else { t2 = endchar; for (t1 = 0;t1 < t2;++t1) ltab[t1] = t1 + 1; ltab[t2] = 0; for (t1 = t2 + 1;t1 < NBUCKETS;++t1) ltab[t1] = t1; tr = ltab; } for (t1 = 0;t1 < NBUCKETS;++t1) ctr[t1] = c + tr[t1]; /* First sort is entire pile. */ bot = l1; index = indexstart; for (;;) { /* Clear the bucket count array. XXX: This isn't portable to */ /* machines where the byte representation of int 0 isn't all */ /* zeros. :-) */ bzero((char *)c,sizeof(c)); /* Compute number of items that sort to the same bucket for this index. */ p = bot; i = n; while (--i >= 0) ++*ctr[(*p++)[index]]; /* KB says: * Sum the number of characters into c, dividing the temp stack * into the right number of buckets for this bucket, this index. * c contains the cumulative total of keys before and included in * this bucket, and will later be used as an index to the bucket. * c[NBUCKETS] contains the total number of elements, for determining * how many elements the last bucket contains. At the same time, we * find the largest bucket so it gets pushed first. */ t2 = __rspartition; max = t1 = 0; for (i = 0;i <= NBUCKETS;++i) { if (c[i] > t2) { t2 = c[i]; max = i; } t1 = c[i] += t1; } /* Partition the elements into buckets; c decrements through the */ /* bucket, and ends up pointing to the first element of the bucket. */ i = n; while (--i >= 0) { --p; l2[--*ctr[(*p)[index]]] = *p; } /* Copy the partitioned elements back to the user stack. */ bcopy(l2,bot,n * sizeof(blob *)); ++index; /* KB says: * Sort buckets as necessary; don't sort c[0], it's the * EOS character bucket, and nothing can follow EOS. */ for (i = max;i;--i) { if ((n = c[i + 1] - (t1 = c[i])) < 2) continue; p = bot + t1; if (n > __rspartition) STACKPUSH else shellsort(p,index,n,tr); } for (i = max + 1;i < NBUCKETS;++i) { if ((n = c[i + 1] - (t1 = c[i])) < 2) continue; p = bot + t1; if (n > __rspartition) STACKPUSH else shellsort(p,index,n,tr); } /* Break out when stack is empty */ STACKPOP } rfree((char *)l2); rfree((char *)stack); return 0; } extern char *malloc(); extern void free(); int radixsort5(l1,n,endchar,tab,indexstart) blob **l1; register int n; unsigned int endchar; blob *tab; int indexstart; { return radixsort7(l1,n,endchar,tab,indexstart,malloc,free); } int radixsort4(l1,n,endchar,tab) blob **l1; register int n; unsigned int endchar; blob *tab; { return radixsort7(l1,n,endchar,tab,0,malloc,free); } int radixsort(l1,n,tab,endchar) blob **l1; register int n; blob *tab; unsigned int endchar; { return radixsort7(l1,n,endchar,tab,0,malloc,free); } int radixsort3(l1,n,endchar) blob **l1; register int n; unsigned int endchar; { return radixsort7(l1,n,endchar,(blob *) 0,0,malloc,free); } /*- BSD copyright says: * Copyright (c) 1990 The Regents of the University of California. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* BSD sccs says: static char sccsid[] = "@(#)radixsort.c 5.7 (Berkeley) 2/23/91"; but this is (heavily) modified. */