home *** CD-ROM | disk | FTP | other *** search
- Xref: sparky comp.theory:2836 sci.crypt:6551 sci.math:17858 rec.puzzles:8292
- Newsgroups: comp.theory,sci.crypt,sci.math,rec.puzzles
- Path: sparky!uunet!stanford.edu!CSD-NewsHost.Stanford.EDU!Sunburn.Stanford.EDU!pratt
- From: pratt@Sunburn.Stanford.EDU (Vaughan R. Pratt)
- Subject: Re: Looking for random permutation generation algorithms
- Message-ID: <1993Jan8.210013.24614@CSD-NewsHost.Stanford.EDU>
- Sender: news@CSD-NewsHost.Stanford.EDU
- Organization: Computer Science Department, Stanford University.
- References: <1993Jan6.014749.15323@ee.ubc.ca>
- Date: Fri, 8 Jan 1993 21:00:13 GMT
- Lines: 67
-
- In article <1993Jan6.014749.15323@ee.ubc.ca> rayw@ee.ubc.ca (raymond w m woo) writes:
- >Hi, does anyone know, or can provide any pointer in the literature to, any
- >random permutation generation algorithm that can be easily implemented as a
- >function in a computer program?
-
- The more general problems of efficiently generating random combinations
- (sampling M out of N items without replacement, order immaterial) and
- permutations (ditto except that order now matters) are treated in Jon
- Bentley's "Programming Pearls" column in CACM 30:9 (Sept.87), 754-757.
- Bentley gives Bob Floyd's solutions to these two problems, and poses 8
- problems about Floyd's method whose solutions appear in the following
- December issue starting on p.999.
-
- Floyd's method is so nice that it bears repeating here.
-
- Floyd's method for sampling combinations, i.e. outputting a subset of M
- integers from among the first N positive integers, is as follows. For
- each integer J from N-M+1 to N, choose an integer at random from the
- interval 1..J. (So altogether we choose M integers from an interval
- starting at 1 that increases in length by 1 after each choice, with the
- final choice being made from the whole interval 1..N.) For each such
- choice K, if K has not already been chosen previously then output K,
- call this a *virgin* output, otherwise output the current value of J
- instead of K, call this a *default* output. The *set* (not list) of M
- integers so output then constitutes the random sample.
-
- Floyd's method for sampling permutations, where the order matters, is
- the same algorithm but with the output now treated as a list instead of
- a set. Virgin outputs go on the end of the list, default outputs are
- inserted on the list just in front of the occurrence of K responsible
- for making this a default output.
-
- It is easily seen that the output of the permutation algorithm is one
- of the N!/(N-M)! possible permutations of M integers out of N. What
- remains to show for this algorithm is not only that every permutation
- can occur, but that they do so with equal probability. Doug McIlroy's
- nice proof of this runs as follows.
-
- The output list determines the sequence of random numbers produced by
- the generator (nice exercise, Bentley's Problem 4). But there are
- N!/(N-M)! possible sequences produced by the random number generator,
- whence all N!/(N-M)! permutations must be producible. Assuming the
- generator is truly random, these sequences are equiprobable, whence so
- are the corresponding permutations.
-
- The correctness of Floyd's method for sampling combinations now follows
- immediately from the observation that each combination is represented
- by M! permutations, whence the N choose M combinations are also
- equiprobable.
-
- As noted in Bentley's December 1987 column, an alternative to Floyd's
- algorithm is to initialize an array a[1..N] to a[i]=i and then scan
- along from a[1] to a[M], exchanging each such a[i] with a randomly
- selected a[j] for j >= i.
-
- For N = M (Woo's question) this method is the one Jim Roth gave. It is
- preferable to Floyd's method because it does not entail testing for
- previous occurrences.
-
- But for N >> M it has two drawbacks: the time to initialize the array,
- and the space required for it, both being O(N). In contrast Floyd's
- method requires space O(M) and time somewhere between O(M) and O(M^2)
- depending on the method used to test for repetitions. (For N some
- modest multiple of M one might instead keep track of repetitions using
- an array of N bits, i.e. N/8 bytes, initialized to 0.)
- --
- Vaughan Pratt There's safety in certain numbers.
-