home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.lang.c
- Path: sparky!uunet!snorkelwacker.mit.edu!bloom-picayune.mit.edu!news
- From: scs@adam.mit.edu (Steve Summit)
- Subject: Re: Random Number Generator
- Message-ID: <1992Nov6.223901.11603@athena.mit.edu>
- Sender: news@athena.mit.edu (News system)
- Nntp-Posting-Host: adam.mit.edu
- Organization: none, at the moment
- References: <11781@hq.hq.af.mil> <1992Oct22.181426.59@front.se> <26939@dog.ee.lbl.gov>
- Date: Fri, 6 Nov 1992 22:39:01 GMT
- Lines: 123
-
- In article <26939@dog.ee.lbl.gov>, torek@horse.ee.lbl.gov (Chris Torek) writes:
- > In article <1992Oct22.181426.59@front.se> per@front.se writes:
- >> ... The rand() function in the VMS C RTL returns nubers which are
- >> alternating even and odd ...
- >
- > Time for another Netnews Rerun....
- >
- >> In article <1991Dec4.203704.21200@elroy.jpl.nasa.gov>
- >> russb@classy.nasa.gov (Russ Brill) writes:
- >>> When I repeatedly call rand(), I get even, odd, even, odd,...
- >
- > In article <1991Dec4.214242.15508@klaava.Helsinki.FI>
- > torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds) writes:
- >> This is a known problem with many (usually older) implementations of
- >> rand().
- >
- > In fact, the sample implementation included in the ANSI C standard
- > does the same.
-
- Chris may be thinking of an earlier draft. The example
- implementation in the published ANSI standard (ANSI X3.159-1989,
- sec. 4.10.2.2, p. 155) is not a pure linear congruential random
- number generator; rather, it is a LCRNG with a 32-bit (or more)
- state value which returns 15 bits from the *top* (most
- significant) half. Therefore, its actual return values are
- *not* subject to these particularly unfortunate regularities.
- (To recap: the low-order N bits of a pure LCRNG repeat with
- period 2-to-the-N.)
-
- (The V7, PDP-11 Unix rand() used the same return-the-higher-
- order-bits technique; tragically, someone blew it when porting
- the code to the 32-bit VAX and changed the code to return all 32
- bits of the state variable, and programmers using BSD have been
- plagued by the problem ever since [note 1].)
-
- > Not everyone considers it a `problem' so much as a
- > `limitation'. If you want *really* random numbers, you must not use
- > a pseudo-random algorithm; if you want approximately-random numbers
- > you must decide what for and consult algorithm books, or you will
- > get trapped by something like this.
-
- For what it's worth, I find this advice unnecessarily harsh, and
- I do consider RNG's with regularities in their low-order bits a
- problem. It doesn't matter what I think, though: the heavy
- hitters in the algorithms world have declared that since there
- are so many poor RNG's out there (which there are), programmers
- must never do things like
-
- rand() % N
-
- , but instead must use the equivalent of
-
- rand / (RAND_MAX / N)
-
- , which isn't much harder, as long as you know RAND_MAX (which
- hasn't always been #defined in a standard way for C).
-
- In other words, the prevailing advice is to consign rand() % N to
- the same dustbin that gets() has been demoted to: the code is
- there, and it is theoretically legal to use it, but you mustn't.
-
- This advice is so widespread, and is assumed (here's the fallacy)
- to be so well-known, that there is apparently no incentive for a
- vendor to fix a rand() implementation which is deficient in this
- way. A few months ago, sympathetic to the plight of the
- umpteenth programmer who had been bitten by this problem, I
- submitted a "formal" bug report to comp.bugs.4bsd, where I was
- met with the (barely) polite response that "everyone knows not to
- take rand() % N, so it's not a problem." A classic case of
- blaming the victim, and of blaming the user for a software
- deficiency.
-
- It's quite true that there are more ways to go astray with a
- pseudo-random number generator than to take the values of a
- poorly-written one modulo small powers of 2. (You can also run
- into problems if your range is not small with respect to, and
- does not evenly divide, the period of the RNG. [Note 2.])
- I happen to think that the rand() % N problem is so common, and is
- so easily made by the unwary, that it's worth fixing [note 3];
- but again, it doesn't matter what I think. The deficient RNG's
- are out there, and they're out there to stay (never mind that
- we've known for decades how to write better ones), and cautious
- programmers mustn't take rand() % N.
-
- Steve Summit
- scs@adam.mit.edu
-
- Note 1. Yes, I know about BSD's random().
-
- Note 2. There are more subtle problems which can arise when
- using RNG's; in fact, you can prove that for any pseudo-random
- number generator, no matter how "good" it is, there is an
- application for which it is seriously deficient if not useless.
- I don't recall the details of the proof off the top of my head,
- but here is an example which is suggestive. Consider a very
- simple rand(), with a RAND_MAX of 7 and a period of 8, which
- returns on successive calls the numbers 0, 1, 4, 2, 7, 6, 3, 5,
- and then repeats. (Looks pretty random, right?) Consider a
- program which, for whatever reason, takes one value from rand(),
- then takes one value and discards one value (i.e. calls rand()
- and ignores the result), then takes one value and discards two
- values, then takes one and discards three, and so on until it has
- taken a value and discarded 6 values, at which point it takes one
- last value. (Try it.) This is a contrived example, but the
- point is that for any random number generator which ever repeats
- itself or has any hidden, underlying order (and all pseudo-random
- number have some hidden order somewhere, since they are
- deterministic processes), there exist calling sequences
- which will beat against the underlying order and produce strongly
- ordered output.
-
- Note 3. You can argue that vendors do application programmers a
- favor by supplying rand()'s which are orderly in their low-order
- bits, so that the programmers immediately "fix" their code and
- are not lulled by a decent rand() into a false sense of security,
- only to have it dashed when the code is ported to a machine with
- a deficient version. Sometimes I agree with this argument;
- somehow, in this case, I don't. (There was an airline pilot who
- used to test new co-pilots by deliberately mis-setting some
- control shortly before takeoff; people weren't too happy when
- this practice was discovered. I shouldn't mention this, because
- it'll probably cloud the issue, and I'm not even sure I disagree
- with the captain in question.)
-