-
- The following information is intended for distribution over Internet,
- and outside of that may be copied for personal use only.
- (c) Glen L. Barnett, 1992. All rights reserved.
-
- I recently responded to a thread on rec.games.frp.dnd about testing dice,
- in order to fix up a few misconceptions and make some suggestions. On the
- suggestion of Coyt Watters I'm putting this on r.g.f.archives, which I think
- is a good idea. What follows has some major additions, however.
-
- In this post, I will begin by discussing various posts on the subject
- of testing dice, then show how to do the test under discussion in the
- thread (the chi-squared goodness-of-fit test) properly, and then talk
- about more appropriate tests.
-
- ---------------------------------------------------------------------
-
- In article <BxHJLy.Jq0@watserv1.uwaterloo.ca>,
- alongley@cape.UWaterloo.ca (Allan Longley) writes:
- [..stuff deleted..]
- >testing dice for bias. Well, here is a test to use. I haven't actually
- >tested this yet, but it should -- in theory -- work. And no, this is not
- >a copy out of the old Dragon magazine, but it is the same test -- its a
- >pretty standard test. I will use simple terms -- so all you math/stat
- >people out there, don't correct the fine points, I know.
- [description of chi-squared goodness-of-fit test deleted]
-
- Since Allan asks for no correction of fine points, I will attempt to
- limit myself to major problems. This is not intended as a flame on
- Allan, but this is fairly important stuff, and should be explained
- correctly. If at any stage I get less than pleasant, please accept
- my apology in advance.
-
- While the calculations that Allan describes give the correct value
- of a chi-square goodness-of-fit statistic (which he calls "Indicator"),
- you should be *very* wary of interpreting the results in the way
- he describes, as I will explain:
-
- Let us assume you have 40 dice that (unknown to you) are all perfectly
- fair, and you wish to test all of them, to see if any are "biased".
-
- The way Allan has set his test up, you'd expect 2 of them to give
- results below "Probably Fair", which he says indicates the die is
- probably unbiased. That is, you have 40 fair dice, and you will
- expect to regard only *two* of them as probably O.K.! Similarly,
- you will expect to consider two of your "purely fair" dice as
- probably unfair. Of the remaining 36, you will expect 18 scores
- between "Probably Fair" and "Maybe" and 18 more between "Maybe"
- and "Probably biased". For these 36, you have to do the test again,
- under Allan's scheme. If you get both results below "Maybe" (you expect
- 9 of these) you say "Probably Fair". Similarly you expect 9 above
- "Maybe" on both trials.
-
-
- So we have (after repeating the test for 90% of the dice):
-
- Number of Fair Dice: 40
- Expected number "probably fair": 11
- Expected number "probably biased": 11
- Expected number which we don't know about: 18.
-
- So over a quarter of perfectly fair dice will be called "probably biased".
- If we continue testing those remaining 18 we are still undecided about,
- the problem gets worse.
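The bookkeeping above can be checked with a few lines of arithmetic. This is only a sketch, assuming Allan's "Probably Fair" and "Probably biased" columns each cut off 5% of results from a fair die and that "Maybe" is the 50% point; the function and variable names are made up for illustration.

```python
def expected_verdicts(n_dice=40, tail=0.05):
    """Expected outcome counts when every die tested is perfectly fair."""
    # First test: a fair die lands in each tail with probability `tail`
    # and in each half of the middle band with probability 0.5 - tail.
    fair_first = n_dice * tail
    biased_first = n_dice * tail
    low_middle = n_dice * (0.5 - tail)    # between "Probably Fair" and "Maybe"
    high_middle = n_dice * (0.5 - tail)   # between "Maybe" and "Probably biased"

    # Second test (middle-band dice only): a fair die falls on either
    # side of "Maybe" with probability 0.5.
    fair_second = low_middle * 0.5        # below "Maybe" both times
    biased_second = high_middle * 0.5     # above "Maybe" both times
    undecided = (low_middle + high_middle) * 0.5

    return (fair_first + fair_second,
            biased_first + biased_second,
            undecided)


fair, biased, undecided = expected_verdicts()
print(round(fair), round(biased), round(undecided))   # 11 11 18
```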
-
-
- Other problems:
-
- Allan says:
-
- "The column titled "Maybe" are the Indicator values where there is
- a 50% chance that the die is fair and a 50% chance that the die is
- biased." (A)
-
- This is just plain wrong.
-
- The column he refers to is the value that a test on a *fair* die will
- exceed 50% of the time. This is very different, and probably explains
- why Allan misunderstands the whole interpretation of the results. (B)
-
- If any of you can't see why what I said (B), and what Allan said (A) are
- totally different, don't despair. This stuff is not always obvious from
- the start. If you can follow the rules of an average RPG, you are smart
- enough to understand a few non-trivial statistical ideas. I'm quite happy
- to provide further clarification to the net if the demand is there.
-
- > >From Table 1, it appears that the d4 tested may be "Fair" but another test
- > should be done.
-
- I'd say not. A reasonable interpretation of the result is "There is
- no reason to doubt that the die is O.K.".
-
- Incidentally, the test statistic of 2.00 obtained in the example is
- only 2/3 of what you'd expect with a fair die. The value of 2.00 will
- be exceeded almost 60% of the time by a test on a fair die.
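The "almost 60%" figure can be reproduced without tables. The sketch below assumes integer degrees of freedom and uses the standard step-up rule for the incomplete gamma function; chi2_sf is a name invented here, not a library call.

```python
import math


def chi2_sf(x, df):
    """P(a chi-squared variable with `df` degrees of freedom exceeds x)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # starting point Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # starting point Q(1/2, h)
    # Step up using Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


# A d4 has 3 degrees of freedom; how often does a fair d4 score above 2.00?
print(round(chi2_sf(2.00, 3), 3))   # 0.572
```

So a fair d4 beats 2.00 a little over 57% of the time.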
-
-
- ---------------------------------------------------------------------
-
- In article <Bxu26A.3F6@watserv1.uwaterloo.ca>,
- alongley@cape.UWaterloo.ca (Allan Longley), in response
- to Michael Wright, says:
-
- |In article <wright.721879981@latcs1.lat.oz.au|wright@latcs2.lat.oz.au
- | (Michael G. Wright) writes:
- |>dks@acpub.duke.edu writes:
- |>
- |>> This is called a chi-square test, and an article with the
- |>>procedure and numbers for it appeared way back in Dragon issue
- |>>#74... Thank you, Mr. Longley, for reposting it (or did you come
- |>>up with it in isolation? =) ) for the benefit of those who don't
- |>>have the issue (probably most readers).
- |
- |Yes, I seen the issue. The chi-square test is a standard statistical test
- |for determinig if a data set matches a particular distribution -- so, no, I
- |did not come up with the test in isolation, its been around for a lot longer
- |than D&D. I don't actually have the issue, so I didn't copy it for the net.
- |
- |I've been playing with the chi-square test and you know what I found out --
- |ALL DICE ARE BIASED!! Well, that's not true -- all except d4's and d6's are
- |biased. Of course, this really shouldn't be a surprise. So, I've been
- |looking at modifiying the chi-square test for "real world" dice -- more on
- |this in a later post.
-
- Allan is correct that the test has been around a lot longer than D&D.
- The test, due to Karl Pearson, is nearly 100 years old.
-
- It's no surprise that Allan finds that all real dice are biased:
- i) It's impossible to make a truly fair die (obviously). It's just
- that most are close enough that we don't care too much. The
- chance of getting "close to fair" will decrease with the number
- of sides.
-
- ii) Allan's testing method will call more than a quarter of fair
- dice (assuming they existed) biased. Even the fairest dice you
- could buy have a good chance of being called biased.
-
-
- |>Actually, I use a program I made to roll dice for stats. Unfortunately, nobody
- |>in the party wants to use it, because dice rolls invariably end up better. I
- |>think this must be because of the pseudo-randomness of the program.
- |>Anyone out there that knows better?
- |
- |I wouldn't want to use a computer generated die-value while playing AD&D.
- |THe thrill of the "rolling die" is part of the game. Also, with reference
- |to the above, most players will have a favourite die/dice due to the
- |inherent bias found in real dice. The trick is to find the dice that are
- |biased beyond reasonable playability.
-
- In response to Michael G Wright:
-
- Michael's discovery that hand-rolled dice often come out better
- could occur for a couple of reasons:
-
- i) Players tend to hang on to "favourite" dice that "roll well"
- (i.e. come up with good results). So biased dice have some
- chance of concentrating into the hands of players. As long as
- this isn't too extreme, it probably doesn't matter too much.
- (Allan quite correctly identifies this reason).
-
- ii) Players don't really roll randomly. I had a fairly long email
- discussion with Sea Wasp on this topic just recently. Even
- unconsciously, you can pick up the dice in a "non-random" fashion,
- so that a good roll will tend to be followed by another good
- roll if you don't roll too vigorously. You may notice that
- after a bad roll players tend to throw harder. This may be more
- of a problem, as some players are *much* better at it than others.
- If it becomes too noticeable, you may wish to invest in a dice
- cup, or mock up a craps-table affair.
-
- Allan's comment that "The trick is to find the dice that are biased
- beyond reasonable playability" is spot on. Exactly correct.
- Remember it, because I'll come back to it later.
-
-
- I think the above discussion also answers Paul Kinsler's questions.
-
- [in article <BxvuzM.CsI@bunyip.cc.uq.oz.au>, kinsler@physics.uq.oz.au
- (Paul Kinsler) asked for clarification of Allan's comment that all
- dice are biased].
-
- ---------------------------------------------------------------------
-
- In article <1992Nov12.194015.1602@Princeton.EDU>,
- dagolden@phoenix.Princeton.EDU (David Alexandre Golden) writes:
-
- [stuff deleted]
- >I once did something along those lines to test whether my DM's die was
- >fair. (It would roll 20's a lot. Often on command.)
- >
- >To test the die, I made a histogram (I believe is the term for it) like this:
- >
- >x x x x x
- >x x x x x x x x x x x x x x x x x x x x
- >x x x x x x x x x x x x x x x x x x x x
- >1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
- >
- >With an "x" being each time that number came up. A totally fair die would
- >have a straight line across, assuming it was rolled enough times. The
- >question is, what is enough times? I rolled a d20 about 380 times (not bad
- >if one person rolls and the other makes an x in the column) and while it
- >gave a fairly good idea that the die was biased, the biased numbers had only
- >occured a couple times more than the average. (If I remember correctly).
- >So I wrote a computer program to do the same thing until the deviation between
- >the highest and lowest number of occurances was less than about 15% of the
- >average number of occurances. (i.e a reasonably smooth profile). The
- >computer required SEVERAL THOUSAND ROLLS to do this.
-
- Your idea of making a histogram is a good one. In fact all the
- chi-squared test does is the same as drawing a line across the
- histogram where your expected "straight line across" would go,
- looking at the deviations from that, squaring (to get all positives)
- and adding the squared deviations up and dividing by that expected
- number. This gives a single overall measure of deviation from uniformity.
- The advantage of looking at the histogram is you see where
- the differences are, but you can't tell how big they "ought" to be
- for a fair die. The actual number in the cell will be approximately normally
- distributed with mean equal to the expected number in each cell and standard
- deviation approximately the square root of the mean. In the above example,
- we'd expect Dave to get 19 in each cell (380 rolls over 20 faces), so the
- standard deviation is about 4.36 (the square root of 19). That is, we'd
- expect to get about 2/3 of the cells with counts in
- the range 15 to 23, and about a 2/3 chance of all but 1 or 2 of the values
- inside the range 11 to 27.
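A quick sketch of where those ranges come from, using the rough rule that the standard deviation of a cell count is about the square root of its expected value (variable names are illustrative only):

```python
import math

rolls, faces = 380, 20
expected = rolls / faces          # 19 per face
sd = math.sqrt(expected)          # rough standard deviation of one cell count

one_sd = (expected - sd, expected + sd)
two_sd = (expected - 2 * sd, expected + 2 * sd)

print(round(sd, 2))                      # 4.36
print([round(v, 1) for v in one_sd])     # [14.6, 23.4]
print([round(v, 1) for v in two_sd])     # [10.3, 27.7]
```

The whole-number counts inside those two intervals are 15 to 23 and 11 to 27.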
-
- [ some of my own discussion deleted - see the section "Tests based
- on the histogram" below]
-
- > ... Still, the point is that I'm skeptical
- >that the "fairness" of a die can be determined in only a hundred or so rolls.
- >(d4 maybe... d20 no way!)
-
- Well, in fact Allan's suggestion was to use 20 rolls per cell, so he'd use
- 400 rolls for a d20 and 80 rolls for a d4. But in any case you can never
- decide that a die is actually fair. If you do a test and get a result
- close to what you expected if the die was fair, you have a lack of evidence
- against the hypothesis of fairness (which is the default assumption for
- a statistical test of biasedness in a die - the "null hypothesis").
-
- What you get is either a higher degree of evidence against the hypothesis
- of fairness (by getting a result that is very unlikely with a fair die),
- or a low degree of evidence against fairness. It's like in a court case,
- (a criminal case), where the defendant is assumed innocent until proven
- guilty (innocence is the null hypothesis), but evidence against the
- defendant is presented by the prosecution. The jury then decides either
- "guilty" if there is strong enough evidence, or "Not guilty" if there
- is not. They don't declare innocence.
-
- So we can't determine "fairness" anyway. The question we need to ask
- is: If the die is biased, will a hundred rolls (or whatever number)
- be enough for us to have a good chance to pick up that difference,
- while at the same time, not "convicting the innocent" too often?
- Whether it is enough depends on how big a difference you think it is
- important to pick up.
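One way to get a feel for this trade-off is to simulate it. The sketch below is an illustration only: the loaded d6 probabilities, the 120 rolls, the trial count and the seed are all arbitrary choices made for the example, and 11.07 is the usual 5% chi-squared value for 5 degrees of freedom.

```python
import random


def chi_squared(counts):
    """Pearson statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


def rejection_rate(probs, n_rolls, critical, trials=400, seed=1):
    """Fraction of simulated tests whose statistic exceeds `critical`."""
    rng = random.Random(seed)
    faces = range(len(probs))
    rejected = 0
    for _ in range(trials):
        counts = [0] * len(probs)
        for face in rng.choices(faces, weights=probs, k=n_rolls):
            counts[face] += 1
        if chi_squared(counts) > critical:
            rejected += 1
    return rejected / trials


fair_d6 = [1 / 6] * 6
loaded_d6 = [0.14] * 5 + [0.30]      # hypothetical die that favours the 6

# "Convicting the innocent": should come out somewhere near 0.05.
print(rejection_rate(fair_d6, 120, 11.07))
# Chance of catching this particular bias with 120 rolls.
print(rejection_rate(loaded_d6, 120, 11.07))
```

Try a milder bias or fewer rolls and the second number drops quickly, which is the whole point of the question above.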
-
- -----------------------------------------------------------------------
-
-
- In article <1992Nov12.231019.1204@mcs.kent.edu>,
- adray@mcs.kent.edu (Adam Dray) writes (in response to Dave Golden):
-
- >In other words, a histogram shows very little. Random doesn't mean
- >necessarily that you'll get an even distribution. It just mean the
- >probability that you won't get an even distribution is proportional to
- >the number of sides on the die, and the number of times you roll it.
-
- I disagree with the first sentence. The final sentence above is wrong.
- The more you roll it, the more even the distribution will be, as long
- as the die is fair. It doesn't really depend on the number of sides.
- (except as far as the negative dependence between cell counts is
- reduced for more sides).
-
- >Notes about the fairness of dice:
- >
- >Sharp-edged dice are better than smooth-edged dice. They're also more
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Not always, but this may be true more often than not.
-
- >expensive, however. Rounded dice are often inked by coating the
- >entire die with ink, then tossing the die in a "tumbler" (similar to
- >tumblers for smoothing rocks) until all the die on the outside is
- >gone. Thus, the ink is left in the crevices where the numbers are.
- >
- >Theoretically, the grooves for the numbers can make one side a more
- >likely outcome. Official casino dice don't have inset pips.
-
- If you do it right any effect will be swamped by other manufacturing
- defects anyway.
-
- [some stuff deleted]
- >
- >GameScience did tests on other manufacturers' dice. They found
- >certain numbers to be more likely. I've heard that the real 100-sided
- >die tends to roll certain numbers more often.
-
- It is impossible (both effectively and theoretically) to get a fair 100-
- sided die. The practical problems are more important than the theoretical
- ones.
-
- >Filing corners off your dice can make certain outcomes more probable.
- >Natural wear can do the same thing.
- >
- >For most people, none of this matters one damn bit. =)
-
- In general, no, it doesn't matter. It's encouraging to see that so
- many people (just about all posters on the topic) realise this.
-
- ---------------------------------------------------------------------------
-
- Interpreting the test statistic (Allan's "Indicator")
-
- Carry out the calculations as described by Allan*, but use any number of
- throws per cell (possible outcome) you like (I'd suggest 10 as a minimum,
- because otherwise the tabled distribution is out a bit). The more rolls
- you do, the better chance you have of picking up a difference of a given
- size. The value of 20 that Allan suggested may well be a reasonable choice
- in most circumstances. Allan gives the calculations for two different
- numbers of throws (20 and 10 per cell, but in different posts), so you
- ought to be able to generalise.
-
- * The calculations given by Allan may no longer be available to you, so
- an indication of how to do the calculations is given here:
-
- Roll the die many times, say 20 times per face. Record each result
- (I suggest you make up a tally sheet). Calculate the difference between
- the number of times each face came up and the expected number (20 in this
- case). Square these values and add them. Divide by the expected number
- per face. This is your chi-squared statistic.
- E.g. d4: Roll 20 times per face = 80 rolls
- Face: 1 2 3 4
- No times 23 18 15 24
- expected 20 20 20 20
- difference 3 2 5 4
- diff^2 9 4 25 16 Sum = 54, chi-squared value = 54/20 = 2.7
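The same tally-sheet calculation as a short sketch (the function name is just for illustration; it assumes every face has the same expected count):

```python
def chi_squared_tally(counts):
    """Return the differences, their squares and the chi-squared value."""
    expected = sum(counts) / len(counts)
    diffs = [abs(c - expected) for c in counts]
    squares = [d ** 2 for d in diffs]
    return diffs, squares, sum(squares) / expected


diffs, squares, statistic = chi_squared_tally([23, 18, 15, 24])
print(diffs)       # [3.0, 2.0, 5.0, 4.0]
print(squares)     # [9.0, 4.0, 25.0, 16.0]
print(statistic)   # 2.7
```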
-
-
- If the result is less than the final column of Allan's table 2 (which
- are the tabulated values for a 5% significance level), you shouldn't
- worry too much, there is not very strong evidence of bias - in fact 1
- in 20 tests on a fair die will score worse than this. If the result is
- much bigger than the value you have some cause for concern. A result
- bigger than the 1% column below is quite unusual if the die is fair
- (a result at least this big only occurring 1% of the time), so it gives
- us good reason to suspect bias.
-
- A small table of the chi-squared distribution:
-
- 5% 1% df
- d4 7.81 11.34 3
- d6 11.07 15.09 5
- d8 14.07 18.48 7
- d10 16.92 21.67 9
- d12 19.68 24.72 11
- d20 30.14 36.19 19
-
- (these results came from a computer approximation to the chi-squared
- distribution. They should be accurate to the figures given.)
-
- If you want a more "cookbook" approach: if the result exceeds the 1%
- value, it's probably biased. If it's between the 1% and 5% values, there
- is a moderate degree of evidence that it's biased, but it still might be
- OK. If it's less than the 5% value, you don't have any reason to think
- it's biased on the basis of the test.
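The cookbook rule is easy to mechanise. A minimal sketch, with the 5% and 1% values copied from the small table above; the dictionary and function names are invented for the example.

```python
# (5% value, 1% value) keyed by number of faces, from the table above
CRITICAL = {
    4: (7.81, 11.34), 6: (11.07, 15.09), 8: (14.07, 18.48),
    10: (16.92, 21.67), 12: (19.68, 24.72), 20: (30.14, 36.19),
}


def verdict(statistic, faces):
    """Apply the cookbook reading to a chi-squared statistic."""
    five_percent, one_percent = CRITICAL[faces]
    if statistic > one_percent:
        return "probably biased"
    if statistic > five_percent:
        return "moderate evidence of bias"
    return "no reason to suspect bias"


print(verdict(2.7, 4))    # no reason to suspect bias
print(verdict(9.0, 4))    # moderate evidence of bias
print(verdict(12.5, 4))   # probably biased
```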
-
-
- You will find more extensive tables in most elementary statistics books.
- (references for the chi-squared and Kolmogorov-Smirnov tests are
- at the end of this article).
- You look up the df (degrees-of-freedom) that are one less than the
- number of faces on the die (e.g. d4 -> 3 df).
-
- A note on pronunciation: The Greek letter chi (the capital looks like
- an X, and the lower-case has one of the two crossed lines a bit curly)
- is pronounced with a hard "ch" like Charisma, and the word rhymes with
- pie. Note that mathematical symbols come from *ancient* Greek, so no
- arguments from any modern Greeks please.
-
- This will provide a reasonable all-round test for bias in a die.
-
-
- -------------------------------------------------------------------
-
- Why you probably don't want to do the chi-squared test:
- (at least for d8 and above)
-
- The chi-squared test will pick up any kind of deviation from a purely even
- distribution. However, we are much more worried about some kind of deviations
- than others. For example, I'd be more interested in knowing that "20" came
- up too often on a d20 than knowing "10" came up too often. The first could
- affect play substantially, the second probably only a little. We should use a
- test with a better chance to pick up the kind of deviations from fairness
- that are most important to us (which will trade off with less chance of
- picking up deviations we are less concerned with).
-
- Let us consider a more complete example:
-
- Imagine we have two d20's we'd like to test, and that in fact
- (but unknown to us) they have the following (percentage) probabilities:
-
- (the rows are: Face number
- % prob 1st die
- % prob 2nd die )
-
-
- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
- 5 4.5 6 3.5 7 2.5 8 1.5 7 5 5 7 1.5 8 2.5 7 3.5 6 4.5 5
- 1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5 5 5 5 5 6 6 7 7 7 7 8 8
-
- A fair die would have 5% right across, of course. These 2 dice can be obtained
- from each other by relabelling the faces. The first die will be reasonable in
- play, because, of course, we don't try to roll 'exactly 8' or 'exactly 9',
- but 'less than 8' or 'greater than 11'. The first die is never out by
- more than one twentieth of the required probability (e.g. probability of
- a 2 or less is 9.5% instead of 10%) in either direction. It has the correct
- average, and almost the correct standard deviation (the difference is tiny).
-
- The second die would be very unbalancing in play: it has about a 2/3 chance
- (66%) of rolling 11 or higher, and a 20 is more than 5 times as likely as
- a 1. The mean is almost 13. The standard deviation is also out, but that's
- relatively unimportant.
-
- The chi-squared will rate them as equally bad!
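A short sketch comparing the two dice, using the percentage rows from the table above. The "distance" here is the sum of (p - 5)^2 / 5 over the faces, which is the part of the expected chi-squared statistic that builds up with every 100 rolls; the function names are made up for the illustration.

```python
die1 = [5, 4.5, 6, 3.5, 7, 2.5, 8, 1.5, 7, 5,
        5, 7, 1.5, 8, 2.5, 7, 3.5, 6, 4.5, 5]
die2 = [1.5, 1.5, 2.5, 2.5, 3.5, 3.5, 4.5, 4.5, 5, 5,
        5, 5, 6, 6, 7, 7, 7, 7, 8, 8]


def mean_roll(percent):
    """Average roll, given percentage probabilities for faces 1..20."""
    return sum(face * p for face, p in enumerate(percent, start=1)) / 100


def chance_11_plus(percent):
    """Percentage chance of rolling 11 or higher."""
    return sum(percent[10:])


def chi_squared_distance(percent):
    """Departure from a flat 5% per face, as the chi-squared test sees it."""
    return sum((p - 5) ** 2 for p in percent) / 5


print(mean_roll(die1), mean_roll(die2))                        # 10.5 12.74
print(chance_11_plus(die1), chance_11_plus(die2))              # 50.0 66.0
print(chi_squared_distance(die1), chi_squared_distance(die2))  # 15.6 15.6
```

The last line is identical for both dice because the statistic ignores which face carries which probability.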
-
- So a good test should be likely to identify the second die, but we might
- be prepared to sacrifice some of our ability to pick up the first, since
- it will make little practical difference in play. (I said I'd come back to
- this point!)
-
- Note that almost any deviation on a d4 will be important (there are only 4
- different values), and to a lesser extent a d6. I'd stick with the chi-
- squared test on those.
-
- There are many tests that will do what we want. I will present only one
- such test*. (This is not to say that a properly applied chi-squared is
- not good, just that a test more closely tailored to our specific
- question of interest will be even better.)
-
- * two tests if you count "Tests based on histograms", below.
-
- The Kolmogorov-Smirnov test:
-
- Collect data as for the chi-squared test, up to the point where you
- start doing calculations.
-
- That is, lay out like this (you could run down rather than across):
-
-
- Roll: 1 2 3 4 5 ......
- Count: 17 19 22 27 24 ......
- Expected: 20 20 20 20 20 ......
-
- Now add up your counts and expected counts, writing the partial
- totals as you go:
-
- Roll: 1 2 3 4 5 ......
- Count: 17 19 22 27 24 ......
- Expected: 20 20 20 20 20 ......
- Sum Count: 17 36 58 85 109 ......
- Sum Exp: 20 40 60 80 100 ......
-
- Now find the differences (without sign):
-
- Sum Count: 17 36 58 85 109 ......
- Sum Exp: 20 40 60 80 100 ......
- Difference: 3 4 2 5 9 ......
-
- The last difference will be zero, so you don't have to work out the
- final column (I still would as a check).
-
- Divide the largest difference (9 is the largest difference above, for
- the calculations you can see) by the number of rolls you made altogether.
-
- This is your test statistic. Let's call the value D.
-
- You can look it up in most books on nonparametrics, which will
- have tables. However, you would be better to use the table below, for
- reasons I'll discuss in a second.
-
- You multiply D by the square root of the number of rolls
- (equivalently, divide the largest difference by the square root of
- the number of rolls), and compare with:
-
- 5% 1% d#
- 1.08 1.35 4
- 1.10 1.37 6 These values apply pretty well irrespective of
- 1.11 1.38 8 the total number of rolls, but I would use at
- 1.12 1.39 10 least 10 rolls per face.
- 1.12 1.40 12 Note also that these values come from simulation,
- 1.14 1.42 20 and are hence not exact. This doesn't really matter.
-
- and interpret as I suggested for the chi-square test.
-
- You may find the following values in tables:
-
- 5% 1%
- 1.36 1.63 (irrespective of the number of sides on the die)
-
- the reason these are larger is that they are based on the assumption that
- the distribution the data are from is continuous (effectively, a *very*
- large number of faces on the die would give these values). If you use the
- textbook values, the test will be conservative (a fair die will be rejected
- less often than the nominal 5% and 1%), because the values a die
- produces are discrete (a d20 generates only integers, not anything
- between).
-
- So, for our above example, assume there are no larger differences
- than 9, and that we made 400 rolls on a d20 (hence the expected number
- in each cell is 20, as above). Then D is 9/400 = .0225, which if you
- can get tables you'd look up. We made 400 rolls, so we could use the
- table above: the square root of 400 is 20, so D x 20 (= 9/20) = .45.
- This is much less than the 5% value, so there is little evidence that
- the die is unfair.
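The running-total procedure can be sketched as follows. For data it reuses the d4 counts from the chi-squared example earlier (23, 18, 15, 24); the function name is invented, and it assumes every face has the same expected count.

```python
import math
from itertools import accumulate


def ks_statistic(counts):
    """Return (largest cumulative difference, D, D * sqrt(total rolls))."""
    n = sum(counts)
    expected = n / len(counts)
    sum_counts = accumulate(counts)
    sum_expected = accumulate([expected] * len(counts))
    largest = max(abs(o - e) for o, e in zip(sum_counts, sum_expected))
    d = largest / n
    return largest, d, d * math.sqrt(n)


largest, d, scaled = ks_statistic([23, 18, 15, 24])
print(largest, d, round(scaled, 3))   # 4.0 0.05 0.447
```

The scaled value of about 0.45 is well under the 5% value of 1.08 for a d4, so again there is no evidence of bias.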
-
- There are tests which are probably even more appropriate, but these
- two (chi-squared and K-S) will be enough for you to get a good idea
- of any suspect dice.
-
- Note: If you suspect a die, and decide to test it, don't use the
- rolls that made you suspect it in the test. Generate a new
- set. e.g. if you are all recording your rolls as you play,
- and one player's results look funny, don't then test those
- recorded values - you have to generate a new set.
-
- -------------------------------------------------------------------------
-
- Testing a die based on the histogram of rolls:
-
- The histogram approach can be turned into a test of sorts as follows:
- After drawing the histogram, 2 lines can be drawn either side of the
- expected (mean) result. If all histogram bars lie within the inner
- lines, there is no strong evidence of bias. If any of the bars go outside
- the outer lines, there is fairly clear evidence of bias. If one of the
- bars lies between the inner and outer lines, then there is some (mild)
- evidence of bias, but it is not really clear. You may wish to then
- perform a further test on the probability for that individual side,
- as described in the next section (Testing an individual face). If
- several of the bars lie between the inner and outer lines, we have a
- stronger indication of bias.
-
- Where do we draw the lines?
-
- I have worked out values for rolling 20 times for each face (as in the
- other examples, 80 times for a d4, 400 times for a d20). The bars on the
- histogram must actually go past these values. You could think of these
- values as giving "Acceptable Ranges" (literally, 95% and 99% acceptance
- regions) for the histogram. A fair die will give histograms with one or
- more bars outside these ranges 5% and 1% of the time respectively
- (actually just under).
-
-
- Table for 20
- throws per face Approximate formula:
-
- d# 5% 1% Let N be the total number of rolls.
- 4 8-34 6-37 Let c be the number of faces on the die.
- 6 9-33 7-36 Let e be the expected number of times
- 8 9-33 8-35 each face will come up ( e = N/c ).
- 10 10-32 8-35 Then the lines go at e +/- A x sqrt[e (c-1)/c],
- 20 11-30 9-32 and the value for 'A' comes from the table below.
- All fractions should be rounded up.
- e.g. N=160, c=8, e=20 give:
- 5%: 20 +/- 2.73 sqrt (20 x 7/8) or 9-32
- 1%: 20 +/- 3.22 sqrt (20 x 7/8) or 7-34
-
- Table to go with
- approximate formula:
-
- d# 5% 1%
- 4 2.49 3.02 Due to being in the
- 6 2.63 3.14 extreme tails of the
- 8 2.73 3.22 distribution, combined
- 10 2.80 3.29 with slight asymmetry,
- 12 2.86 3.34 the ranges we get are
- 20 3.02 3.48 sometimes out a bit.
- This is not a big deal.
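A sketch of the approximate formula, taking "all fractions should be rounded up" literally (both ends are rounded up to the next whole number). The function name is made up; the values of A are the d8 entries from the table above. Being an approximation, it need not agree exactly with the table for 20 throws per face.

```python
import math


def histogram_lines(n_rolls, faces, a):
    """Lower and upper lines at e +/- A * sqrt(e * (c - 1) / c)."""
    e = n_rolls / faces
    half_width = a * math.sqrt(e * (faces - 1) / faces)
    return math.ceil(e - half_width), math.ceil(e + half_width)


print(histogram_lines(160, 8, 2.73))   # (9, 32)   inner (5%) lines
print(histogram_lines(160, 8, 3.22))   # (7, 34)   outer (1%) lines
```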
-
-
- Example: We throw a d20 400 times, and record the results
- and from the table above, we draw the inner lines at 11 & 30,
- and the outer lines at 9 & 32, as well as a reference line at 20.
- (in the histogram below, "." =1 count, ":" =2 counts. The horizontal
- lines aren't quite in the correct positions; they ought to be about
- a quarter to half a character position lower.)
-
- Counts (34)
- 35 + > <
- |__________________________:__________________________________ (32)
- |__________________________:__________________________________ (30)
- | :
- | :
- 25 + : .
- | . : : :
- |__:__._____:_____:________:____________________:__:__:_______ (20)
- | : : : . : : : . : : :
- | : : : : : : : : . : : : : : : :
- 15 + : : : : : : : : : : : . : . : : : : :
- | : : : : : : : : : : : : : : : : : : : :
- |--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:- (11)
- |--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:- ( 9)
- | : : : : : : : : : : : : : : : : : : : :
- 5 + : : : : : : : : : : : : : : : : : : : :
- | : : : : : : : : : : : : : : : : : : : :
- | : : : : : : : : : : : : : : : : : : : :
- `--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
- face: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
-
- counts: 22 21 18 23 19 22 20 16 34 17 19 15 18 15 14 22 25 24 18 18
- ^^
-
- One value (34) goes outside the 1% values (technically, you could say
- goes outside a 99% acceptance region), so it seems our die is biased.
- More particularly, it rolls too many 9's. Whether this will affect a
- game very much is another point. (If it had been "1" or "20", however,
- perhaps this would result in a large effect on the game).
-
-
-
- Testing an individual face:
-
- When you suspect a particular face is coming up with the wrong
- frequency, or only wish to test a particular face (e.g. 20 on a
- d20), throw as before, but you can compare with a narrower range,
- as below. However, you can't use the data that made you suspect
- the face in this test, you must generate a new set of rolls.
- E.g. If you draw a histogram and find the results for 7 look odd,
- you can't use those numbers in this test. In that case the ranges
- given above (for testing an entire histogram) are appropriate.
-
- For the tables below, as for those above, you need to exceed the
- range given to call the die 'probably biased'.
-
- When you don't know the direction already (two-tailed test):
- (If you are in doubt, use this table rather than the next one)
-
- Table for 20
- throws per face Approximate formula (reasonably accurate):
-
- d# 5% 1% Let N be the total number of rolls.
- 4 13-28 11-30 Let c be the number of faces on the die.
- 6 12-28 10-31 Let e be the expected number of times
- 8 12-29 10-31 each face will come up ( e = N/c ).
- 10 12-29 10-32 Then the lines go at e +/- 1.96 x sqrt[e (c-1)/c],
- 12 12-29 10-32 (5%) and for 1% at e +/- 2.58 x sqrt[e (c-1)/c].
- 20 12-29 10-32 All fractions should be rounded up.
- e.g. N=160, c=8, e=20 give:
- 5%: 20 +/- 1.96 sqrt(20 x 7/8) or 12-29 (rounded up)
- 1%: 20 +/- 2.58 sqrt(20 x 7/8) or 10-31 (rounded up)
-
-
-
- When you think a particular face is coming up too often, or if you
- think a particular face isn't coming up enough (one-tailed test):
-
- Table for 20
- throws per face Approximate formula (reasonably accurate):
-
- d# 5% 1% Let N be the total number of rolls.
- 4 14/26 11/30 Let c be the number of faces on the die.
- 6 13/27 11/30 Let e be the expected number of times
- 8 13/27 11/30 each face will come up ( e = N/c ).
- 10 13/27 11/30 Then the lines go at e +/- 1.65 x sqrt[e (c-1)/c],
- 12 13/27 11/30 (5%) and for 1% at e +/- 2.33 x sqrt[e (c-1)/c].
- 20 13/27 11/31 All fractions should be rounded up.
- e.g. N=160, c=8, e=20 give:
- 5%: 20 +/- 1.65 sqrt(20 x 7/8) or 14-27 (rounded up)
- 1%: 20 +/- 2.33 sqrt(20 x 7/8) or 11-30 (rounded up)
-
- These values are given as either/or i.e. since you have already specified
- a particular direction, you will compare with only the higher values or the
- lower values, not both.
-
-
- Example 1: You decide to test your new d20 to see if it rolls the
- correct number of 20's, but you don't believe it to
- be biased in a particular direction. You roll 400
- times, and 35 times you get a "20". From the "two-tailed"
- table above, you can see that's outside the outer (1%) range.
- It seems your d20 rolls too many 20's.
-
- Example 2: Another player seems to be rolling a lot of 1's on her d4.
- You decide to test whether 1 comes up too often.
- You roll it 80 times, and get the following:
-
- 1 2 3 4
- 13 26 25 16
-
- Since you decided to test if there were too many 1's, you
- can only see if the number of 1's exceeds 26, which it does
- not. You can't say, after generating the data "Oh, actually,
- perhaps it rolls too few ones", or "perhaps it rolls too many
- 2's" without generating a new set of data for the new hypothesis.
- You must never base what you are testing for on what you spy
- in the set of data you use in the test. Our only conclusion
- on this test: the d4 doesn't roll too many 1's.
-
- You may like to then generate a new set of rolls to see if it
- rolls too few 1's.
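Both examples can also be worked straight from the approximate formula by turning the count into a standardised score and comparing it with the multipliers quoted above (1.96 and 2.58 two-tailed, 1.65 and 2.33 one-tailed). This is a sketch of that approximation, not of the exact tables, and face_z is a name made up here.

```python
import math


def face_z(count, n_rolls, faces):
    """(count - e) / sqrt(e * (c - 1) / c) for a single face."""
    e = n_rolls / faces
    return (count - e) / math.sqrt(e * (faces - 1) / faces)


# Example 1: 35 twenties in 400 rolls of a d20, two-tailed
z1 = face_z(35, 400, 20)
print(round(z1, 2), abs(z1) > 2.58)   # 3.44 True  (beyond the 1% value)

# Example 2: 13 ones in 80 rolls of a d4, one-tailed, testing "too many 1's"
z2 = face_z(13, 80, 4)
print(round(z2, 2), z2 > 1.65)        # -1.81 False  (not too many 1's)
```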
-
- ---------------------------------------------------------------------
- As further examples, here are the chi-squared test and Kolmogorov-Smirnov (KS)
- tests performed on the same data.
-
- chi squared test:
-
- counts: 22 21 18 23 19 22 20 16 34 17 19 15 18 15 14 22 25 24 18 18
- diff
- from 20: 2 1 2 3 1 2 0 4 14 3 1 5 2 5 6 2 5 4 2 2
- diff^2: 4 1 4 9 1 4 0 16 196 9 1 25 4 25 36 4 25 16 4 4
- sum diff^2: 388
- chi-squared statistic: 388/20 = 19.4
- (far less than the 5% value of 30.14)
-
-
- Kolmogorov-Smirnov test:
-
- (Calculations have been run down the page because I can't fit 20 3 digit
- numbers, with spaces and labels across an 80-column screen).
-
- counts sum expected diff
- 22 22 20 2
- 21 43 40 3
- 18 61 60 1
- 23 84 80 4
- 19 103 100 3
- 22 125 120 5
- 20 145 140 5
- 16 161 160 1
- 34 195 180 15 <-- largest difference is 15. Well short of significance
- 17 212 200 12 at the 5% level. e.g. calc 15/sqrt(400) = 15/20
- 19 231 220 11 or .75; where the 5% value from the simulations
- 15 246 240 6 is 1.14
- 18 264 260 4
- 15 279 280 1
- 14 293 300 7
- 22 315 320 5
- 25 340 340 0
- 24 364 360 4
- 18 382 380 2
- 18 400 400 0
-
- ----------------------------------------------------------------------------
-
-
- Conover, W.J. (1980): Practical nonparametric statistics,
- 2nd Ed., Wiley, New York.
-
- Neave, H.R. and Worthington, P.L.B. (1988): Distribution-free tests,
- Unwin Hyman, London.
-
-