- Path: sparky!uunet!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!spool.mu.edu!olivea!mintaka.lcs.mit.edu!mintaka!cthombor
- From: cthombor@theory.lcs.mit.edu (Clark D. Thomborson)
- Newsgroups: sci.math.stat
- Subject: Re: Robust chi-squared routine?
- Message-ID: <CTHOMBOR.92Dec18122658@tern.lcs.mit.edu>
- Date: 18 Dec 92 17:26:58 GMT
- References: <1g8evvINNqc0@network.ucsd.edu>
- Sender: news@mintaka.lcs.mit.edu
- Organization: MIT Lab for Computer Science
- Lines: 52
- In-Reply-To: mbk@gibbs.ucsd.edu's message of 10 Dec 92 22:05:51 GMT
-
-
- From: mbk@gibbs.ucsd.edu (Matt Kennel)
-
- > Is a chi-squared test appropriate for a situation where the
- > number of bins is very large, but the expected value per bin is
- > quite a bit smaller than 1?
-
- Funny you should ask. I have an as-yet-unpublished manuscript on this
- very subject (which arose for me in the context of testing the output
- of various pseudorandom number generators), available by anonymous ftp
- from theory.lcs.mit.edu, directory pub/cthombor/Mrandom.
-
- The "short answer" to your question is that the Pearson statistic is
- useful for testing goodness-of-fit to a symmetric multinomial (n
- observations in k equally likely bins) if n > 3\sqrt{k}, but you can't
- use the standard chi-squared tables safely in
- this range. The "discretization errors" are large, even near the
- mean, unless (by the rule of thumb given in most textbooks)
- n > 5k
- and (as is not disclosed in any textbook presentation, to my knowledge)
- k > 5
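A worked aside, using standard algebra that is not taken from the manuscript: with equally likely bins the expected count is E_i = n/k, and the Pearson statistic reduces to an affine function of S = sum of O_i^2. Since O_i^2 and O_i always have the same parity, S has the parity of n, so the attainable values of X^2 lie on a grid with spacing 2k/n. When n is smaller than k this grid is coarse relative to the standard deviation sqrt(2(k-1)) of a chi-squared variable with k-1 degrees of freedom, which is one plausible way to see why a continuous table can be off even near the mean (the exact mean is still k-1).

```latex
X^2 = \sum_{i=1}^{k} \frac{(O_i - n/k)^2}{n/k}
    = \frac{k}{n} \sum_{i=1}^{k} O_i^2 - n,
\qquad
\mathbb{E}[X^2] = \sum_{i=1}^{k} \frac{\operatorname{Var}(O_i)}{n/k}
    = k \cdot \frac{n \cdot \frac{1}{k}\left(1 - \frac{1}{k}\right)}{n/k}
    = k - 1,

\sum_{i=1}^{k} O_i^2 \equiv \sum_{i=1}^{k} O_i = n \pmod{2}
\;\Longrightarrow\;
X^2 \in \left\{ \frac{k}{n}\, s - n \;:\; s \equiv n \pmod{2} \right\}.
```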
- On the extreme upper tail, you're in trouble for any n and k. The
- non-symmetric case looks hopelessly complicated for any approach other
- than "enumeration" of the relevant terms in the multinomial.
- Furthermore, if there are large variations in cell probabilities, then
- the Pearson statistic is pretty awful. As long as you're enumerating,
- you might as well just calculate the exact tail probability....
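For the symmetric case, the suggestion to "just calculate the exact tail probability" can be illustrated with a minimal sketch. This is not the author's code, and the function names are hypothetical. Rather than listing all k^n outcomes one by one, it builds the exact null distribution of S = sum of O_i^2 one bin at a time with integer dynamic programming: putting c of the t balls placed so far into a new bin multiplies the count of labeled assignments by C(t, c). Because X^2 = (k/n)S - n is increasing in S, the upper tail P(X^2 >= observed) equals P(S >= observed S). The cost grows polynomially in n and k rather than as k^n, but it is still meant for modest n, and it does not cover unequal cell probabilities.

```python
from fractions import Fraction
from math import comb


def sum_sq_distribution(n: int, k: int) -> dict[int, int]:
    """Exact distribution of S = sum(O_i**2) when n labeled observations fall
    into k equally likely bins. Returns {S: number of the k**n assignments}."""
    # dp[m] maps a partial sum of squares s to the number of ways
    # (weighted by multinomial coefficients) to place m labeled
    # observations into the bins processed so far.
    dp = [dict() for _ in range(n + 1)]
    dp[0][0] = 1
    for _ in range(k):
        new = [dict() for _ in range(n + 1)]
        for m in range(n + 1):
            for s, ways in dp[m].items():
                # Put c of the remaining observations into the current bin.
                for c in range(n - m + 1):
                    t = m + c
                    key = s + c * c
                    new[t][key] = new[t].get(key, 0) + ways * comb(t, c)
        dp = new
    return dp[n]


def pearson_exact_upper_tail(counts: list[int]) -> Fraction:
    """Exact P(X^2 >= observed) under the symmetric multinomial null."""
    n, k = sum(counts), len(counts)
    s_obs = sum(c * c for c in counts)
    dist = sum_sq_distribution(n, k)
    # X^2 = (k/n) * S - n is increasing in S, so the tail is S >= s_obs.
    tail = sum(ways for s, ways in dist.items() if s >= s_obs)
    return Fraction(tail, k ** n)
```

For example, `pearson_exact_upper_tail([2, 0, 1, 0, 0, 1, 0, 0, 0, 0])` returns the exact p-value for n = 4 observations in k = 10 bins, a setting where the expected count per bin is only 0.4. Exact rational arithmetic is used so that tiny tail probabilities are not lost to rounding.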
-
- I still haven't figured out where to publish this stuff. It was
- rejected by the SODA conference. Now I'm thinking about Interface
- '93. However, this conference seems to be dominated by applied
- statisticians who want to learn how to use computers more effectively.
- I don't think any mathematical statisticians are likely to attend, but
- then again, I've not yet found any mathematical statistician who is
- willing to think about "fixing up" that horrid old Pearson test. The
- usual response is that I should think about the likelihood ratio
- statistic if I (shudder) really am sure I want to do a hypothesis
- test. In response, I've been digging into the Bayesian-frequentist
- controversy, and I think I have a novel "compromise" between these
- styles of reasoning. If you're interested in getting a draft of this
- compromise paper, please send email.
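For readers who want to compare the Pearson statistic with the likelihood ratio statistic mentioned above, here is a minimal sketch of G^2 for equally likely bins. The function name is hypothetical, and empty bins contribute zero under the usual convention 0 * ln 0 = 0. In large samples both statistics are referred to the same chi-squared distribution with k-1 degrees of freedom.

```python
from math import log


def likelihood_ratio_g2(counts: list[int]) -> float:
    """G^2 = 2 * sum(O_i * ln(O_i / E_i)) with E_i = n / k (equiprobable bins).
    Empty bins are skipped, following the convention 0 * ln 0 = 0."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return 2.0 * sum(o * log(o / expected) for o in counts if o > 0)
```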
-
- I'd appreciate any suggestions as to
-
- 1. where next to submit my Pearson "fixup" result
- 2. where to submit a paper on a Bayesian-frequentist compromise
-
- To save net bandwidth, please respond by email.
-
- Clark
-