NetNews Usenet Archive 1992 #30

Path: sparky!uunet!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!spool.mu.edu!olivea!mintaka.lcs.mit.edu!mintaka!cthombor
From: cthombor@theory.lcs.mit.edu (Clark D. Thomborson)
Newsgroups: sci.math.stat
Subject: Re: Robust chi-squared routine?
Message-ID: <CTHOMBOR.92Dec18122658@tern.lcs.mit.edu>
Date: 18 Dec 92 17:26:58 GMT
References: <1g8evvINNqc0@network.ucsd.edu>
Sender: news@mintaka.lcs.mit.edu
Organization: MIT Lab for Computer Science
Lines: 52
In-Reply-To: mbk@gibbs.ucsd.edu's message of 10 Dec 92 22:05:51 GMT

   From: mbk@gibbs.ucsd.edu (Matt Kennel)

   Is a chisquared test appropriate for a situation where the number
   of bins is very large, but the expected value per bin is quite a
   bit smaller than 1?

Funny you should ask. I have an as-yet-unpublished manuscript on this
very subject (which arose for me in the context of testing the output
of various pseudorandom number generators), available by anonymous ftp
from theory.lcs.mit.edu, directory pub/cthombor/Mrandom.

The "short answer" to your question is that the Pearson statistic is
useful for testing goodness-of-fit to a symmetric multinomial if
	n > 3\sqrt{k},
but you can't use the standard chi-squared tables safely in this
range. The "discretization errors" are large, even near the mean,
unless (by the rule of thumb given in most textbooks)
	n > 5k
and (as is not disclosed in any textbook presentation, to my knowledge)
	k > 5.
On the extreme upper tail, you're in trouble for any n and k.

The non-symmetric case looks hopelessly complicated for any approach
other than "enumeration" of the relevant terms in the multinomial.
Furthermore, if there are large variations in cell probabilities, then
the Pearson statistic is pretty awful. As long as you're enumerating,
you might as well just calculate the exact tail probability....

I still haven't figured out where to publish this stuff. It was
rejected by the SODA conference. Now I'm thinking about Interface '93.
However, this conference seems to be dominated by applied
statisticians who want to learn how to use computers more effectively.
I don't think any mathematical statisticians are likely to attend, but
then again, I've not yet found any mathematical statistician who is
willing to think about "fixing up" that horrid old Pearson test. The
usual response is that I should think about the likelihood ratio
statistic if I (shudder) really am sure I want to do a hypothesis
test.

In response, I've been digging into the Bayesian-frequentist
controversy, and I think I have a novel "compromise" between these
styles of reasoning. If you're interested in getting a draft of this
compromise paper, please send email.

I'd appreciate any suggestions as to
	1. where next to submit my Pearson "fixup" result
	2. where to submit a paper on a Bayesian-frequentist compromise

To save net bandwidth, please respond by email.

					Clark
--
Clark
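
The "enumeration" idea in the post can be illustrated for the symmetric case with a rough sketch. This is not the code or the method from the manuscript; it assumes that n means the number of observations and k the number of equiprobable cells, and the names `partitions` and `pearson_exact_tail` are made up for this example. With equal cell probabilities the Pearson statistic depends only on the multiset of cell counts, so it is enough to enumerate the partitions of n into at most k parts and weight each one by the number of outcomes that produce it.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, max_parts, max_part=None):
    """Yield partitions of n into at most max_parts positive parts,
    each as a non-increasing tuple."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def pearson_exact_tail(n, k, observed_stat):
    """Exact P(X^2 >= observed_stat) for n draws from k equiprobable cells.

    Assumption for this sketch: the null hypothesis is the symmetric
    multinomial, so every cell has expected count n / k.
    """
    threshold = Fraction(observed_stat)
    total = Fraction(0)
    for parts in partitions(n, k):
        counts = parts + (0,) * (k - len(parts))
        # Pearson statistic: sum (c - n/k)^2 / (n/k) = (k/n) * sum c^2 - n
        stat = Fraction(k * sum(c * c for c in counts), n) - n
        if stat >= threshold:
            # Number of draw sequences giving these counts in a fixed cell order.
            draws = factorial(n)
            for c in counts:
                draws //= factorial(c)
            # Number of distinct ways to assign this multiset of counts to cells.
            cells = factorial(k)
            for m in Counter(counts).values():
                cells //= factorial(m)
            total += Fraction(draws * cells, k ** n)
    return total


if __name__ == "__main__":
    # Three draws into three cells; all three in one cell gives X^2 = 6.
    print(pearson_exact_tail(3, 3, 6))
```

Exact rational arithmetic avoids rounding in the far upper tail, where the probabilities are tiny. The number of partitions grows quickly with n, so this sketch is only practical for modest sample sizes, and it does not cover the non-symmetric case at all.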