NetNews Usenet Archive 1992 #18

Text File | 1992-08-17 | 2.9 KB | 74 lines

Newsgroups: sci.math.stat
Path: sparky!uunet!wupost!sdd.hp.com!caen!destroyer!ais.org!umeecs!umn.edu!thompson
From: thompson@atlas.socsci.umn.edu (T. Scott Thompson)
Subject: Re: Standard Deviation.
Message-ID: <thompson.714070323@daphne.socsci.umn.edu>
Keywords: (n) versus (n-1)
Sender: news@news2.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: daphne.socsci.umn.edu
Reply-To: thompson@atlas.socsci.umn.edu
Organization: University of Minnesota
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <c48nbgtf@csv.warwick.ac.uk>
Date: Mon, 17 Aug 1992 16:52:03 GMT
Lines: 59

psrdj@warwick.ac.uk (G M Collis) writes:

>What intrigues me is that the most elementary stats texts make a big
>fuss about using n-1 for an unbiased estimate of the variance, but ignore
>the fact that this gives a biased estimate for the SD.  I recall
>that n - 1.5 is nearer the target for the SD when the sample is
>from a normally distributed population.  I gather that minimising
>the bias when estimating the SD is rather sensitive to the population
>distribution - I'd like to know more about this.  But my big puzzle
>remains - why is the biasedness of the usual SD estimator (with N-1)
                                        ^^^^^ ^^ ^^^^^^^^^  ^^^^^^^^
>so rarely mentioned, in stark contrast to the case of the variance.

The lack of mention probably arises because general results are
unavailable due to the dependence on distributional shape that you
mention.  By contrast, distribution-free results for the sample
variance are well known.

I assume that you intend

    "usual" SD estimate == sqrt("usual" variance estimate)

This is biased because (1) the usual (unbiased) variance estimate is
itself random and (2) the sqrt( ) function is nonlinear.

Generally for any nonnegative random variable x, if Var[x] > 0 then

    E[ sqrt(x) ] < sqrt( E[x] )

(due to the strict concavity of the sqrt( ) function).
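A tiny numeric check of that inequality may help. The two-point
variable below is a made-up illustration, not something from the
original question: x is 0 or 4 with equal probability, so E[x] = 2.

```python
import math

# Hypothetical two-point random variable (an assumption for illustration):
# x takes the value 0 or 4, each with probability 1/2.
values = [0.0, 4.0]
probs = [0.5, 0.5]

# E[x]: probability-weighted average of the values.
mean_x = sum(p * v for p, v in zip(probs, values))

# E[sqrt(x)]: probability-weighted average of the square roots.
mean_sqrt = sum(p * math.sqrt(v) for p, v in zip(probs, values))

# The second printed number, sqrt(E[x]), is the larger one.
print(mean_sqrt, math.sqrt(mean_x))
```

Here E[ sqrt(x) ] is 1 while sqrt( E[x] ) is sqrt(2), so averaging
after taking the square root gives the smaller number.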
Plugging in x == "usual" variance estimate so that
E[x] = population variance, and using

    population SD == sqrt( population variance )

we get

    E[ "usual" SD estimate ] < population SD.

That is, the usual estimate is downward biased.

The bias comes from the fact that the usual variance estimate has some
variation around the true variance.  This variation interacts with the
curvature of the sqrt( ) function to give the bias.  The amount of
bias depends on exactly how the variation in the usual variance
estimator is distributed around its mean value (the population
variance).  Hence the sensitivity to distributional assumptions.

The factor n - 1.5 may produce better results for the normal
distribution.  I haven't checked.

Keep in mind that the bias disappears fairly quickly with increases in
sample size.  In fact, using n - <arbitrary constant> will work just
fine for most purposes provided n is large enough.  This is because

    (n-1)/(n-<arbitrary constant>) -> 1 as n increases,

and also because the variance of the "usual estimate" decreases with
the sample size.
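For the normal case only, the size of the bias can be written down
exactly, which allows a quick check of the n - 1.5 suggestion.  The
sketch below assumes i.i.d. normal data and relies on the standard
result E[s] = c4(n) * sigma, where s uses the n-1 divisor and
c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2).  The function
names are my own, and nothing here says anything about non-normal
populations.

```python
import math


def c4(n: int) -> float:
    """E[s] / sigma for a normal sample of size n (s uses the n-1 divisor)."""
    if n < 2:
        raise ValueError("need n >= 2")
    # Work with log-gamma so large n does not overflow.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def relative_mean(n: int, k: float) -> float:
    """E[SD estimate with divisor n-k] / sigma, assuming normal data.

    Changing the divisor from n-1 to n-k rescales s by sqrt((n-1)/(n-k)).
    """
    return c4(n) * math.sqrt((n - 1) / (n - k))


if __name__ == "__main__":
    # Columns: n, ratio with divisor n-1, ratio with divisor n-1.5.
    # A ratio of 1 would mean no bias.
    for n in (2, 5, 10, 30, 100):
        print(n, round(relative_mean(n, 1.0), 4), round(relative_mean(n, 1.5), 4))
```

Under these assumptions the n-1 column stays below 1 and creeps toward
it as n grows, while the n - 1.5 column sits closer to 1 at each of the
sample sizes printed.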