- Newsgroups: sci.math.stat
- Path: sparky!uunet!wupost!sdd.hp.com!caen!destroyer!ais.org!umeecs!umn.edu!thompson
- From: thompson@atlas.socsci.umn.edu (T. Scott Thompson)
- Subject: Re: Standard Deviation.
- Message-ID: <thompson.714070323@daphne.socsci.umn.edu>
- Keywords: (n) versus (n-1)
- Sender: news@news2.cis.umn.edu (Usenet News Administration)
- Nntp-Posting-Host: daphne.socsci.umn.edu
- Reply-To: thompson@atlas.socsci.umn.edu
- Organization: University of Minnesota
- References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <c48nbgtf@csv.warwick.ac.uk>
- Date: Mon, 17 Aug 1992 16:52:03 GMT
- Lines: 59
-
- psrdj@warwick.ac.uk (G M Collis) writes:
-
- >What intrigues me is that the most elementary stats texts make a big
- >fuss about using n-1 for an unbiased estimate of the variance, but ignore
- >the fact that this gives a biased estimate for the SD. I recall
- >that n - 1.5 is nearer the target for the SD when the sample is
- >from a normally distributed population. I gather that minimising
- >the bias when estimating the SD is rather sensitive to the population
- >distribution - I'd like to know more about this. But my big puzzle
- >remains - why is the biasedness of the usual SD estimator (with N-1)
-                                         ^^^^^ ^^ ^^^^^^^^^ ^^^^^^^^
- >so rarely mentioned, in stark contrast to the case of the variance.
-
- The lack of mention probably arises because general results are
- unavailable due to the dependence on distributional shape that you
- mention. By contrast, distribution-free results for the sample
- variance are well known.
-
- I assume that you intend
-
- "usual" SD estimate == sqrt("usual" variance estimate)
-
- This is biased because (1) the usual (unbiased) variance estimate
- is itself random and (2) the sqrt( ) function is nonlinear. Generally,
- for any nonnegative random variable x, if Var[x] > 0 then
-
- E[ sqrt(x) ] < sqrt( E[x] )
-
- (by Jensen's inequality, since the sqrt( ) function is strictly
- concave). Plugging in
-
- x == "usual" variance estimate
-
- so that E[x] = population variance, and using
-
- population SD == sqrt( population variance )
-
- we get
-
- E[ "usual" SD estimate ] < population SD.
-
- That is, the usual estimate is downward biased. The bias comes from
- the fact that the usual variance estimate has some variation around
- the true variance. This variation interacts with the curvature of the
- sqrt( ) function to give the bias. The amount of bias depends on
- exactly how the usual variance estimator is distributed around its
- mean value, which in turn depends on the shape of the population
- distribution. Hence the sensitivity to distributional assumptions.
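
A rough way to see both points (the downward bias and its dependence on the population shape) is a small simulation. The sketch below is only an illustration, not part of the original argument: the function names, the seed, the sample size n = 5, and the choice of a normal and an exponential population (both scaled to have SD 1) are all assumptions made for the example.

```python
import math
import random

def sample_sd(xs):
    """Square root of the usual (n - 1) variance estimate."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def mean_sample_sd(draw, n, reps, rng):
    """Monte Carlo average of sample_sd over `reps` samples of size n."""
    total = 0.0
    for _ in range(reps):
        total += sample_sd([draw(rng) for _ in range(n)])
    return total / reps

rng = random.Random(1992)  # fixed seed so the run is repeatable
n, reps = 5, 20000         # illustrative choices, not from the post

# Both populations have SD exactly 1, so the averages below can be
# compared directly with the target value 1.
normal_mean_sd = mean_sample_sd(lambda r: r.gauss(0.0, 1.0), n, reps, rng)
expo_mean_sd = mean_sample_sd(lambda r: r.expovariate(1.0), n, reps, rng)

print(f"normal:      mean of SD estimates = {normal_mean_sd:.3f}")
print(f"exponential: mean of SD estimates = {expo_mean_sd:.3f}")
```

Both averages should come out below 1, and by different amounts, which is the sensitivity to distributional shape described above.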
-
- The factor n - 1.5 may produce better results for the normal
- distribution. I haven't checked. Keep in mind that the bias
- disappears fairly quickly as the sample size increases. In fact,
- using n - <arbitrary constant> will work just fine for most purposes
- provided n is large enough. This is because
-
- (n-1)/(n-<arbitrary constant>) -> 1
-
- as n increases, and also because the variance in the "usual estimate"
- decreases with the sample size.
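
For the normal case the n - 1.5 suggestion can be checked without simulation, because the expected value of the usual SD estimate is known in closed form there: E[s] = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2). The sketch below assumes normal sampling throughout; the helper names c4 and expected_ratio are chosen for this example only.

```python
import math

def c4(n):
    """E[s] / sigma for normal samples, where s uses the n - 1 divisor."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def expected_ratio(n, c):
    """E[sqrt(SS / (n - c))] / sigma for normal samples.

    SS is the sum of squared deviations from the sample mean, so
    c = 1 gives the usual estimate and c = 1.5 the suggested one.
    """
    return c4(n) * math.sqrt((n - 1) / (n - c))

print(" n   divisor n-1   divisor n-1.5")
for n in (2, 5, 10, 30, 100):
    print(f"{n:3d}   {expected_ratio(n, 1.0):.5f}       {expected_ratio(n, 1.5):.5f}")
```

A ratio of 1 means no bias. Comparing the two columns shows how each divisor behaves for small n and how both approach 1 as n grows.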
-
-