NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-17 | 2.9 KB

Path: sparky!uunet!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!sirius!wvenable
From: wvenable@algona.stats.adelaide.edu.au (Bill Venables)
Newsgroups: sci.math.stat
Subject: Re: Standard Deviation.
Message-ID: <WVENABLE.92Aug18180002@algona.stats.adelaide.edu.au>
Date: 18 Aug 92 08:30:02 GMT
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <c48nbgtf@csv.warwick.ac.uk>
Sender: news@ucs.adelaide.edu.au
Organization: Department of Statistics, University of Adelaide
Lines: 37
Nntp-Posting-Host: algona.stats.adelaide.edu.au
In-reply-to: psrdj@warwick.ac.uk's message of 17 Aug 92 09:35:38 GMT

>>>>> "Glyn" == G M Collis <psrdj@warwick.ac.uk> writes:

Glyn> What intrigues me is that the most elementary stats texts make a big
Glyn> fuss about using n-1 for an unbiased estimate of the variance, but
Glyn> ignore the fact that this gives a biased estimate for the SD. I
Glyn> recall that n - 1.5 is nearer the target for the SD when the sample
Glyn> is from a normally distributed population. I gather that minimising
Glyn> the bias when estimating the SD is rather sensitive to the population
Glyn> distribution - I'd like to know more about this. But my big puzzle
Glyn> remains - why is the biasedness of the usual SD estimator (with N-1)
Glyn> so rarely mentioned, in stark contrast to the case of the variance.

What surprises me is how this quaint little thread got going at all. The
elementary books are wrong if they make a big issue of unbiasedness,
period.

In this context the *two* important quantities are (a) the sum of
squares, since it is the squared length of the orthogonal projection of
the observation vector onto the residual space, and (b) the degrees of
freedom, which is the dimension of the residual space. This latter
number is sometimes n-1, but more often n-p where p is somewhat larger
than 1. These two quantities, *separately*, are what you need for
virtually all inferential procedures, like testing and confidence
intervals.
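
To make the two quantities concrete, here is a minimal sketch in Python. It assumes an ordinary least-squares fit and covers only two cases: a mean-only model (p = 1) and a straight line with intercept (p = 2). The function name `residual_summary` and the sample data are hypothetical and appear nowhere in the thread.

```python
def residual_summary(y, x=None):
    """Return (sum of squares, degrees of freedom) of the residuals.

    x is None : mean-only model, residual space has dimension n - 1.
    x given   : straight-line fit with intercept, dimension n - 2.
    """
    n = len(y)
    ybar = sum(y) / n
    if x is None:
        # Project y onto the space orthogonal to the constant vector.
        resid = [yi - ybar for yi in y]
        p = 1
    else:
        # Project y onto the space orthogonal to both the constant
        # vector and the (centred) x vector.
        xbar = sum(x) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        resid = [(yi - ybar) - slope * (xi - xbar) for xi, yi in zip(x, y)]
        p = 2
    ss = sum(r * r for r in resid)  # squared length of the projection
    return ss, n - p


# Hypothetical data, for illustration only.
y = [2, 4, 4, 4, 5, 5, 7, 9]
ss, df = residual_summary(y)
print("sum of squares:", ss, "degrees of freedom:", df)
```

The function deliberately returns the pair rather than the quotient `ss / df`; forming the variance estimate is left to the caller.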
Whether you divide one by the other to give an estimate of the variance
is up to you. Incidentally, if you do, it turns out to be unbiased, but
"so what?", really.

In my opinion statistical inference is all about reliably capturing
information from data (and elsewhere if you are a Bayesian); it's not
really about coming up with a number from a data set that you can show
will be "close" to an unknown parameter value, in some special sense of
"close".

The trouble with many elementary books is that they get hung up on a
narrow definition of "estimation" and elevate unbiasedness to an
importance far in excess of what is warranted, at the same time not
mentioning sufficiency, say, a far more important concept (but harder
to describe, of course).

-- 
___________________________________________________________________________
Bill Venables, Dept. of Statistics,  | Email: venables@stats.adelaide.edu.au
Univ. of Adelaide, South Australia.  | Tel: +61 8 228 5412 Fax: ...232 5670
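
For readers who want to check the quoted remark about n - 1.5, the following sketch computes the expected value of sqrt(SS / divisor) relative to the true SD. It assumes independent observations from a normal population and the mean-only case, where SS / sigma^2 has a chi-squared distribution on n - 1 degrees of freedom. The function names `c4` and `sd_ratio` are illustrative choices, not anything from the thread.

```python
import math


def c4(n):
    """E[s] / sigma for a normal sample of size n, with s = sqrt(SS / (n - 1)).

    Uses the mean of a chi distribution on n - 1 degrees of freedom;
    lgamma is used so that large n does not overflow.
    """
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sd_ratio(n, divisor):
    """E[sqrt(SS / divisor)] / sigma under the same normality assumption."""
    return c4(n) * math.sqrt((n - 1) / divisor)


# A ratio of 1 would mean the SD estimate is unbiased.
for n in (5, 10, 30):
    print(n, round(sd_ratio(n, n - 1), 4), round(sd_ratio(n, n - 1.5), 4))
```

Under these assumptions the n - 1.5 divisor gives a ratio closer to 1 than n - 1 does, which is the claim in the quoted text. The sketch says nothing about non-normal populations.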