Path: sparky!uunet!dtix!mimsy!stewart
From: stewart@cs.umd.edu (G. W. Stewart)
Newsgroups: sci.math.stat
Subject: Re: Standard Deviation.
Keywords: (n) versus (n-1)
Message-ID: <59743@mimsy.umd.edu>
Date: 18 Aug 92 05:18:50 GMT
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <1992Aug16.211142.27499@mailhost.ocs.mq.edu.au>
Sender: news@mimsy.umd.edu
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 53

In article <1992Aug16.211142.27499@mailhost.ocs.mq.edu.au> wskelly@laurel.ocs.mq.edu.au (William Skelly) writes:
>In article <1992Aug14.172833.11844@cbfsb.cb.att.com> rizzo@cbnewsf.cb.att.com (anthony.r.rizzo) writes:
>>Can someone explain why calculating the Standard Deviation (SD),
>>for small samples, with (n-1) in the denominator is better than
>>doing so with (n) in the denominator? I'm sure that there's
>>a perfectly good reason for doing so. But we, lowly engineers
>>aren't usually told the reason.

Since the n-1 problem has generated considerable interest, it is
perhaps worth a historical note. The problem is a special case of the
degrees of freedom in the residual sum of squares of a linear
regression problem (for this special case, take the regression matrix
to be a vector of all ones). The correct expression--$n-p$, where $p$
is the number of parameters and $n$ is the number of observations--is
due to Gauss, who, more than any one person, was responsible for the
introduction of the variance as a measure of dispersion. His
definitive treatment of the subject is contained in a series of three
memoirs, beginning with "Theoria Combinationis Observationum Erroribus
Minimis Obnoxiae, Pars Prior," which appeared in 1821 and can be found
in Volume 4 of his collected works. Here is what he says about the
problem in the "Pars Posterior" (1823).

   In Articles~15 and~16 we gave a method for approximating the
   precision of observations\symbolnote{1}{An inquiry into the same
   problem, which I published in an earlier memoir ({\it Bestimmung der
   Genauigkeit der Beobachtungen.\ Zeitschrift f\"ur Astronomie und
   verwandte Wissenschaften\/} Vol.~I, p.~185), was based on the same
   hypothesis about the probability function of the experimental error
   that I used to construct the method of least squares in the theory of
   the motion of heavenly bodies.}. But this method presupposes that a
   sufficient number of the errors themselves are known exactly, a
   condition which seldom, if ever, holds in practice. However, if some
   observed quantities depend on one or more unknowns according to a
   known law, we may find the most reliable values of the unknowns by
   the method of least squares. If the values of the observed
   quantities are then computed from them, they will be found to differ
   very little from the true values, so that the greater the number of
   observations the more surely we may take the differences as the true
   observation errors.

   This procedure is used in actual problems by all calculators who try
   to estimate precision a posteriori. But it is theoretically unsound;
   and although it is good enough for practical purposes in many cases,
   in others it can fail spectacularly. Therefore, it is well worth
   while to examine the problem more closely.

Gauss goes on to point out that the residual sum of squares is, by the
very definition of least squares, less than or equal to the sum of
squares of the true errors, so its mean must underestimate the mean of
the latter, i.e., the variance of the error. He then derives the
divisor $n-p$ mentioned above.

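In modern notation the point is that, for a full-rank regression
matrix with $p$ columns and uncorrelated errors of variance
$\sigma^2$, the expected residual sum of squares is $(n-p)\sigma^2$;
taking the matrix to be a single column of ones gives
$E\sum_i (x_i - \bar x)^2 = (n-1)\sigma^2$, which is where the $n-1$
in the sample variance comes from. Here is a minimal Python sketch
that checks both statements by simulation (the particular values of
$n$, $p$, $\sigma^2$, and the number of trials are arbitrary choices):

    import numpy as np

    # Monte Carlo check that E[residual sum of squares] = (n - p) * sigma^2.
    # All numerical choices here (n, p, sigma2, trials) are arbitrary.
    rng = np.random.default_rng(0)
    sigma2 = 4.0                 # true error variance
    n = 10                       # observations per sample
    trials = 200000              # number of simulated samples

    # Case 1: the regression matrix is a single column of ones (p = 1),
    # i.e. the ordinary sample-variance setting.
    x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    rss1 = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    print("mean-only, divide by n:  ", rss1.mean() / n)        # ~ sigma2*(n-1)/n, too small
    print("mean-only, divide by n-1:", rss1.mean() / (n - 1))  # ~ sigma2

    # Case 2: a general linear model y = X beta + e with p = 3 parameters.
    p = 3
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p)
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
    rss2 = ((y - y @ H) ** 2).sum(axis=1)  # squared least-squares residuals
    print("regression, divide by n:  ", rss2.mean() / n)        # too small
    print("regression, divide by n-p:", rss2.mean() / (n - p))  # ~ sigma2

Dividing the residual sum of squares by $n$ comes out low by the
factor $(n-p)/n$; for the sample variance that factor is $(n-1)/n$,
negligible for large samples but not for the small ones the original
question asked about.
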
Pete Stewart