Path: sparky!uunet!dtix!mimsy!stewart
From: stewart@cs.umd.edu (G. W. Stewart)
Newsgroups: sci.math.stat
Subject: Re: Standard Deviation.
Keywords: (n) versus (n-1)
Message-ID: <59743@mimsy.umd.edu>
Date: 18 Aug 92 05:18:50 GMT
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <1992Aug16.211142.27499@mailhost.ocs.mq.edu.au>
Sender: news@mimsy.umd.edu
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 53

In article <1992Aug16.211142.27499@mailhost.ocs.mq.edu.au> wskelly@laurel.ocs.mq.edu.au (William Skelly) writes:
>In article <1992Aug14.172833.11844@cbfsb.cb.att.com> rizzo@cbnewsf.cb.att.com (anthony.r.rizzo) writes:
>>Can someone explain why calculating the Standard Deviation (SD),
>>for small samples, with (n-1) in the denominator is better than
>>doing so with (n) in the denominator? I'm sure that there's
>>a perfectly good reason for doing so. But we, lowly engineers
>>aren't usually told the reason.

Since the n-1 problem has generated considerable interest, it is
perhaps worth a historical note. The problem is a special case of the
degrees of freedom in the residual sum of squares of a linear
regression problem (for this special case, take the regression matrix
to be a vector of all ones). The correct expression--$n-p$, where $p$
is the number of parameters and $n$ is the number of observations--is
due to Gauss, who, more than any one person, was responsible for the
introduction of the variance as a measure of dispersion. His
definitive treatment of the subject is contained in a series of three
memoirs, beginning with "Theoria Combinationis Observationum Erroribus
Minimis Obnoxiae, Pars Prior," which appeared in 1821 and can be found
in Volume 4 of his collected works. Here is what he says about the
problem in the "Pars Posterior" (1823).

   In Articles~15 and~16 we gave a method for approximating the
   precision of observations\symbolnote{1}{An inquiry into the same
   problem, which I published in an earlier memoir ({\it Bestimmung der
   Genauigkeit der Beobachtungen.\ Zeitschrift f\"ur Astronomie und
   verwandte Wissenschaften\/} Vol.~I, p.~185), was based on the same
   hypothesis about the probability function of the experimental error
   that I used to construct the method of least squares in the theory of
   the motion of heavenly bodies.}. But this method presupposes that a
   sufficient number of the errors themselves are known exactly, a
   condition which seldom, if ever, holds in practice. However, if some
   observed quantities depend on one or more unknowns according to a
   known law, we may find the most reliable values of the unknowns by
   the method of least squares. If the values of the observed
   quantities are then computed from them, they will be found to differ
   very little from the true values, so that the greater the number of
   observations the more surely we may take the differences as the true
   observation errors.

   This procedure is used in actual problems by all calculators who try
   to estimate precision a posteriori. But it is theoretically unsound;
   and although it is good enough for practical purposes in many cases,
   in others it can fail spectacularly. Therefore, it is well worth
   while to examine the problem more closely.

Gauss goes on to point out that the residual sum of squares is, by the
very definition of least squares, less than or equal to the sum of
squares of the true errors, so its mean must underestimate the mean of
the latter, i.e., the variance of the error. He then derives the
divisor $n-p$ mentioned above.

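In modern notation the point is that, for a full-rank regression
matrix with $p$ columns and uncorrelated errors of variance
$\sigma^2$, the expected residual sum of squares is $(n-p)\sigma^2$;
taking the matrix to be a single column of ones gives
$E\sum_i (x_i - \bar x)^2 = (n-1)\sigma^2$, which is where the $n-1$
in the sample variance comes from. Here is a minimal Python sketch
that checks both statements by simulation (the particular values of
$n$, $p$, $\sigma^2$, and the number of trials are arbitrary choices):

    import numpy as np

    # Monte Carlo check that E[residual sum of squares] = (n - p) * sigma^2.
    # All numerical choices here (n, p, sigma2, trials) are arbitrary.
    rng = np.random.default_rng(0)
    sigma2 = 4.0                 # true error variance
    n = 10                       # observations per sample
    trials = 200000              # number of simulated samples

    # Case 1: the regression matrix is a single column of ones (p = 1),
    # i.e. the ordinary sample-variance setting.
    x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    rss1 = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    print("mean-only, divide by n:  ", rss1.mean() / n)        # ~ sigma2*(n-1)/n, too small
    print("mean-only, divide by n-1:", rss1.mean() / (n - 1))  # ~ sigma2

    # Case 2: a general linear model y = X beta + e with p = 3 parameters.
    p = 3
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p)
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
    rss2 = ((y - y @ H) ** 2).sum(axis=1)  # squared least-squares residuals
    print("regression, divide by n:  ", rss2.mean() / n)        # too small
    print("regression, divide by n-p:", rss2.mean() / (n - p))  # ~ sigma2

Dividing the residual sum of squares by $n$ comes out low by the
factor $(n-p)/n$; for the sample variance that factor is $(n-1)/n$,
negligible for large samples but not for the small ones the original
question asked about.
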
Pete Stewart