NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-17 | 3.3 KB

Path: sparky!uunet!dtix!darwin.sura.net!ukma!rutgers!news.cs.indiana.edu!umn.edu!thompson
From: thompson@atlas.socsci.umn.edu (T. Scott Thompson)
Newsgroups: sci.math.stat
Subject: Re: Fwd: Standard Deviation.
Message-ID: <thompson.714071949@daphne.socsci.umn.edu>
Date: 17 Aug 92 17:19:09 GMT
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <seX2yRq00Uh785H2EB@andre
 <1992Aug14.231916.23479@magnus.acs.ohio-state.edu>
 <1992Aug16.212245.27577@mailhost.ocs.mq.edu.au>
 <1992Aug16.225926.497@massey.ac.nz>
Sender: news@news2.cis.umn.edu (Usenet News Administration)
Reply-To: thompson@atlas.socsci.umn.edu
Organization: University of Minnesota
Lines: 61
Nntp-Posting-Host: daphne.socsci.umn.edu

news@massey.ac.nz (USENET News System) writes:

>In article <1992Aug16.212245.27577@mailhost.ocs.mq.edu.au>, wskelly@laurel.ocs.mq.edu.au (William Skelly) writes:
>>
>> This and other posting indicate that there is a relationship between
>> sample size and and estimated variance (of the population) which is
>> positive and always an underestimate.  What is the limit, or point
>> at which an increasing sample size no longer improve the estimate
>> of populations variance?
>>
>When the sample size is equal to the population size (never for an
>infinite population).

Not necessarily true.  It depends on the sample design.  Suppose that
we have a finite population of size N and we draw _with_replacement_ a
sample of size n using independent draws.  Then the bias from using
the "n" denominator is

        (minus) <population variance> / n

and this is true for _any_ n, including n = N, or even n = 2N.

>> Can this be tested by taking samples of sample
>Yes but we can work out the theory so it isn't necessary.
>> (the later sample
>> being elevated to the status of population)?
>>

Perhaps we can work out the theory for the bias, but this is not so
clear when we consider other features of the distribution of the
estimate.
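
The bias formula quoted above can be checked exactly on a toy case.
The sketch below is only an illustration, not part of the original
post: the three-element population and the function names are made up,
and the expectation is computed by brute-force enumeration of every
equally likely with-replacement sample, using exact fractions.

```python
from fractions import Fraction
from itertools import product


def population_variance(values):
    """Variance of a finite population (divisor N), as an exact fraction."""
    N = len(values)
    mean = Fraction(sum(values), N)
    return sum((Fraction(v) - mean) ** 2 for v in values) / N


def expected_n_denominator_variance(population, n):
    """Exact expectation of the divisor-n variance estimate, averaged over
    all equally likely with-replacement samples of size n."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        mean = Fraction(sum(sample), n)
        total += sum((Fraction(x) - mean) ** 2 for x in sample) / n
        count += 1
    return total / count


population = [1, 2, 4]      # hypothetical population, N = 3
sigma2 = population_variance(population)

for n in (2, 3, 6):         # n < N, n = N, n = 2N
    bias = expected_n_denominator_variance(population, n) - sigma2
    # The two printed fractions should agree: bias versus -sigma^2 / n.
    print(n, bias, -sigma2 / n)
```

Enumeration is only feasible for tiny N and n (there are N**n
samples), but it avoids simulation noise, so the comparison at n = N
and n = 2N is exact.
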
See the related questions about the bias of the usual SD estimate in
this thread for an example.

The intuition in the original comment was right on!  In fact the
procedure of taking samples from the original sample, "elevating" the
original sample to the status of population, is _exactly_ the
definition of (a particular variety of) bootstrap resampling.  See for
example the book _The_Bootstrap_and_Edgeworth_Expansion_, by Peter Hall
(Springer-Verlag, 1992), which provides many references and examples.

It is shown in the bootstrap literature that for certain purposes
(e.g. constructing statistical tests when the sample is drawn from a
non-normal population) the bootstrap procedure can outperform
(sometimes significantly) the traditional procedures.

So, for example, if the reason you are estimating a variance is to put
its square root in the denominator of a t-statistic, then
bootstrapping _may_ be the way to go (unless you really believe in the
normality of your population).  For this application, you really
aren't interested in whether or not your estimate of the variance is
biased, and so the n vs. n-1 debate is irrelevant.  In fact, the
bootstrap based test procedure will lead to the same test whether you
choose n _or_ n-1, an invariance property that I find appealing.

T. Scott Thompson
thompson@atlas.socsci.umn.edu
Dept. of Economics
University of Minnesota
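
The invariance property mentioned above can be illustrated with a
small sketch.  The post does not say which bootstrap test it has in
mind; the code below assumes a two-sided "bootstrap-t" (percentile-t)
test of a mean, and the data, function names, and parameter names are
all hypothetical.  Switching the divisor from n to n-1 rescales the
observed statistic and every resampled statistic by the same constant,
so the comparisons, and hence the p-value, should not change.

```python
import math
import random


def t_stat(xs, center, ddof):
    """t-type statistic using a variance estimate with divisor n - ddof."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - ddof)
    if var == 0:
        # Degenerate resample (all values equal): avoid dividing by zero.
        return 0.0 if mean == center else math.copysign(math.inf, mean - center)
    return (mean - center) / math.sqrt(var / n)


def bootstrap_t_pvalue(sample, mu0, ddof, reps=2000, seed=0):
    """Two-sided bootstrap-t p-value for H0: population mean == mu0."""
    rng = random.Random(seed)            # same seed gives the same resamples
    n = len(sample)
    sample_mean = sum(sample) / n
    t_obs = abs(t_stat(sample, mu0, ddof))
    exceed = 0
    for _ in range(reps):
        resample = rng.choices(sample, k=n)   # draw with replacement
        # Centre at the sample mean: the sample plays the role of population.
        if abs(t_stat(resample, sample_mean, ddof)) >= t_obs:
            exceed += 1
    return exceed / reps


# Hypothetical skewed data, made up for illustration.
data = [0.3, 0.7, 1.1, 1.2, 1.9, 2.4, 2.8, 3.5, 4.1, 6.0, 9.7, 15.2]

p_n = bootstrap_t_pvalue(data, mu0=2.0, ddof=0)          # divisor n
p_n_minus_1 = bootstrap_t_pvalue(data, mu0=2.0, ddof=1)  # divisor n - 1
print(p_n, p_n_minus_1)
```

Both calls use the same seed, so they see identical resamples; the two
p-values should then agree, apart from the remote possibility of a
floating-point tie in one of the comparisons.
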