Newsgroups: sci.math.stat
Path: sparky!uunet!wupost!sdd.hp.com!mips!news.cs.indiana.edu!umn.edu!thompson
From: thompson@atlas.socsci.umn.edu (T. Scott Thompson)
Subject: Re: Standard Deviation.
Message-ID: <thompson.714333068@kiyotaki.econ.umn.edu>
Sender: news@news2.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: kiyotaki.econ.umn.edu
Reply-To: thompson@atlas.socsci.umn.edu
Organization: University of Minnesota
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <c48nbgtf@csv.warwick.ac.uk> <WVENABLE.92Aug18180002@algona.stats.adelaide.edu.au> <1992Aug18.214711.6657@mailhost.ocs.mq.edu.au>
Date: Thu, 20 Aug 1992 17:51:08 GMT
Lines: 141

wskelly@laurel.ocs.mq.edu.au (William Skelly) writes:

>Heeding my previous comment, what the hell is an "orthogonal projection of
>the observation vector onto the residual space?" I thought the Sum of

---------------------------------------------------------------------------
Note: (1) A brief lesson on the geometry of a sample mean;
      (2) a warning about interpreting the difference between means;
      and (3) some personal opinions of a philosophical nature

all follow. Professional statisticians probably want to hit "n" now!
---------------------------------------------------------------------------

Here's how it goes in the simple case of a sample mean. Think of your
data as a vector in n-dimensional space. Each observation corresponds
to one of the coordinates. Call this vector X.

Next, think about the line in this n-dimensional space defined by
requiring all components to be equal. For example, a typical point on
this line would be (m,m,m,...,m,m,m).

Next consider the set of all vectors in the space that are
perpendicular to the line. It is easy to see that this subspace
consists of all vectors whose coordinates sum to zero. This is the
residual space. Why it has this name should be clear in a moment.
Call this space E. It is a linear subspace of dimension n-1.

Now since any vector in E is perpendicular (orthogonal) to all vectors
of the form (m,m,m,...,m,m,m), your data vector has a _unique_
representation of the form

        X = (m,m,m,...,m,m,m) + e.

That is, there is a unique value of m and a unique vector e in E that
satisfy this equation. (Notice that because we have placed no
restrictions on X this same statement is true for _any_ vector in the
n-dimensional space.) It is easy to work out that the value of m that
does the trick is m = <sample mean>, and the vector e that does the
trick is the vector of residuals obtained by subtracting <sample mean>
from each element of X.

Furthermore, it is easy to calculate that among all points in E, this
residual vector is the one that is closest to X. (This is why we call
it the projection of X onto E.)
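
If it helps to see the decomposition numerically, here is a small
sketch in Python with numpy (the data vector is made up purely for
illustration; none of this is part of the argument):

    import numpy as np

    x = np.array([2.0, 5.0, 1.0, 4.0])   # a made-up data vector X, n = 4
    m = x.mean()                          # the unique m is the sample mean
    e = x - m                             # the residual vector, which lies in E

    print(np.allclose(x, m * np.ones_like(x) + e))   # X = (m,...,m) + e
    print(np.isclose(e.sum(), 0.0))                  # e is in E: coordinates sum to zero

    # e is also the point of E closest to X: any other vector v in E is
    # at least as far from X.
    rng = np.random.default_rng(0)
    for _ in range(1000):
        v = rng.normal(size=x.size)
        v -= v.mean()                     # force v into E (coordinates sum to zero)
        assert np.linalg.norm(x - v) >= np.linalg.norm(x - e) - 1e-12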

>the observation vector onto the residual space?" I thought the Sum of
>Squares was just that, x_1^2 + x_2^2 + ....?

The length of the residual vector is (the square root of)

        (x_1 - <sample mean>)^2 + ... + (x_n - <sample mean>)^2

which is the sum of squared _residuals_. It is this sum of squares to
which the original comment applied.
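
Incidentally, the two sums of squares are linked by the right angle in
the picture below: because (m,m,...,m) and e are perpendicular,

        x_1^2 + ... + x_n^2  =  n*<sample mean>^2  +  sum of squared residuals.

A two-line numerical check (again just an illustrative sketch in
Python/numpy, with made-up data):

    import numpy as np

    x = np.array([2.0, 5.0, 1.0, 4.0])    # made-up data, as before
    m, e = x.mean(), x - x.mean()
    # Pythagoras: ||X||^2 = ||(m,...,m)||^2 + ||e||^2
    print(np.isclose(np.sum(x**2), x.size * m**2 + np.sum(e**2)))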

Here is a very crude representation of the above in the case where n = 2.
Note that all angles should be right angles but probably won't be on
your terminal display!

          E (should be orthogonal to the line)
           \                   /  <--- line consisting of all points
            \                 /        of the form (m,m,m,...,m,m,m)
             \               .  <--- point all of whose coordinates
              \             / \      equal the sample mean.
               \           /   \
                \         /     \
                 \       /       \
                  \     /         \
                   \   /           X  <--- the data vector
                    \ /           /
           Origin--->.           /
                      \         /
                       \       /
                        \     /
                       e \   /
                          \ /
                           .

If n = 3 then E becomes a plane in three-dimensional space; if n = 4
then E becomes a three-dimensional subspace of four-dimensional space;
etc.


>I am not sure I follow you. From my applications I only want to test
>some null-hypothesis (perhaps a narrow application...but very useful!).
>Generally I want to know if two samples are from the same population.
>Isn't this just asking whether or not the two sample means are close?

No! A population is a distribution of values. Two populations can be
very different yet have the same mean. Conversely, two populations
can have almost identical distributions (in the sense of having
similar histograms or probability density functions, for example) and
yet have means that are arbitrarily far apart.

Asking whether or not two sample means are close is _not_ the same as
asking if the two samples were drawn from the same population.
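
To make the first point concrete, here is a small simulated
illustration (a sketch in Python with numpy and scipy; the two
distributions are chosen arbitrarily). The populations have exactly
the same mean but very different spreads, so a test that only compares
sample means sees nothing, while a test that compares the whole
distributions (Kolmogorov-Smirnov here) flags the difference:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Two different populations with the same mean (zero):
    # a standard normal and a much more spread-out normal.
    a = rng.normal(loc=0.0, scale=1.0, size=500)
    b = rng.normal(loc=0.0, scale=5.0, size=500)

    # Comparing means only: typically a large p-value...
    print(stats.ttest_ind(a, b, equal_var=False).pvalue)

    # ...yet the samples clearly come from different populations.
    print(stats.ks_2samp(a, b).pvalue)   # essentially zero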

>Is "estimation" part of inferential or descriptive statistical analysis
>(serious question)?

My first reaction is to say "yes" for the inferential part and "no"
for the descriptive part. These reactions are based (respectively) on
the obvious points that (i) inference is often based on parameter
estimates, and (ii) you must have something to estimate before you can
talk about estimation.

On further reflection, however, I don't think that the answers are so
clear. Perhaps "sometimes" and "sometimes" would be better.

One of my professors once said something like: "Every statistic is a
good estimator of something." By which he of course meant that we
can always _define_ our object of investigation / inference to be
whatever feature of a sampling distribution that statistic happens to
inform us about. In this sense "estimation" is a legitimate part of
descriptive statistical analysis, simply because we can always broaden
the definition of "parameter" so that _all_ statistics are estimators.

On the other hand, a set of statistics that is not sufficient may
consist of perfectly legitimate estimators of some parameters, yet not
be terribly useful for certain inferential problems (for example some
Bayesian decision problems). Thus a set of statistics may permit
estimates of all model parameters to be calculated, and nevertheless
be inadequate for solving other inferential problems.

>although I'm always looking for a better book. The problem is that any
>paper you read states that there is some assumption of "biasedness/
>unbiasedness" in the methods used. Therefore, it is important to know
>and understand what these terms mean, or are you implying that such
>assumptions need not be stated because they are unimportant?

The point is that unbiasedness is not a very useful property and bias
is not necessarily bad. One can have unbiased estimates that are not
very informative because they have so much variability, and one can
have biased estimates that are very good because the variability and
bias are both small. One cannot decide whether or not an estimator is
any good simply by looking at its bias.

"All other things equal," I suppose we would all like unbiased
estimators. Unfortunately, all other things are rarely equal.
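
A standard concrete illustration of the trade-off (again only a sketch
in Python/numpy, with normal data and the sample sizes chosen
arbitrarily): estimate a normal population's variance by dividing the
residual sum of squares by n-1 (unbiased), by n (the MLE), or by n+1.
Only the first is unbiased, but the deliberately biased versions have
the smaller mean squared error:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, true_var = 10, 100_000, 4.0

    x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))
    rss = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

    for divisor, label in [(n - 1, "unbiased (n-1)"), (n, "MLE (n)"), (n + 1, "n+1")]:
        est = rss / divisor
        bias = est.mean() - true_var
        mse = np.mean((est - true_var) ** 2)
        print(f"{label:15s}  bias = {bias:+.3f}   MSE = {mse:.3f}")

The divide-by-(n-1) estimator comes out essentially unbiased but with
the largest mean squared error of the three; the other two trade a
little bias for a larger reduction in variance.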

T. Scott Thompson                        thompson@atlas.socsci.umn.edu
Department of Economics
University of Minnesota