Newsgroups: sci.math.stat
Path: sparky!uunet!wupost!sdd.hp.com!mips!news.cs.indiana.edu!umn.edu!thompson
From: thompson@atlas.socsci.umn.edu (T. Scott Thompson)
Subject: Re: Standard Deviation.
Message-ID: <thompson.714333068@kiyotaki.econ.umn.edu>
Sender: news@news2.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: kiyotaki.econ.umn.edu
Reply-To: thompson@atlas.socsci.umn.edu
Organization: University of Minnesota
References: <1992Aug14.172833.11844@cbfsb.cb.att.com> <c48nbgtf@csv.warwick.ac.uk> <WVENABLE.92Aug18180002@algona.stats.adelaide.edu.au> <1992Aug18.214711.6657@mailhost.ocs.mq.edu.au>
Date: Thu, 20 Aug 1992 17:51:08 GMT
Lines: 141

wskelly@laurel.ocs.mq.edu.au (William Skelly) writes:

>Heeding my previous comment, what the hell is an "orthogonal projection of
>the observation vector onto the residual space?" I thought the Sum of

---------------------------------------------------------------------------
Note: (1) A brief lesson on the geometry of a sample mean;
      (2) a warning about interpreting the difference between means;
      and (3) some personal opinions of a philosophical nature

all follow. Professional statisticians probably want to hit "n" now!
---------------------------------------------------------------------------

Here's how it goes in the simple case of a sample mean. Think of your
data as a vector in n-dimensional space. Each observation corresponds
to one of the coordinates. Call this vector X.

Next, think about the line in this n-dimensional space defined by
requiring all components to be equal. For example, a typical point on
this line would be (m,m,m,...,m,m,m).

Next consider the set of all vectors in the space that are
perpendicular to the line. It is easy to see that this subspace
consists of all vectors whose coordinates sum to zero. This is the
residual space. Why it has this name should be clear in a moment.
Call this space E. It is a linear subspace of dimension n-1.

Now since any vector in E is perpendicular (orthogonal) to all vectors
of the form (m,m,m,...,m,m,m), your data vector has a _unique_
representation of the form

        X = (m,m,m,...,m,m,m) + e.

That is, there is a unique value of m and a unique vector e in E that
satisfy this equation. (Notice that because we have placed no
restrictions on X this same statement is true for _any_ vector in the
n-dimensional space.) It is easy to work out that the value of m that
does the trick is m = <sample mean>, and the vector e that does the
trick is the vector of residuals obtained by subtracting <sample mean>
from each element of X.

Furthermore, it is easy to calculate that among all points in E, this
residual vector is the one that is closest to X. (This is why we call
it the projection of X onto E.)
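
If it helps to see the decomposition numerically, here is a small
sketch in Python with numpy (the data vector is made up purely for
illustration; none of this is part of the argument):

    import numpy as np

    x = np.array([2.0, 5.0, 1.0, 4.0])   # a made-up data vector X, n = 4
    m = x.mean()                          # the unique m is the sample mean
    e = x - m                             # the residual vector, which lies in E

    print(np.allclose(x, m * np.ones_like(x) + e))   # X = (m,...,m) + e
    print(np.isclose(e.sum(), 0.0))                  # e is in E: coordinates sum to zero

    # e is also the point of E closest to X: any other vector v in E is
    # at least as far from X.
    rng = np.random.default_rng(0)
    for _ in range(1000):
        v = rng.normal(size=x.size)
        v -= v.mean()                     # force v into E (coordinates sum to zero)
        assert np.linalg.norm(x - v) >= np.linalg.norm(x - e) - 1e-12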

>the observation vector onto the residual space?" I thought the Sum of
>Squares was just that, x_1^2 + x_2^2 + ....?

The length of the residual vector is (the square root of)

        (x_1 - <sample mean>)^2 + ... + (x_n - <sample mean>)^2

which is the sum of squared _residuals_. It is this sum of squares to
which the original comment applied.
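
Incidentally, the two sums of squares are linked by the right angle in
the picture below: because (m,m,...,m) and e are perpendicular,

        x_1^2 + ... + x_n^2  =  n*<sample mean>^2  +  sum of squared residuals.

A two-line numerical check (again just an illustrative sketch in
Python/numpy, with made-up data):

    import numpy as np

    x = np.array([2.0, 5.0, 1.0, 4.0])    # made-up data, as before
    m, e = x.mean(), x - x.mean()
    # Pythagoras: ||X||^2 = ||(m,...,m)||^2 + ||e||^2
    print(np.isclose(np.sum(x**2), x.size * m**2 + np.sum(e**2)))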

Here is a very crude representation of the above in the case where n = 2.
Note that all angles should be right angles but probably won't be on
your terminal display!

          E (should be orthogonal to the line)
           \                   /  <--- line consisting of all points
            \                 /        of the form (m,m,m,...,m,m,m)
             \               .  <--- point all of whose coordinates
              \             / \      equal the sample mean.
               \           /   \
                \         /     \
                 \       /       \
                  \     /         \
                   \   /           X  <--- the data vector
                    \ /           /
           Origin--->.           /
                      \         /
                       \       /
                        \     /
                       e \   /
                          \ /
                           .

If n = 3 then E becomes a plane in three-dimensional space; if n = 4
then E becomes a three-dimensional subspace of four-dimensional space;
etc.


>I am not sure I follow you. From my applications I only want to test
>some null-hypothesis (perhaps a narrow application...but very useful!).
>Generally I want to know if two samples are from the same population.
>Isn't this just asking whether or not the two sample means are close?

No! A population is a distribution of values. Two populations can be
very different yet have the same mean. Conversely, two populations
can have almost identical distributions (in the sense of having
similar histograms or probability density functions, for example) and
yet have means that are arbitrarily far apart.

Asking whether or not two sample means are close is _not_ the same as
asking if the two samples were drawn from the same population.
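
To make the first point concrete, here is a small simulated
illustration (a sketch in Python with numpy and scipy; the two
distributions are chosen arbitrarily). The populations have exactly
the same mean but very different spreads, so a test that only compares
sample means sees nothing, while a test that compares the whole
distributions (Kolmogorov-Smirnov here) flags the difference:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Two different populations with the same mean (zero):
    # a standard normal and a much more spread-out normal.
    a = rng.normal(loc=0.0, scale=1.0, size=500)
    b = rng.normal(loc=0.0, scale=5.0, size=500)

    # Comparing means only: typically a large p-value...
    print(stats.ttest_ind(a, b, equal_var=False).pvalue)

    # ...yet the samples clearly come from different populations.
    print(stats.ks_2samp(a, b).pvalue)   # essentially zero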

>Is "estimation" part of inferential or descriptive statistical analysis
>(serious question)?

My first reaction is to say "yes" for the inferential part and "no"
for the descriptive part. These reactions are based (respectively) on
the obvious points that (i) inference is often based on parameter
estimates, and (ii) you must have something to estimate before you can
talk about estimation.

On further reflection, however, I don't think that the answers are so
clear. Perhaps "sometimes" and "sometimes" would be better.

One of my professors once said something like: "Every statistic is a
good estimator of something." By which he of course meant that we
can always _define_ our object of investigation / inference to be
whatever feature of a sampling distribution that statistic happens to
inform us about. In this sense "estimation" is a legitimate part of
descriptive statistical analysis, simply because we can always broaden
the definition of "parameter" so that _all_ statistics are estimators.

On the other hand, a set of statistics that is not sufficient may
consist of perfectly legitimate estimators of some parameters, yet not
be terribly useful for certain inferential problems (for example some
Bayesian decision problems). Thus a set of statistics may permit
estimates of all model parameters to be calculated, and nevertheless
be inadequate for solving other inferential problems.

>although I'm always looking for a better book. The problem is that any
>paper you read states that there is some assumption of "biasedness/
>unbiasedness" in the methods used. Therefore, it is important to know
>and understand what these terms mean, or are you implying that such
>assumptions need not be stated because they are unimportant?

The point is that unbiasedness is not a very useful property and bias
is not necessarily bad. One can have unbiased estimates that are not
very informative because they have so much variability, and one can
have biased estimates that are very good because the variability and
bias are both small. One cannot decide whether or not an estimator is
any good simply by looking at its bias.

"All other things equal," I suppose we would all like unbiased
estimators. Unfortunately, all other things are rarely equal.
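
A standard concrete illustration of the trade-off (again only a sketch
in Python/numpy, with normal data and the sample sizes chosen
arbitrarily): estimate a normal population's variance by dividing the
residual sum of squares by n-1 (unbiased), by n (the MLE), or by n+1.
Only the first is unbiased, but the deliberately biased versions have
the smaller mean squared error:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, true_var = 10, 100_000, 4.0

    x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))
    rss = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

    for divisor, label in [(n - 1, "unbiased (n-1)"), (n, "MLE (n)"), (n + 1, "n+1")]:
        est = rss / divisor
        bias = est.mean() - true_var
        mse = np.mean((est - true_var) ** 2)
        print(f"{label:15s}  bias = {bias:+.3f}   MSE = {mse:.3f}")

The divide-by-(n-1) estimator comes out essentially unbiased but with
the largest mean squared error of the three; the other two trade a
little bias for a larger reduction in variance.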

T. Scott Thompson                        thompson@atlas.socsci.umn.edu
Department of Economics
University of Minnesota