- Path: sparky!uunet!mcsun!Germany.EU.net!math.fu-berlin.de!Sirius.dfn.de!darwin.sura.net!gatech!purdue!mentor.cc.purdue.edu!pop.stat.purdue.edu!hrubin
- From: hrubin@pop.stat.purdue.edu (Herman Rubin)
- Newsgroups: sci.math.stat
- Subject: Re: robust location estimators
- Message-ID: <Bvyytu.KCq@mentor.cc.purdue.edu>
- Date: 11 Oct 92 18:17:53 GMT
- References: <BvGFAu.GEv@mentor.cc.purdue.edu> <1992Oct2.084819.11178@waikato.ac.nz> <1992Oct11.121834.11351@waikato.ac.nz>
- Sender: news@mentor.cc.purdue.edu (USENET News)
- Organization: Purdue University Statistics Department
- Lines: 114
-
- In article <1992Oct11.121834.11351@waikato.ac.nz> maj@waikato.ac.nz writes:
- >In article <1992Oct2.084819.11178@waikato.ac.nz>, maj@waikato.ac.nz writes:
- >> In article <BvGFAu.GEv@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
-
-
- >> [stuff and quotes deleted]
-
- >>> Robustness cannot be defined in a formal manner to be a precise concept.
- >>> The definition I like is
-
- >>> The robustness of a procedure is the extent to which its
- >>> properties do not depend on those assumptions which one
- >>> does not wish to make.
-
- >> This is a good definition of what one wants, but I dispute that
- >> robustness cannot be given a formal definition. I'll try:
-
- >> A statistical functional is robust iff it is continuous.
-
- >> Continuous in what topology? you may ask. *That* I won't answer;
- >> that's where the vagueness of the concept comes in.
-
-
- >An email correspondence with Peter Hamer leads me to think that I was perhaps a
- >little on the terse side in my earlier posting. He writes:
-
- > > Take the distance between two cdfs F and G to
- > > be given by, say, the sup norm.
-
- > > It is easy to see that within any epsilon of
- > > F there are distributions with arbitrarily
- > > different mean.
-
- > I don't know how you intended this to be interpreted. It seems
- > very much like saying that applied statistics is impossible;
- > as no real data is *known* to come from a precisely specified
- > distribution.
-
- > All applied statistical procedures must be applicable to distributions
- > `close to' the one nominally assumed, and you seem to be saying that
- > this is impossible.
-
- >[My reply may be of interest to other readers of this group.]
-
- >Rather than try to persuade you about the correctness of my
- >remarks I have looked up a few references. Have a look at
-
- >Staudte & Sheather 'Robust Estimation and Testing' Wiley 1990
- >Section 3.2.4 pp65-67
-
- >Hampel, F.R. 'A General Qualitative definition of Robustness'
- >Ann Math Stat v42, 1887-1896, (1971)
- >[esp. Theorem 1, p1891]
-
- >Huber, P.J. 'Robust Statistical Procedures'(1977) #27 in
- >CBMS-NSF series.
- >[esp. Chapter 2 and first part of Ch 3. I prefer this to
- >the more elaborate treatment in his 1981 book.]
-
- >Huber and Hampel work in the full generality with statistics
- >understood as sequences of functionals. However the main
- >points are unchanged and more easily understood by considering
- >only functionals.
-
- >Example: the divisor n-1 standard deviation can be represented
- > as a sequence of functionals, one for each sample
- > size.
- > The divisor n standard deviation can be represented
- > as a single functional with no need to involve
- > sample size.
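
One way to see the distinction in the example above is the following
Python sketch. The function names and the toy sample are assumptions for
illustration. Repeating every observation three times leaves the empirical
cdf unchanged, so anything that is a single functional of the cdf must
return the same value; the divisor n version does, the divisor n-1 version
does not.

```python
import numpy as np

def sd_divisor_n(sample):
    """Plug-in standard deviation: depends on the sample only through its empirical cdf."""
    x = np.asarray(sample, dtype=float)
    return float(np.sqrt(np.mean((x - x.mean()) ** 2)))

def sd_divisor_n_minus_1(sample):
    """Usual 'unbiased variance' standard deviation: also depends on the sample size n."""
    x = np.asarray(sample, dtype=float)
    return float(np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1)))

sample = np.array([1.0, 2.0, 4.0, 7.0])
tripled = np.tile(sample, 3)  # same empirical cdf, three times the sample size

print("divisor n:  ", sd_divisor_n(sample), sd_divisor_n(tripled))
print("divisor n-1:", sd_divisor_n_minus_1(sample), sd_divisor_n_minus_1(tripled))
```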
-
- >The discontinuity of the sample mean as a function from a
- >space of cdfs to the reals poses no real problem to
- >*applied* statistics because in practice we do not really use
- >the mean by itself: we actually employ data inspection
- >followed by transformations and/or outlier deletion. In reality
- >the robust perspective is more of a threat to traditional
- >*mathematical* statistics with its highly developed theory
- >about the behaviour of relatively simple statistics at
- >precisely specified models, something that does not really
- >model modern applied statistical practice in the least.
-
- This attempt to avoid the problem does not succeed. Any crass attempt
- to delete outliers messes up things but good; in most situations, I am
- much more inclined to go along with the long tails than with the total
- damage to the analysis which such deletion produces, especially if
- there are explanatory variables. As for transformations, unless one is
- looking only at the distribution of a single random variable, they
- should NEVER be used, as they are almost certain to destroy the model.
- The observation that the model is not exactly correct is quite
- appropriate. But while robustness in the sense I have given, which is a
- slight extension of the original definition of Box, can be shown to be
- quite attainable, robustness in the Huber-Hampel sense is so rare for
- reasonable models as to be a fatuity.
-
- The laws of large numbers are useful robustness theorems. The Central
- Limit Theorem is a robustness theorem. For regression, the Gauss-Markov
- Theorem is a robustness theorem for an important type of practical
- situation not covered by the use of continuity. Any time that it
- is shown that normality is not needed, but that a few moments suffice,
- one has a robustness result which is not covered by the Huber-Hampel
- definition.
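
A small simulation sketch of the "few moments suffice" point, in Python
with NumPy. The design points, the true coefficients, the seed, and the
choice of centered exponential errors are all assumptions made for
illustration. The errors are strongly skewed and far from normal, but have
mean 0 and variance 1; the least-squares slope should still be unbiased
with variance sigma^2 / sum((x - xbar)^2). The sketch checks only those two
properties, not the optimality part of the Gauss-Markov Theorem.

```python
import numpy as np

def simulate_ols_slopes(n_obs=20, n_reps=20000, intercept=1.0, slope=2.0, seed=0):
    """Least-squares slope estimates over many replications with skewed errors."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_obs)
    design = np.column_stack([np.ones(n_obs), x])
    # Centered exponential errors: mean 0, variance 1, not symmetric, not normal.
    errors = rng.exponential(1.0, size=(n_reps, n_obs)) - 1.0
    y = intercept + slope * x + errors  # one replication per row
    # Solve all replications at once; coef has shape (2, n_reps).
    coef, *_ = np.linalg.lstsq(design, y.T, rcond=None)
    theoretical_var = 1.0 / np.sum((x - x.mean()) ** 2)  # error variance is 1
    return coef[1], theoretical_var

slopes, theoretical_var = simulate_ols_slopes()
print("mean of slope estimates:    ", slopes.mean())
print("variance of slope estimates:", slopes.var(), "theory:", theoretical_var)
```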
-
- The well-known asymptotic properties of maximum likelihood and Bayes
- estimates are among this class. The mean may be the parameter of interest;
- there is no Huber-Hampel robust estimator of it.
-
- BTW, most simulations assume symmetry. This is a far stronger assumption
- than merely having a few moments.
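
To illustrate why the symmetry assumption matters, here is a hedged Python
sketch. The two distributions, the sample size, the seed, and the 10%
trimming fraction are assumptions for illustration. Under a symmetric model
the mean, median, and trimmed mean all estimate the same center; under a
skewed model with the same mean they estimate different quantities, so a
robust location estimator is no longer an estimator of the mean.

```python
import numpy as np

def trimmed_mean(sample, trim_fraction):
    """Mean after dropping the lowest and highest trim_fraction of the sorted sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    k = int(trim_fraction * x.size)
    return float(x[k:x.size - k].mean())

def location_summaries(sample):
    """Three common location estimates for one sample."""
    return {
        "mean": float(np.mean(sample)),
        "median": float(np.median(sample)),
        "trimmed_10": trimmed_mean(sample, 0.10),
    }

rng = np.random.default_rng(0)
n = 200_000
symmetric = location_summaries(rng.normal(loc=1.0, scale=1.0, size=n))
skewed = location_summaries(rng.exponential(scale=1.0, size=n))  # mean 1, median log(2)

print("symmetric:", symmetric)
print("skewed:   ", skewed)
```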
- --
- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
- Phone: (317)494-6054
- hrubin@pop.stat.purdue.edu (Internet, bitnet)
- {purdue,pur-ee}!pop.stat!hrubin(UUCP)
-