NetNews Usenet Archive 1992 #23


Internet Message Format | 1992-10-11 | 5.6 KB

Path: sparky!uunet!mcsun!Germany.EU.net!math.fu-berlin.de!Sirius.dfn.de!darwin.sura.net!gatech!purdue!mentor.cc.purdue.edu!pop.stat.purdue.edu!hrubin
From: hrubin@pop.stat.purdue.edu (Herman Rubin)
Newsgroups: sci.math.stat
Subject: Re: robust location estimators
Message-ID: <Bvyytu.KCq@mentor.cc.purdue.edu>
Date: 11 Oct 92 18:17:53 GMT
References: <BvGFAu.GEv@mentor.cc.purdue.edu> <1992Oct2.084819.11178@waikato.ac.nz> <1992Oct11.121834.11351@waikato.ac.nz>
Sender: news@mentor.cc.purdue.edu (USENET News)
Organization: Purdue University Statistics Department
Lines: 114

In article <1992Oct11.121834.11351@waikato.ac.nz> maj@waikato.ac.nz writes:
>In article <1992Oct2.084819.11178@waikato.ac.nz>, maj@waikato.ac.nz writes:
>> In article <BvGFAu.GEv@mentor.cc.purdue.edu>, hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>> [stuff and quotes deleted]
>>> Robustness cannot be defined in a formal manner to be a precise concept.
>>> The definition I like is
>>>     The robustness of a procedure is the extent to which its
>>>     properties do not depend on those assumptions which one
>>>     does not wish to make.
>> This is a good definition of what one wants, but I dispute that
>> robustness cannot be given a formal definition. I'll try:
>>     A statistical functional is robust iff it is continuous.
>> Continuous in what topology? you may ask. *That* I won't answer,
>> that's where the vagueness of the concept comes in.
>An email correspondence with Peter Hamer leads me to think that I was perhaps a
>little on the terse side in my earlier posting, he writes:
> > Take the distance between two cdfs F and G to
> > be given by, say, the sup norm.
> > It is easy to see that within any epsilon of
> > F there are distributions with arbitrarily
> > different mean.
> I don't know how you intended this to be interpreted. It seems
> very much like saying that applied statistics is impossible;
> as no real data is *known* to come from a precisely specified
> distribution.
> All applied statistical procedures must be applicable to distributions
> `close to' the one nominally assumed, and you seem to be saying that
> this is impossible.
>[My reply may be of interest to other readers of this group.]
>Rather than try to persuade you about the correctness of my
>remarks I have looked up a few references. Have a look at
>Staudte & Sheather 'Robust Estimation and Testing' Wiley 1990
>Section 3.2.4 pp65-67
>Hampel, F.R. 'A General Qualitative definition of Robustness'
>Ann Math Stat v42, 1887-1896, (1971)
>[esp. Theorem 1, p1891]
>Huber, P.J. 'Robust Statistical Procedures'(1977) #27 in
>CBMS-NSF series.
>[esp. Chapter 2 and first part of Ch 3. I prefer this to
>the more elaborate treatment in his 1981 book.]
>Huber and Hampel work in the full generality with statistics
>understood as sequences of functionals. However the main
>points are unchanged and more easily understood by considering
>only functionals.
>Example: the divisor n-1 standard deviation can be represented
>         as a sequence of functionals, one for each sample
>         size.
>         The divisor n standard deviation can be represented
>         as a single functional with no need to involve
>         sample size.
>The discontinuity of the sample mean as a function from a
>space of cdfs to the reals poses no real problem to
>*applied* statistics because in practice we do not really use
>the mean by itself: we actually employ data inspection
>followed by transformations and/or outlier deletion. In reality
>the robust perspective is more of a threat to traditional
>*mathematical* statistics with its highly developed theory
>about the behaviour of relatively simple statistics at
>precisely specified models, something that does not really
>model modern applied statistical practice in the least.

This attempt to avoid the problem does not succeed.
Any crass attempt to delete outliers messes up things but good; in most
situations, I am much more inclined to go along with the long tails than
with the total damage to the analysis which such deletions produce,
especially if there are explanatory variables. As for transformations,
unless one is looking only at the distribution of a single random
variable, they should NEVER be used, as they are almost certain to
destroy the model.

The observation that the model is not exactly correct is quite
appropriate, but while it is possible to show that robustness in the
sense I have given, which is a slight extension of the original
definition of Box, is quite possible, robustness in the Huber-Hampel
sense is so rare for reasonable models as to be a fatuity.

The laws of large numbers are useful robustness theorems. The Central
Limit Theorem is a robustness theorem. For regression, the Gauss-Markov
Theorem is a robustness theorem of the type that matters in practical
situations and is not covered by the use of continuity. Any time that it
is shown that normality is not needed, but that a few moments suffice,
one has a robustness result which is not covered by the Huber-Hampel
definition. The well-known asymptotic properties of maximum likelihood
and Bayes estimates are among this class.

The mean may be the parameter of interest; there is no Huber-Hampel
robust estimator of it.

BTW, most simulations assume symmetry. This is a far stronger assumption
than merely having a few moments.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@pop.stat.purdue.edu (Internet, bitnet)  {purdue,pur-ee}!pop.stat!hrubin(UUCP)
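
The remark quoted in this thread, that within any epsilon of F in the
sup norm there are distributions with arbitrarily different mean, can be
checked with a small sketch. Nothing below comes from the posts
themselves: taking F to be the standard normal, contaminating it with a
point mass at x > 0, and the function name contaminate are all
illustrative assumptions. The sup-norm distance never exceeds eps, the
mean eps * x grows without bound as x grows, and the median (shown only
for contrast) stays put.

```python
from statistics import NormalDist

# Assumption: F is the standard normal. Any F with a finite mean would do.
PHI = NormalDist()


def contaminate(eps, x):
    """Summaries of G = (1 - eps) * F + eps * (point mass at x).

    Hypothetical helper; assumes 0 < eps < 0.5 and x > 0.
    Returns (sup_dist, mean, median) for G.
    """
    if not (0.0 < eps < 0.5 and x > 0.0):
        raise ValueError("need 0 < eps < 0.5 and x > 0")
    # F(t) - G(t) = eps * (F(t) - 1[t >= x]), so the sup over t is
    # reached at (or just below) the jump at x and is at most eps.
    fx = PHI.cdf(x)
    sup_dist = eps * max(fx, 1.0 - fx)
    # F has mean 0, so the mean of G is eps * x.
    mean = eps * x
    # Below x, G(t) = (1 - eps) * F(t); the median solves G(m) = 1/2
    # unless the jump at x is reached first.
    median = min(x, PHI.inv_cdf(0.5 / (1.0 - eps)))
    return sup_dist, mean, median


if __name__ == "__main__":
    for x in (1e1, 1e3, 1e6):
        d, mean, med = contaminate(0.01, x)
        print(f"x={x:g}  sup-distance={d:.4f}  mean={mean:g}  median={med:.4f}")
```

The quantities are computed in closed form rather than by simulation, so
the output does not depend on a random seed. This only illustrates the
discontinuity of the mean in the sup norm; it says nothing either way
about which notion of robustness is the more useful one.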