NetNews Usenet Archive 1992 #23

Path: sparky!uunet!comp.vuw.ac.nz!waikato.ac.nz!maj
From: maj@waikato.ac.nz
Newsgroups: sci.math.stat
Subject: Robustness
Message-ID: <1992Oct16.084117.11433@waikato.ac.nz>
Date: 16 Oct 92 08:41:17 +1300
Organization: University of Waikato, Hamilton, New Zealand
Lines: 127

I wrote:
~~~~~~~~

Have a look at

Staudte & Sheather, 'Robust Estimation and Testing', Wiley 1990, Section 3.2.4, pp. 65-67

Hampel, F.R., 'A General Qualitative Definition of Robustness', Ann. Math. Stat. v42, 1887-1896 (1971) [esp. Theorem 1, p. 1891]

Huber, P.J., 'Robust Statistical Procedures' (1977), #27 in CBMS-NSF series. [esp. Chapter 2 and first part of Ch. 3. I prefer this to the more elaborate treatment in his 1981 book.]

[stuff deleted]

The discontinuity of the sample mean as a function from a space of cdfs to the reals poses no real problem to *applied* statistics because in practice we do not really use the mean by itself: we actually employ data inspection followed by transformations and/or outlier deletion. In reality the robust perspective is more of a threat to traditional *mathematical* statistics with its highly developed theory about the behaviour of relatively simple statistics at precisely specified models, something that does not really model modern applied statistical practice in the least.

Herman Rubin responded [and my riposte is interpolated]:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This attempt to avoid the problem does not succeed. Any crass attempt to delete outliers messes up things but good; in most situations, I am much more inclined to go along with the long tails than with the total damage to the analysis which these produce, especially if there are explanatory variables.

I am not quite sure what is meant by 'going along with' long tails. Naturally outlier rejection methodology which potentially draws on information about the problem other than that in the sample will be difficult to model asymptotically, but it does not lose legitimacy because of this.
Robust statistics which downweight outliers _can_ be studied asymptotically, and although rarely used in practice can be thought of as a formalization of existing good applied statistical practice based on examination of outlying and influential points.

As for transformations, unless one is looking only at the distribution of a single random variable, they should NEVER be used, as they are almost certain to destroy the model.

So they destroy the model! What do I care? What did the model ever do for me? Seriously, the model is only a construct used to help us understand the data. If we were ever fortunate enough to have sample size tending to infinity and disk quota doing the same we would inevitably observe fine structure that would make us want to refine and elaborate the model. It is not the _model_ that is sacrosanct, it is the _data_. By the way, I seem to remember that Box is not averse to the odd transformation.

The observation that the model is not exactly correct is quite appropriate, but while it is possible to show that robustness in the sense I have given, which is a slight extension of the original definition of Box, is quite possible, robustness in the Huber-Hampel sense is so rare for reasonable models as to be a fatuity.

I could counter that the use of nonrobust statistics like the mean and standard deviation without the removal of "obvious errors" [not quoting h.r.] is so rare as to be a fatuity.

The laws of large numbers are useful robustness theorems. The Central Limit Theorem is a robustness theorem. For regression, the Gauss-Markov Theorem is a robustness theorem of the important type of practical situations not covered by the use of continuity.

Sure they are robustness theorems. The only problem with them lies in the strength of their hypotheses, not with the strength of their conclusions.
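
To make the downweighting of outliers mentioned above concrete, here is a minimal Python sketch (not part of the original exchange) that compares the sample mean with a Huber-type M-estimate of location. The sample, the tuning constant c = 1.345, the function name huber_location, and the choice to hold the scale fixed at the normalised MAD are all illustrative assumptions, not anything either correspondent specified.

```python
from statistics import mean, median

def huber_location(xs, c=1.345, tol=1e-9, max_iter=100):
    """Huber-type M-estimate of location by iteratively reweighted averaging.

    Hypothetical helper: the scale is held fixed at the normalised MAD.
    """
    med = median(xs)
    scale = 1.4826 * median(abs(x - med) for x in xs)
    if scale == 0:
        return med
    mu = med
    for _ in range(max_iter):
        # Weight 1 within c*scale of the current centre, c*scale/|residual| beyond it.
        weights = [1.0 if x == mu else min(1.0, c * scale / abs(x - mu)) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Made-up data: seven readings near 10, then the same with one gross error.
clean = [9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2]
dirty = clean[:-1] + [1020.0]

print("clean:", mean(clean), huber_location(clean))
print("dirty:", mean(dirty), huber_location(dirty))
```

On the contaminated sample the mean is dragged above 150, while the downweighted estimate stays close to 10. In effect the estimator does automatically what an analyst does by inspecting the data and discounting the obvious error.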

Any time that it is shown that normality is not needed, but that a few moments suffice, one has a robustness result which is not covered by the Huber-Hampel definition.

I am not sure that this kind of situation is not covered by the H-H definition. Why not define a metric by

$$d_k(F,G) = \sup_{\nu \in \Re} \int_{-\infty}^{\infty} |x-\nu|^k \, d|F-G|$$

[OK, OK, OK, the guts of it is the integral of abs(x - something)**k w.r.t. abs(F-G).]

It seems to me that continuity in this sort of metric for k = 2, 3, or 4, say, expresses the kind of robustness that Dr Rubin is referring to.

The well-known asymptotic properties of maximum likelihood and Bayes estimates are among this class. The mean may be the parameter of interest; there is no Huber-Hampel robust estimator of it. BTW, most simulations assume symmetry. This is a far stronger assumption than merely having a few moments.

Symmetry is only a convenience so that comparative simulations can be seen to be comparing like with like. Every functional statistic is Fisher-consistent for its value at the population distribution. Usually it is only a matter of convenience what statistic one adopts. For example the sign test of the hypothesis that the median of the differences is zero is often used as a nonparametric substitute for the paired t test.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@pop.stat.purdue.edu (Internet, bitnet)
{purdue,pur-ee}!pop.stat!hrubin(UUCP)

--
Murray A. Jorgensen [ maj@waikato.ac.nz ] University of Waikato
Department of Mathematics and Statistics Hamilton, New Zealand
__________________________________________________________________
'Tis the song of the Jubjub! the proof is complete,
if only I've stated it thrice.'
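
A rough numerical sketch of the metric d_k proposed above, in Python and under stated assumptions. F and G are taken to be discrete distributions stored as dicts of point masses (made-up data), and nu is held fixed at 0 rather than taking the supremum, so the code computes only "the guts of it". The names d_k_fixed_nu, mean_of and contaminate are hypothetical helpers, not anything from the original post.

```python
def d_k_fixed_nu(f, g, k, nu=0.0):
    """Sum of |x - nu|**k * |f(x) - g(x)| over the joint support.

    f and g map support points to probabilities. This is the integral in the
    proposed d_k at one fixed nu (an assumption), not the supremum over nu.
    """
    support = set(f) | set(g)
    return sum(abs(x - nu) ** k * abs(f.get(x, 0.0) - g.get(x, 0.0)) for x in support)

def mean_of(f):
    """Mean of a discrete distribution given as {point: probability}."""
    return sum(x * p for x, p in f.items())

def contaminate(f, eps, at):
    """Mixture (1 - eps) * f + eps * (point mass at `at`)."""
    mixed = {x: (1 - eps) * p for x, p in f.items()}
    mixed[at] = mixed.get(at, 0.0) + eps
    return mixed

# Made-up base distribution with mean 0.
base = {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25}

for at in (10.0, 1000.0):
    g = contaminate(base, 0.01, at)
    mass = d_k_fixed_nu(base, g, k=0)   # k = 0 gives the total mass of |F - G|
    d1 = d_k_fixed_nu(base, g, k=1)
    d2 = d_k_fixed_nu(base, g, k=2)
    shift = abs(mean_of(g) - mean_of(base))
    print(at, mass, d1, d2, shift)
```

For fixed nu and k = 1 the quantity bounds |mean(F) - mean(G)|, because the nu term integrates to zero against F - G. So the mean is continuous with respect to this kind of distance. A 1% contamination always has total mass 0.02, yet it can be pushed as far away as one likes in d_1 or d_2 by moving the contaminating point outwards. That is the sense in which continuity in d_k asks only for stability under perturbations that keep a few moments close.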