- Path: sparky!uunet!comp.vuw.ac.nz!waikato.ac.nz!maj
- From: maj@waikato.ac.nz
- Newsgroups: sci.math.stat
- Subject: Robustness
- Message-ID: <1992Oct16.084117.11433@waikato.ac.nz>
- Date: 16 Oct 92 08:41:17 +1300
- Organization: University of Waikato, Hamilton, New Zealand
- Lines: 127
-
- I wrote :
- ~~~~~~~~~
- Have a look at
-
- Staudte & Sheather 'Robust Estimation and Testing' Wiley 1990
- Section 3.2.4 pp65-67
-
- Hampel, F.R. 'A General Qualitative Definition of Robustness'
- Ann Math Stat v42, 1887-1896, (1971)
- [esp. Theorem 1, p1891]
-
- Huber, P.J. 'Robust Statistical Procedures' (1977), #27 in the
- CBMS-NSF series.
- [esp. Chapter 2 and first part of Ch 3. I prefer this to
- the more elaborate treatment in his 1981 book.]
-
- [stuff deleted]
-
- The discontinuity of the sample mean as a function from a
- space of cdfs to the reals poses no real problem to
- *applied* statistics because in practice we do not really use
- the mean by itself: we actually employ data inspection
- followed by transformations and/or outlier deletion. In reality
- the robust perspective is more of a threat to traditional
- *mathematical* statistics with its highly developed theory
- about the behaviour of relatively simple statistics at
- precisely specified models, something that does not really
- model modern applied statistical practice in the least.
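
A minimal sketch of the discontinuity mentioned above, assuming a made-up sample (the names `clean` and `contaminate` are hypothetical, not from any of the cited references). Moving a single point out of 99 changes the empirical cdf by at most 1/99 in the sup norm, yet the mean can be pushed as far as one likes, while the median stays put.

```python
from statistics import mean, median


def contaminate(sample, n_bad, bad_value):
    """Return a copy of `sample` with its last `n_bad` points set to `bad_value`."""
    if n_bad == 0:
        return list(sample)
    return list(sample[:-n_bad]) + [bad_value] * n_bad


# Hypothetical clean sample: 99 evenly spaced points centred on 0.
clean = [i / 10 for i in range(-49, 50)]

print(f"clean: mean={mean(clean):.3f}  median={median(clean):.3f}")
for bad_value in (1e2, 1e4, 1e6):
    dirty = contaminate(clean, n_bad=1, bad_value=bad_value)
    # One gross error out of 99 drags the mean; the median does not move.
    print(f"bad={bad_value:.0e}  mean={mean(dirty):.3f}  median={median(dirty):.3f}")
```

This only illustrates the behaviour of the raw mean; it says nothing about the inspect-then-delete practice described in the paragraph above.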
-
- Herman Rubin responded [and my riposte is interpolated]:
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- This attempt to avoid the problem does not succeed. Any crass attempt
- to delete outliers messes up things but good; in most situations, I am
- much more inclined to go along with the long tails than with the total
- damage to the analysis which these produce, especially if there are
- explanatory variables.
-
- I am not quite sure what is meant by 'going along with' long
- tails. Naturally, outlier-rejection methodology that potentially
- draws on information about the problem beyond what is in the
- sample will be difficult to model asymptotically, but it does
- not lose legitimacy because of this. Robust statistics that
- downweight outliers _can_ be studied asymptotically and,
- although rarely used in practice, can be thought of as a
- formalization of existing good applied statistical practice
- based on examination of outlying and influential points.
-
- As for transformations, unless one is looking
- only at the distribution of a single random variable, they should NEVER
- be used, as they are almost certain to destroy the model.
-
- So they destroy the model! What do I care? What did the
- model ever do for me? Seriously, the model is only a construct
- used to help us understand the data. If we were ever fortunate
- enough to have sample size tending to infinity, and disk quota
- doing the same, we would inevitably observe fine structure that
- would make us want to refine and elaborate the model. It is
- not the _model_ that is sacrosanct, it is the _data_.
-
- By the way, I seem to remember that Box is not averse to the
- odd transformation.
-
- The observation
- that the model is not exactly correct is quite appropriate, but while
- it is possible to show that robustness in the sense I have given, which
- is a slight extension of the original definition of Box, is quite possible,
- robustness in the Huber-Hampel sense is so rare for reasonable models as
- to be a fatuity.
-
- I could counter that the use of nonrobust statistics like
- the mean and standard deviation without the removal of
- "obvious errors" [not quoting h.r.] is so rare as to be
- a fatuity.
-
- The laws of large numbers are useful robustness theorems. The Central
- Limit Theorem is a robustness theorem. For regression, the Gauss-Markov
- Theorem is a robustness theorem of the important type of practical
- situations not covered by the use of continuity.
-
- Sure they are robustness theorems. The only problem with
- them lies in the strength of their hypotheses, not with
- the strength of their conclusions.
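
A rough simulation of what "strength of hypotheses" can mean here, under assumptions of my own choosing (the distributions, sample sizes, seed and helper names are all illustrative). The exponential has every moment, so sample means tighten as n grows. The standard Cauchy has no mean, and the average of n standard Cauchy draws is again standard Cauchy, so the spread of the sample means should not shrink at all.

```python
import math
import random
from statistics import quantiles


def iqr(values):
    """Interquartile range, used as a spread measure that exists for any sample."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def spread_of_means(draw, n, reps, rng):
    """IQR of `reps` sample means, each computed from `n` draws of `draw(rng)`."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    return iqr(means)


def draw_exponential(rng):
    return rng.expovariate(1.0)


def draw_cauchy(rng):
    # Standard Cauchy via the inverse cdf of a uniform draw.
    return math.tan(math.pi * (rng.random() - 0.5))


rng = random.Random(1992)  # fixed seed so the run is repeatable
results = {}
for name, draw in (("exponential", draw_exponential), ("Cauchy", draw_cauchy)):
    small = spread_of_means(draw, n=10, reps=400, rng=rng)
    large = spread_of_means(draw, n=1000, reps=400, rng=rng)
    results[name] = (small, large)
    print(f"{name:12s} IQR of means: n=10 -> {small:.3f}, n=1000 -> {large:.3f}")
```

The IQR is used instead of the standard deviation because the latter is itself meaningless for Cauchy sample means.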
-
- Any time that it
- is shown that normality is not needed, but that a few moments suffice,
- one has a robustness result which is not covered by the Huber-Hampel
- definition.
-
- I am not sure that this kind of situation is not covered
- by the H-H definition. Why not define a metric by
-
- $$d_k(F,G) = \sup_{\nu \in \Re}
- \int_{-\infty}^{\infty} |x-\nu|^k \, d|F-G|$$
- [OK, OK, OK, the guts of it is the integral of
- abs(x - something)**k w.r.t abs(F-G).]
-
- It seems to me that continuity in this sort of metric for
- k = 2, 3, or 4, say, expresses the kind of robustness that
- Dr Rubin is referring to.
-
- The well-known asymptotic properties of maximum likelihood and Bayes
- estimates are among this class. The mean may be the parameter of interest;
- there is no Huber-Hampel robust estimator of it.
-
- BTW, most simulations assume symmetry. This is a far stronger assumption
- than merely having a few moments.
-
- Symmetry is only a convenience so that comparative simulations
- can be seen to be comparing like with like. Every functional
- statistic is Fisher-consistent for its value at the
- population distribution. Usually it is only a matter of
- convenience what statistic one adopts. For example, the
- sign test of the hypothesis that the median of the differences
- is zero is often used as a nonparametric substitute for the
- paired t test.
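
For concreteness, a small sketch of the exact two-sided sign test just mentioned. The function name and the data are hypothetical, and dropping zero differences is one common convention rather than the only one. Under the null hypothesis the number of positive differences is Binomial(n, 1/2).

```python
from math import comb


def sign_test_p_value(differences):
    """Exact two-sided sign test of H0: the median of the differences is 0.

    Zero differences are dropped, which is an assumed convention.
    """
    positives = sum(d > 0 for d in differences)
    negatives = sum(d < 0 for d in differences)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired differences (after minus before); not real data.
diffs = [1.2, 0.4, -0.3, 2.1, 0.9, 1.5, 0.0, 0.7, 1.1, -0.2]
print(sign_test_p_value(diffs))  # 7 positive, 2 negative, 1 tie -> 0.1796875
```

Only the signs enter the calculation, so replacing 2.1 by 2100 leaves the p-value unchanged, which is not true of the paired t statistic.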
- --
- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette
- IN47907-1399
- Phone: (317)494-6054
- hrubin@pop.stat.purdue.edu (Internet, bitnet)
- {purdue,pur-ee}!pop.stat!hrubin(UUCP)
- --
- Murray A. Jorgensen [ maj@waikato.ac.nz ] University of Waikato
- Department of Mathematics and Statistics Hamilton, New Zealand
- __________________________________________________________________
- 'Tis the song of the Jubjub! the proof is complete,
- if only I've stated it thrice.'
-