NetNews Usenet Archive 1992 #27

Text File | 1992-11-17 | 4.4 KB | 75 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!stanford.edu!bcm!convex!darwin.sura.net!spool.mu.edu!yale.edu!news.yale.edu!mars.caps.maine.edu!maine.maine.edu!cunyvm!psuvm!auvm!UNC.BITNET!UPHILG
Message-ID: <STAT-L%92111711305709@VM1.MCGILL.CA>
Newsgroups: bit.listserv.stat-l
Date: Tue, 17 Nov 1992 09:10:00 EST
Sender: "STATISTICAL CONSULTING" <STAT-L@MCGILL1.BITNET>
From: "Philip Gallagher,(919)966-1065" <UPHILG@UNC.BITNET>
Subject: Do assumptions violations mean anything?
Lines: 64

The recent discussion of assumption-violations has reinforced my long standing (not very popular) observation that statisticians and others attempting to practice statistics often get so comfortable with our (valid) jargon and (again valid) knee-jerk practices that we gloss over, perhaps even in our own minds, any real meaning that our practices may be accessing.

I illustrate with a one-way ANOVA. So we do some kind of test for homoscedasticity; it fails. What does that mean? Not "The data violate the assumptions", not really. What it means is that we have one or more cells in which the distribution of the data differs from the distribution of the data in other cells. If that were true, what real meaning would an analysis that certifies the cell means are not equal have? Suppose the distribution in one cell were highly skewed to the right and another cell to the left, but with equal means and equal variances? It would almost certainly be a very unusual set of scientific data indeed where the failure of an ANOVA to detect differences in means would be the result the scientist wanted to have called to his attention.

It strikes me that the testing of assumptions is most easily taught by showing the students that violation of the assumptions means that the distributions differ, but not necessarily in the way that the specific procedure (say, ANOVA) is directed at (equality of means).
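
The two-cell scenario above (opposite skew, equal means, equal variances) can be made concrete with a minimal Python sketch. The numbers and the names `moments`, `cell_a`, and `cell_b` are made up for illustration and are not from the post. Cell B is built by reflecting cell A about its own mean, which is just one convenient way to get matching means and variances with opposite skew.

```python
from statistics import fmean


def moments(xs):
    """Return (mean, variance, skewness) using population formulas."""
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    skew = fmean([(x - m) ** 3 for x in xs]) / var ** 1.5
    return m, var, skew


# Hypothetical cell A: a long tail to the right.
cell_a = [1, 1, 1, 2, 2, 3, 4, 7, 15]

# Hypothetical cell B: cell A reflected about its own mean, so the tail
# points left while the mean and variance stay the same.
mean_a = fmean(cell_a)
cell_b = [2 * mean_a - x for x in cell_a]

for name, cell in (("A", cell_a), ("B", cell_b)):
    m, var, skew = moments(cell)
    print(f"cell {name}: mean={m:.3f} variance={var:.3f} skewness={skew:+.3f}")
```

Both cells report the same mean and the same variance, so a comparison of means has nothing to detect, yet the skewness values have opposite signs: the distributions differ in a way the procedure is not directed at.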
Once the student sees clearly the underlying phenomenon that leads to an assumption violation it becomes very hard to prevent the student from looking for those violations (and in a very perceptive way, too).

I have had amazingly good experiences in the last few years by encouraging students to "look for systematic characteristics in those persons for whom the model does not predict well" rather than "examine the residuals". My first real success along this line came after I had begged, cajoled, and demanded for three months that an osteoporosis student examine the residuals in the model, and gotten nowhere; when I said "Well, forget about looking at the residuals, just figure out which groups the model doesn't fit well" I got the best part of the answer in (gasp!) two hours. Two months later over a celebratory drink I flabbergasted the student by explaining that she had actually done an analysis of the residuals. (There is success in this world sometimes - this person is now faculty at another school, and last month one of her students was complaining to me about being forced to examine residuals! Hallelujah!)

The gist of this not extensively edited comment is that we often become so entranced by the mathematical aspects of what we are doing that we fail to remember that the mathematics (at least for statistical analyses) is usually a reflection of some aspect of the data that one need not be a statistician to understand.

I conclude with my favorite way of examining the similarity of two distributions, attributable to Dana Quade. One plots the empirical distribution of distn1 on the Y-axis against the empirical distribution of distn2 on the X-axis. (When not looking for differences in location I center both distributions at zero first.) If the distributions are similar, the result will be a straight line (at 45 degrees if you scale the axes cleverly).
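
The plot just described can be sketched in a few lines of Python. This is one reading of it, not a definitive implementation. It assumes that "empirical distribution" means the empirical cumulative distribution function (ECDF) and that "center at zero" means subtracting each sample's mean; the post specifies neither. The function names and the sample data are hypothetical. The function only computes the points to plot, which can then be handed to any plotting tool.

```python
from bisect import bisect_right
from statistics import fmean


def ecdf(sorted_sample, t):
    """Fraction of the (already sorted) sample that is <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def pp_points(distn1, distn2, center=True):
    """Return (x, y) pairs: x is the ECDF of distn2, y the ECDF of distn1.

    Both ECDFs are evaluated at every distinct value in the pooled data.
    Centering on the mean is an assumption made for this sketch.
    """
    if center:
        m1, m2 = fmean(distn1), fmean(distn2)
        distn1 = [v - m1 for v in distn1]
        distn2 = [v - m2 for v in distn2]
    s1, s2 = sorted(distn1), sorted(distn2)
    grid = sorted(set(s1) | set(s2))
    return [(ecdf(s2, t), ecdf(s1, t)) for t in grid]


# Made-up samples: same shape, but `wide` is five times as dispersed.
narrow = list(range(-10, 11))
wide = [5 * v for v in narrow]
shifted = [v + 100 for v in narrow]

on_diagonal = pp_points(shifted, narrow)  # location removed by centering
s_shaped = pp_points(wide, narrow)        # dispersion difference
gaps = [y - x for x, y in s_shaped]       # signed distance from the diagonal
```

With these samples every point of `on_diagonal` has `x == y`, while `gaps` is positive toward the lower tail and negative toward the upper tail, meaning the curve crosses the 45-degree line.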
Large differences in dispersion result in S-shaped pictures; the graph is very informative, both to the statistician and to the scientist. Sometimes this picture makes the differences in the distributions so clear that everyone gladly abandons the original intention of testing means. Thank goodness.

Phil Gallagher
uphilg@unc