NetNews Usenet Archive 1993 #1


Text File | 1993-01-05 | 3.9 KB | 74 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!zaphod.mps.ohio-state.edu!malgudi.oar.net!news.ysu.edu!psuvm!auvm!UNC.BITNET!UPHILG
Message-ID: <STAT-L%93010516070081@VM1.MCGILL.CA>
Newsgroups: bit.listserv.stat-l
Date: Tue, 5 Jan 1993 13:38:00 EST
Sender: STATISTICAL CONSULTING <STAT-L@MCGILL1.BITNET>
From: "Philip Gallagher,(919)966-1065" <UPHILG@UNC.BITNET>
Subject: Spreadsheet Software for statistical computing
Lines: 63

Dr. Arday wrote that he doesn't believe that the formulae he uses from Fleiss and from Kleinbaum, et al., are "wrong", and that the professionals who warn against using naive implementations of the textbook formulae (in spreadsheet packages) need to substantiate their claims.

I think Dr. Arday didn't get the right intelligence (in the military sense) from the information in Phil Miller's note. Perhaps it is clearer to say that

   ... the textbook formulae are right, but they give wrong answers in many implementations, especially implementations on digital computers. ...

Even in hand calculations where one has (in principle!) complete control over the number of decimal places, naive implementation of the textbook formulae can lead to wrong answers in many situations; one (with which I am sure you are already familiar) is when some of the variables range, say, from 1 to 10, and others range from 1,000,000,000,000 to 1,000,000,000,010. Differences and proportions (which look great in the closed-form formula) can cause instant "... can't tell the difference between zero and a very small number ..." problems. In first-semester courses one is taught to "center and scale" such variables into common ranges before hauling out the cookbook formula.
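The range example above can be tried directly. What follows is only a minimal sketch, assuming ordinary double-precision arithmetic as found in Python (and in most spreadsheets); the function names are made up for illustration and do not come from any package or textbook. It compares the one-pass "cookbook" variance formula with the same calculation done after centering, on the eleven values from 1,000,000,000,000 to 1,000,000,000,010, whose exact sample variance is 11.

```python
# Eleven values from 1,000,000,000,000 to 1,000,000,000,010 (exact in doubles).
xs = [1_000_000_000_000.0 + i for i in range(11)]

def textbook_variance(values):
    """One-pass cookbook formula: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    # Two numbers near 1.1e25 are subtracted; their true difference is only 110,
    # far below the spacing of doubles at that magnitude.
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)

def centered_variance(values):
    """Center first, then sum the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

naive = textbook_variance(xs)
centered = centered_variance(xs)
print("textbook formula:", naive)      # not close to 11
print("centered formula:", centered)   # 11.0
```

Both functions implement the same algebra; only the order of operations differs, which is the whole point.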
I am not sufficiently versed in the details of statistical computing to spell out in detail the horrendously more complex problems one encounters in many of the more sophisticated matrix manipulations (although I have seen Ron Helms write many of them out on the blackboard), but I am sure that someone on the list can point to one of the better texts on statistical computing. Just the choice of WHICH matrix decomposition routine to use for a particular task often requires a high-powered consult. (I perceive this as a problem quite different from the "approximate" formulae Dr. Arday wrote of.)

I close with an inadequate reference to work done by two (or more, perhaps) folks in the Washington, D.C., area - one at WESTAT and his colleague at ?Census?Labor Statistics? in the early 80s. For at least two years in a row (I know, because I attended their practice presentations at the Washington Statistical Society) they evaluated something like 20-30 "stat packages" for presentation at the annual Joint Meetings of the ASA, etc. For some of the ill-conditioned data (see the Longley data in the SAS sample library, for example) they found that half the packages couldn't achieve even two digits of precision, which didn't stop the packages from reporting many more digits, as if they were correct. One of the packages couldn't even get the first digit right.

The lesson here is that, unless one is using carefully chosen textbook datasets with low collinearity and "nice" values, the implementations of the textbook formulae are not immaterial. If an investigator wishes to avoid the work of learning about the complexities of statistical computing, then perhaps he or she could consider listening to those who have done their years of homework. Hardly anyone can be an expert in everything, after all.

I suppose this sounds like a flame, although it isn't intended that way; I write only because I feel it would be unprofessional to permit misconceptions to go without rebuttal.
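As a small illustration of the earlier point about ill-conditioned data and the choice of decomposition, here is a hedged sketch. It is a contrived two-column example, not the Longley data and not taken from the package evaluations mentioned above; the function names and the value of EPS are assumptions chosen for illustration. It solves the same least-squares problem twice: once by forming X'X as the textbook normal equations suggest, and once by a modified Gram-Schmidt QR factorization that never forms X'X.

```python
import math

EPS = 1e-8  # assumed value: small enough that 1 + EPS**2 rounds to 1.0 in doubles

# Two nearly collinear columns; y is built so the exact coefficients are (1, 1).
col1 = [1.0, EPS, 0.0]
col2 = [1.0, 0.0, EPS]
y = [2.0, EPS, EPS]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_normal_equations(c1, c2, rhs):
    """Textbook route: form X'X and X'y, solve the 2x2 system by Cramer's rule."""
    a, b, d = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    g1, g2 = dot(c1, rhs), dot(c2, rhs)
    det = a * d - b * b
    if det == 0.0:
        return None  # X'X is exactly singular as stored
    return ((g1 * d - b * g2) / det, (a * g2 - b * g1) / det)

def solve_gram_schmidt(c1, c2, rhs):
    """Modified Gram-Schmidt QR; rhs is reduced alongside the columns."""
    r11 = math.sqrt(dot(c1, c1))
    q1 = [v / r11 for v in c1]
    r12 = dot(q1, c2)
    w = [v - r12 * q for v, q in zip(c2, q1)]    # part of c2 orthogonal to q1
    r22 = math.sqrt(dot(w, w))
    q2 = [v / r22 for v in w]
    z1 = dot(q1, rhs)
    resid = [v - z1 * q for v, q in zip(rhs, q1)]  # remove the q1 component first
    z2 = dot(q2, resid)
    b2 = z2 / r22                                 # back-substitution
    b1 = (z1 - r12 * b2) / r11
    return (b1, b2)

print("normal equations:", solve_normal_equations(col1, col2, y))  # None
print("Gram-Schmidt QR: ", solve_gram_schmidt(col1, col2, y))      # close to (1, 1)
```

The information that separates the two columns is of size EPS squared in X'X, and it is rounded away before any solving begins; working with X itself keeps it.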
Phil Gallagher uphilg@unc