NetNews Usenet Archive 1992 #30

Internet Message Format | 1992-12-16 | 2.8 KB

Xref: sparky sci.math.num-analysis:3615 sci.math.stat:2625 sci.math.symbolic:3301
Newsgroups: sci.math.num-analysis,sci.math.stat,sci.math.symbolic
Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!spool.mu.edu!umn.edu!lynx!nmsu.edu!dante!dclason
From: dclason@dante.nmsu.edu (Dennis Clason)
Subject: Re: mathematica stepwise regression package
Message-ID: <1992Dec16.213059.28461@nmsu.edu>
Keywords: mathematica stepwise regression
Sender: usenet@nmsu.edu
Organization: New Mexico State University, Las Cruces, NM
References: <1446@ares.edsr.eds.com> <mcclella.724399758@yertle.Colorado.EDU> <1992Dec15.083258.1474@lth.se>
Date: Wed, 16 Dec 1992 21:30:59 GMT
Lines: 56

In article <1992Dec15.083258.1474@lth.se> andersh@maths.lth.se (Anders Holtsberg) writes:
>
>gary mcclelland mcclella@yertle.colorado.edu:
>>Stepwise regression is a tool of the devil. Anyone smart enough
>>to be using Mathematica ought to be able to decide which
>>questions he or she wants to ask of the data rather than letting
>>a demonstrably suboptimal stepwise algorithm decide which
>>questions to ask. One day there may be good AI programs for
>>doing statistical analysis, but stepwise regression won't be
>>among them.
>
>You mean what is bad? Let's say we want to make predictions.
>Is the idea of using a subset of the predictors that is bad or
>do you mean the stepwise way to pick them? If the latter: do
>you know any better way (except trying all combinations)?
>

Efroymson's stepwise algorithms are bad for a number of reasons.
First, the statistic labelled "F" is generally a pseudo-F statistic,
whose value CANNOT be reasonably compared to the central F
distribution. Second, the method tends to converge to wrong models,
and often the model selected omits important regressors. In fact,
the algorithms may never even consider the appropriate model
(whatever THAT is).

There are better subset regression algorithms available. The most
efficient known algorithm is Furnival and Wilson's Leaps and Bounds,
implemented in BMDP 9R and in the new SAS PROC REG (v. 6.07). Ron
Hocking and Lynn LaMotte did a lot of work in this area during the
70s. Hocking's review papers in Biometrics (around '80 or so) and
Tech (around '85?) are the best starting place for self-study in
this area.

As a statistical consultant I dread it when a collaborator comes into
my office and says "I've been running PROC STEPWISE and I've got this
nice regression model. . ."

The only reason not to run all possible subsets is that it takes too
many resources. Smart algorithms cut way down on the resources
needed to get the effect of all possible subsets. You ought to use
them.

Dennis
---
Dennis L. Clason                    dclason@nmsu.edu
Dept of Experimental Statistics     ESTATX08@NMSUVM1.NMSU.EDU
New Mexico State University
Las Cruces, NM 88001
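
For readers who want to see what "all possible subsets" means in
practice, here is a minimal sketch in plain Python. It is the brute
force search, NOT Furnival and Wilson's Leaps and Bounds: it fits
every one of the 2^p - 1 non-empty subsets by ordinary least squares
and keeps the lowest-RSS subset of each size. The function names
(solve, rss, best_subsets) and the small noise-free data set are made
up for illustration and do not come from any package named above.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus the chosen columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    # Normal equations (X'X) beta = X'y; adequate for a small illustration.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))


def best_subsets(rows, y):
    """Return {size: (rss, columns)} holding the lowest-RSS subset of each size."""
    p = len(rows[0])
    best = {}
    for size in range(1, p + 1):
        best[size] = min((rss(rows, y, cols), cols)
                         for cols in combinations(range(p), size))
    return best


# Hypothetical data: three candidate regressors, y = 2 + 3*x0 - x2 exactly.
rows = [[1, 5, 2], [2, 3, 7], [3, 8, 1], [4, 1, 6],
        [5, 9, 3], [6, 2, 8], [7, 7, 4], [8, 4, 5]]
y = [2 + 3 * r[0] - r[2] for r in rows]

best = best_subsets(rows, y)
for size, (value, cols) in best.items():
    print(size, cols, value)
```

With this made-up data the size-2 winner is columns (0, 2), the pair
that generated y. The cost is the point: the number of fits doubles
with every added regressor, which is why a branch-and-bound method
that can skip whole families of subsets matters once p gets large.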