- Xref: sparky sci.math.num-analysis:3615 sci.math.stat:2625 sci.math.symbolic:3301
- Newsgroups: sci.math.num-analysis,sci.math.stat,sci.math.symbolic
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!spool.mu.edu!umn.edu!lynx!nmsu.edu!dante!dclason
- From: dclason@dante.nmsu.edu (Dennis Clason)
- Subject: Re: mathematica stepwise regression package
- Message-ID: <1992Dec16.213059.28461@nmsu.edu>
- Keywords: mathematica stepwise regression
- Sender: usenet@nmsu.edu
- Organization: New Mexico State University, Las Cruces, NM
- References: <1446@ares.edsr.eds.com> <mcclella.724399758@yertle.Colorado.EDU> <1992Dec15.083258.1474@lth.se>
- Date: Wed, 16 Dec 1992 21:30:59 GMT
- Lines: 56
-
- In article <1992Dec15.083258.1474@lth.se> andersh@maths.lth.se (Anders Holtsberg) writes:
- >
- >gary mcclelland mcclella@yertle.colorado.edu:
- >>Stepwise regression is a tool of the devil. Anyone smart enough
- >>to be using Mathematica ought to be able to decide which
- >>questions he or she wants to ask of the data rather than letting
- >>a demonstrably suboptimal stepwise algorithm decide which
- >>questions to ask. One day there may be good AI programs for
- >>doing statistical analysis, but stepwise regression won't be
- >>among them.
- >
- >You mean what is bad? Let's say we want to make predictions.
- >Is the idea of using a subset of the predictors that is bad or
- >do you mean the stepwise way to pick them? If the latter: do
- >you know any better way (except trying all combinations)?
- >
-
- Efroymson's stepwise algorithms are bad for several
- reasons. First, the statistic labelled "F" is generally
- a pseudo-F statistic: it is the largest of the F values
- for all the candidate regressors at that step, so its
- value CANNOT reasonably be compared to the central F
- distribution. Second, the method tends to converge
- to the wrong model, and the model selected often
- omits important regressors. In fact, the algorithms
- may never even consider the appropriate model
- (whatever THAT is).
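-
- A rough illustration of the first point, as a minimal
- Python sketch under stated assumptions: the response and
- all k = 10 candidate regressors are pure noise, n = 30,
- and only the first forward step is simulated. The names
- f_stat and first_step_rejection_rate are hypothetical,
- and 4.20 is the approximate 5% critical value of the
- central F distribution with 1 and 28 df.

```python
import random

def f_stat(x, y):
    """F statistic (1 and n-2 df) for the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)
    return r2 * (n - 2) / (1 - r2)

def first_step_rejection_rate(n=30, k=10, reps=500, crit=4.20, seed=1):
    """Fraction of pure-noise data sets in which the LARGEST of k
    single-regressor F statistics exceeds the nominal 5% critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        # Stepwise entry looks only at the best candidate, i.e. the maximum F.
        best = max(f_stat([rng.gauss(0, 1) for _ in range(n)], y)
                   for _ in range(k))
        hits += best > crit
    return hits / reps

rate = first_step_rejection_rate()
print("nominal level 0.05, simulated entry rate:", rate)
```

- With ten noise candidates the simulated entry rate comes
- out far above the nominal 0.05, which is why the entry
- "F" cannot be referred to the F table.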
-
- There are better subset regression algorithms
- available. The most efficient known algorithm
- is Furnival and Wilson's Leaps and Bounds, implemented
- in BMDP 9R and in the new SAS PROC REG (v. 6.07).
- Ron Hocking and Lynn LaMotte did a lot of work in
- this area during the 1970s. Hocking's review papers
- in Biometrics (around '80 or so) and Technometrics
- (around '85?) are the best starting place for self
- study in this area.
-
- As a statistical consultant I dread it when a
- collaborator comes into my office and says "I've
- been running PROC STEPWISE and I've got this
- nice regression model. . ." The only reason
- not to run all possible subsets is that it
- takes too many resources. Smart algorithms
- cut way down on the resources needed to get
- the effect of all possible subsets. You ought
- to use them.
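-
- To make the comparison concrete, here is a minimal Python
- sketch of brute-force all-possible-subsets search next to
- a greedy forward search. It is NOT Leaps and Bounds (which
- gets the same answer while skipping most subsets), and the
- data and the names solve, rss, best_subsets and forward
- are made up for illustration. Here y = x1 - x2 exactly,
- while x3 is only a noisy proxy for y.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def rss(X, y, cols):
    """Residual sum of squares of y on an intercept plus the given columns."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    beta = solve(XtX, Xty)  # normal equations; fine for a tiny example
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))

def best_subsets(X, y):
    """Exhaustive search: {size: (rss, columns)} for the best subset of each size."""
    k = len(X[0])
    return {size: min((rss(X, y, c), c) for c in combinations(range(k), size))
            for size in range(1, k + 1)}

def forward(X, y, steps):
    """Greedy forward selection: add the column that lowers RSS most."""
    chosen = []
    for _ in range(steps):
        rest = [j for j in range(len(X[0])) if j not in chosen]
        chosen.append(min(rest, key=lambda j: rss(X, y, chosen + [j])))
    return tuple(sorted(chosen))

# Hypothetical data: columns are x1, x2, x3 (indices 0, 1, 2).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, -1, 1, -1, -1, 1, -1, 1]
x2 = [a - b for a, b in zip(x1, y)]   # so y = x1 - x2 exactly
x3 = [1, -1, 1, -1, -1, 1, 1, 1]      # y with one sign flipped
X = [list(row) for row in zip(x1, x2, x3)]

print("best pair (all subsets):", best_subsets(X, y)[2])
print("pair from forward search:", forward(X, y, 2))
```

- On this data the exhaustive search finds the pair (x1, x2)
- with essentially zero residual, while the forward search
- enters x3 first and so never examines that pair.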
-
- Dennis
-
- ---
- Dennis L. Clason dclason@nmsu.edu
- Dept of Experimental Statistics ESTATX08@NMSUVM1.NMSU.EDU
- New Mexico State University
- Las Cruces, NM 88001
-