NetNews Usenet Archive 1992 #16

Text File | 1992-07-31 | 5.0 KB | 94 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!paladin.american.edu!auvm!CORNELLC.BITNET!TCD
Message-ID: <STAT-L%92073115220242@VM1.MCGILL.CA>
Newsgroups: bit.listserv.stat-l
Date: Fri, 31 Jul 1992 14:37:46 EDT
Sender: "STATISTICAL CONSULTING" <STAT-L@MCGILL1.BITNET>
From: Tim Dorcey <TCD@CORNELLC.BITNET>
Subject: Interaction effects in regression
Lines: 83

A few more comments to add to the discussion on whether "main effects" should be deleted when the interaction in a regression model is significant...

I find it helpful to keep in mind that, fundamentally, regression models are about prediction, i.e., how much can I expect the response variable to change as changes are made in the predictors. Typically, a regression model is parameterized in such a way that the parameter estimates have an immediate connection to this question. For example, suppose we have the linear model:

     E[y|x,z] = a + b*x + c*z

If I then ask what would happen if I increased x by 1 unit while holding z constant, I can compute:

     E[y|x+1,z] = a + b*(x+1) + c*z

and then subtract to get:

     E[y|x+1,z] - E[y|x,z] = b

This is very nice because a change in x has the same effect regardless of the actual values of x & z. Thus, my question has a simple answer, and a test of the hypothesis that b = 0 has a direct connection to the issue of prediction.

But now suppose I ask this same question when the model is:

     E[y|x,z] = a + b*x + c*z + d*x*z

I compute:

     E[y|x+1,z] = a + b*(x+1) + c*z + d*(x+1)*z

and then subtract to get:

     E[y|x+1,z] - E[y|x,z] = b + d*z

Evidently, the answer now depends upon the value of z. E.g., when z = -b/d, x has no effect on the response. Furthermore, a test of the hypothesis that b = 0 now has only narrow implications for prediction. In particular, if b = 0, it simply means that a change in x has no effect on y when z = 0.
I can imagine situations where this conclusion might be interesting, but that would be the exception rather than the rule (clearly, it would have to be a situation where the origin of z was theoretically meaningful).

So, to get back to the original question, if b is not significantly different from 0, should we force it to be zero (i.e., delete x from the model)? The logic in forcing a coefficient to be 0 (or any other fixed value) is that the coefficients of a model with fewer parameters can be estimated with greater precision. The main drawback is that if the true coefficient is different from the value that we fixed it at, then the other estimates will be biased. Furthermore, if the decision to omit a variable is based upon the same set of data that the reduced model is then fit to, none of the distributional results (e.g., t-tests) are valid. I.e., these tests are based on the assumption that we chose our predictors without looking at the data.

It is interesting to consider the "test-and-refit" approach in the context of forcing parameters to be some value other than 0. Suppose we adopted the following strategy:

  1) fit an initial model E[y|x,z] = a + b*x + c*z
  2) test the hypothesis that b = 2
  3) if b is not significantly different from 2, force it to be 2 and
     refit the model to get better estimates of a and c.

I suspect that many who are quite comfortable omitting non-significant variables from regression models would be skeptical of this approach, even though, in the context of linear regression theory, it is absolutely equivalent.

So, what is it about 0 that is special, and how does that relate to the original question about main effects and interactions?

1) Fixing a coefficient to 0 means that we don't even need to know the value of the corresponding variable, so we end up with a more parsimonious model. In the case E[y|x,z] = a + b*x + c*z + d*x*z, however, regardless of whether we set b and/or c to 0, we still need to know the values of x and z.
The resulting model is no more parsimonious.

2) On an a priori basis, it seems that "this variable has no effect" is a more plausible conclusion than "this variable has a regression coefficient of 2". As discussed previously, however, in the interaction model, "b = 0" is equivalent to "this variable has no effect when z=0". Unless there were prior reasons to expect that particular result, "b=0" remains on the same footing as "b=2".

Therefore, my conclusion would be to leave the main effects in the model, except perhaps under the special circumstance where it was expected that "the effect of x when z=0" might be 0. Even then, I would personally keep the full model, because I don't think the increased precision of parameter estimates is worth the risk of introducing bias. The exercise above was only meant to show that even if you buy the general idea of omitting non-significant variables from regression models, the interaction model is different.

Tim Dorcey                         BITNET:   TCD@CORNELLC
Statistical Software Consultant    Internet: TCD@CORNELLC.CIT.CORNELL.EDU
Cornell Information Technologies   Phone:    (607) 255-5715
Cornell University
Ithaca, NY 14853
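
The claim that, in the interaction model, "b = 0" only describes the effect of x at z = 0 can be checked numerically. The following is a minimal Python sketch, not part of the original post. The coefficient values, the sample size, the shift z0, and the helper names slope_of_x and fit_interaction are all hypothetical choices made for illustration. It fits the interaction model to noise-free data twice, once with z as given and once with z measured from a different origin, and compares the coefficient on x.

```python
import numpy as np


def slope_of_x(b, d, z):
    """Change in E[y|x,z] per unit increase in x under a + b*x + c*z + d*x*z."""
    return b + d * z


def fit_interaction(x, z, y):
    """Least-squares fit of y = a + b*x + c*z + d*x*z; returns array (a, b, c, d)."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical illustration values (assumptions, not taken from the post).
a, b, c, d = 1.0, 2.0, -1.0, 0.5
rng = np.random.default_rng(0)
x = rng.normal(size=50)
z = rng.normal(loc=10.0, size=50)
y = a + b * x + c * z + d * x * z  # noise-free, so the fit recovers the truth

# Fit with z as given, then with z measured from a new origin z0.
coef_raw = fit_interaction(x, z, y)
z0 = 10.0
coef_shifted = fit_interaction(x, z - z0, y)

print("coefficient on x, original z:", coef_raw[1])
print("coefficient on x, z shifted by z0:", coef_shifted[1])
print("b + d*z0:", slope_of_x(b, d, z0))
print("effect of x at z = -b/d:", slope_of_x(b, d, -b / d))
```

With these assumed values the coefficient on x in the shifted fit should come out close to b + d*z0 = 7 rather than b = 2, while the interaction coefficient d and the fitted values are unchanged. The "main effect" of x is therefore tied to wherever the zero of z happens to sit, which is why a test of b = 0 carries so little meaning on its own in this model.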