NetNews Usenet Archive 1993 #1

Text File | 1993-01-06 | 2.1 KB | 53 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!wupost!darwin.sura.net!paladin.american.edu!auvm!COMPUSERVE.COM!71020.1025
Message-ID: <930107031142_71020.1025_EHC114-1@CompuServe.COM>
Newsgroups: bit.listserv.sas-l
Date: Wed, 6 Jan 1993 22:11:42 EST
Reply-To: William Kahn <71020.1025@COMPUSERVE.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
From: William Kahn <71020.1025@COMPUSERVE.COM>
Subject: t
Comments: To: sas-l@ohstvma.bitnet
Lines: 40

Patrick Haggard wrote

> I have some data containing between n and m observations in each
> of C conditions. I would like to have exactly n observations in
> each condition

which received (as of my last scan) two similar responses, each keeping
the first n observations of the up to m (m>=n) in each group.

May I suggest that an explicitly _random_ subset of each group be
selected rather than the first? Even if there is no known order to the
data, often there is a non-random (though not known) order.

    data t; set old;
      x=ranuni(8911002);
    proc sort; by group x;     *note explicit scrambling within group;
    data new; set t; by group;
      if first.group then count=0;
      count+1;                 *using implicit retain implied by this syntax;
      tag=(count>&n);          /* &n = target group size n, a macro variable defined beforehand */
                               *keep all observations in same dataset;
    proc glm; class group; where tag=0;
      model dv=group;          *use where;
    run;

BUT--a statistics question arises. When is it better to throw out data
in order to attain balance than to analyze the unbalanced design?
Granted, the estimates you get which assume balance are no longer
minimum variance unbiased, but don't they always have smaller mean
square error than throwing out data? If you have a procedure which
requires balance (proc anova), won't you get smaller MSE estimates by
averaging your m points down to n points (say, average m-n pairs) and
ignoring the averaging in the analysis than by throwing out m-n data
points?

Data is so precious--seems a crying waste to throw it out just because
some mathematicians/programmers don't give us an optimal analysis
algorithm.

Bill Kahn <71020.1025@compuserve.com>
W. L. Gore and Associates

Distribution: >INTERNET:sas-l@ohstvma.bitnet
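
A rough way to see what is at stake in the statistics question is to compare the variance of a single group mean under three options: dropping m-n points, averaging m-n pairs down to n points, and using all m points. The sketch below, in Python purely for illustration, assumes independent observations with a common variance sigma^2 within the group and n <= m <= 2n (so that m-n disjoint pairs exist). The function names and the sizes n=10, m=14 are made up for the example.

```python
from fractions import Fraction


def mean_variance_factor(weights):
    """Variance of sum(w_i * y_i) in units of sigma^2.

    Assumes the y_i are independent with a common variance sigma^2
    and that the weights sum to 1 (so the estimator is a weighted mean).
    """
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    return sum(w * w for w in weights)


def drop_weights(n, m):
    # Keep n of the m observations; the discarded m-n get weight 0.
    return [Fraction(1, n)] * n + [Fraction(0)] * (m - n)


def pair_average_weights(n, m):
    # Average m-n disjoint pairs, leaving 2n-m singletons: n points in all.
    if not n <= m <= 2 * n:
        raise ValueError("m-n disjoint pairs need n <= m <= 2n")
    singles = [Fraction(1, n)] * (2 * n - m)
    paired = [Fraction(1, 2 * n)] * (2 * (m - n))
    return singles + paired


def full_weights(m):
    # Plain mean of all m observations.
    return [Fraction(1, m)] * m


n, m = 10, 14  # hypothetical group sizes
for label, w in [
    ("drop m-n points", drop_weights(n, m)),
    ("average m-n pairs", pair_average_weights(n, m)),
    ("use all m points", full_weights(m)),
]:
    print(label, mean_variance_factor(w))
```

Under these assumptions the factors work out to 1/n for dropping, (3n-m)/(2n^2) for pair-averaging, and 1/m for the full mean, so pair-averaging sits between the other two. This covers only the variance of the group mean; it says nothing about the error variance estimate or the F test when the averaging is ignored in the analysis.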