- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!elroy.jpl.nasa.gov!usc!zaphod.mps.ohio-state.edu!darwin.sura.net!paladin.american.edu!auvm!UNC.BITNET!UPHILG
- Message-ID: <SAS-L%93010617090233@VTVM2.CC.VT.EDU>
- Newsgroups: bit.listserv.sas-l
- Date: Wed, 6 Jan 1993 17:08:00 EST
- Reply-To: "Philip Gallagher,(919)966-1065" <UPHILG@UNC.BITNET>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: "Philip Gallagher,(919)966-1065" <UPHILG@UNC.BITNET>
- Subject: Re: discard superfluous observations *HOW*?
- Comments: To: Patrick Haggard <ph@PHYSIOLOGY.OXFORD.AC.UK>
- Lines: 50
-
- Patrick Haggard wrote:
-
- > I have some data containing between n and m observations in each
- > of C conditions. I would like to have exactly n observations in
- > each condition, so that my design is balanced: I'm fairly happy that
- > discarding the excess observations shouldn't change things too much.
- > My question: can anyone suggest a way to discard the excess
- > observations in a SAS data step, and to number the remaining
- > observations 1...n for each condition.
-
- Ah! This is, in principle, an easy one. Since I will not
- have time to test the code, I may wind up very embarrassed, but
- here goes. (Even if I make some coding mistakes, I am sure
- the general structure will work.)
-
- Assume the data reside in a SAS dataset called WORK.OLD and
- that the condition variable is named CONDIT. The following
- should work just fine:
-
- %LET N = 10; * <=== Substitute your own value of n for 10;
-
- PROC SORT DATA=WORK.OLD OUT=WORK.SORDID;
-    BY CONDIT;
- RUN;
-
- DATA BALANCED(LABEL='Dataset w/n observations/condition')
-      EXCESS  (LABEL='Dataset w/excess observations')
-      ;
-    SET WORK.SORDID;
-    BY CONDIT;     * <=== This creates the FIRST.CONDIT var;
-    RETAIN NUMBER; * <=== Keep NUMBER's value from one observation
-                         to the next instead of resetting it to missing;
-
-    * Initialize NUMBER each time a new condition is encountered;
-    IF (FIRST.CONDIT EQ 1) THEN NUMBER = 0;
-
-    NUMBER = NUMBER + 1;
-    * The macro variable N set by the %LET statement above holds
-      the desired number of observations per condition;
-    IF (NUMBER LE &N) THEN OUTPUT BALANCED;
-    ELSE OUTPUT EXCESS;
- RUN;
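For readers who want to see the same logic outside SAS, here is a minimal Python sketch. It is an illustration only, not a replacement for the SAS code, and it rests on some assumptions. Each observation is a dict, and the condition field is named `condit`, a hypothetical stand-in for CONDIT. The function name `split_balanced` is also hypothetical. The function sorts by condition, numbers the observations 1...n within each condition, and routes each one to BALANCED or EXCESS, as the DATA step does.

```python
from itertools import groupby
from operator import itemgetter


def split_balanced(records, n, key="condit"):
    """Hypothetical sketch of the SORT + DATA step: number observations
    1, 2, ... within each condition, keep the first n, set the rest aside.

    records: list of dicts; `key` names the condition field (assumed name).
    Returns (balanced, excess), each a list of new dicts carrying a
    'number' field that plays the role of the SAS variable NUMBER.
    """
    # Sort by condition (like PROC SORT ... BY CONDIT); sorted() is stable,
    # so observations keep their input order within a condition.
    ordered = sorted(records, key=itemgetter(key))
    balanced, excess = [], []
    for _, group in groupby(ordered, key=itemgetter(key)):
        # enumerate restarts at 1 for each new condition, mirroring
        # NUMBER being reset at FIRST.CONDIT and then incremented.
        for number, rec in enumerate(group, start=1):
            row = {**rec, "number": number}
            (balanced if number <= n else excess).append(row)
    return balanced, excess


# Usage with made-up data: condition B has one observation too many for n=2.
data = [
    {"condit": "B", "y": 1.0},
    {"condit": "A", "y": 2.0},
    {"condit": "B", "y": 3.0},
    {"condit": "A", "y": 4.0},
    {"condit": "B", "y": 5.0},
]
balanced, excess = split_balanced(data, n=2)
print([(r["condit"], r["number"]) for r in balanced])
# [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
print([(r["condit"], r["y"]) for r in excess])
# [('B', 5.0)]
```

Because the sort is stable, "the first n" here means the first n in input order within each condition. If that order carries meaning (time of collection, say), it is worth keeping in mind when deciding which observations count as excess.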
-
- * Note that I don't just "discard" the unwanted observations.
- I recommend that one always take at least a cursory look
- at the discarded/omitted observations first. Every once
- in a while there is a hideous surprise concealed in what
- one thought could be discarded. Run a PROC MEANS and a
- PROC FREQ on EXCESS, at least.;
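The suggested look at the discarded observations amounts to a frequency count per condition plus per-condition means. Below is a rough Python analogue. It assumes the same dict-per-observation layout, with a hypothetical numeric field `y`. The helper names `freq` and `means` and the sample rows are made up for illustration. Missing values (None) are skipped when computing means, much as PROC MEANS excludes missing values of an analysis variable.

```python
from collections import Counter, defaultdict
from statistics import mean


def freq(rows, key):
    """Hypothetical rough analogue of PROC FREQ: count rows per value of `key`."""
    return dict(sorted(Counter(r[key] for r in rows).items()))


def means(rows, by, var):
    """Hypothetical rough analogue of PROC MEANS with a CLASS variable:
    (N, mean) of `var` for each value of `by`, skipping None values."""
    groups = defaultdict(list)
    for r in rows:
        if r[var] is not None:
            groups[r[by]].append(r[var])
    return {k: (len(v), mean(v)) for k, v in sorted(groups.items())}


# Made-up EXCESS rows. Compare these summaries with the same ones
# computed on BALANCED before deciding the excess is safe to set aside.
excess = [
    {"condit": "A", "y": 4.0},
    {"condit": "B", "y": 5.0},
    {"condit": "B", "y": 95.0},
    {"condit": "B", "y": None},
]
print(freq(excess, "condit"))        # {'A': 1, 'B': 3}
print(means(excess, "condit", "y"))  # {'A': (1, 4.0), 'B': (2, 50.0)}
```

In this made-up example, condition B contributes most of the excess and its mean is pulled up by a single value of 95.0. That is exactly the kind of surprise a quick summary of the discarded rows is meant to catch.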
-
- Phil Gallagher
- UNC Biostatistics
- uphilg@unc
-