- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!europa.asd.contel.com!paladin.american.edu!auvm!USCMVSA.BITNET!RROBERT
- Message-ID: <SAS-L%92073016291221@UGA.CC.UGA.EDU>
- Newsgroups: bit.listserv.sas-l
- Date: Thu, 30 Jul 1992 13:28:00 PDT
- Reply-To: Bob Roberts <RROBERT@USCMVSA.BITNET>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: Bob Roberts <RROBERT@USCMVSA.BITNET>
- Subject: more on nesting data steps
- Lines: 66
-
- I recently wrote to ask if it was possible to nest data steps in
- SAS. Thanks to those of you who responded. The consensus is that
- it is not possible.
-
- Maybe someone can help me out with the original problem that led me
- to write in the first place. The problem stems from having to
- examine all possible pairs of observations in a data set
- in order to calculate several non-parametric statistics that are
- not handled by statistical software packages.
- A colleague (Michael Stallings) and I
- are attempting to write a SAS program that will allow us to convert
- an N x P matrix (N = subjects; P = variables) into a matrix of size
- (N(N - 1) X 2P). The larger matrix contains all possible ordered pairs
- of distinct records (each record consists of the P observations for a
- single person). We have written a program that will convert the
- smaller matrix to the larger one. The problem, of course, is that
- when N gets large, the program requires significant computational
- resources (e.g., if N = 1000 and P = 20, the larger matrix is of the
- order 999,000 by 40). It's a bit clunky now, but the code is:
-
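- proc sort data=zero; by id; run;    * the merges below need ZERO sorted by ID;
-
- data first; set zero(keep=id var1 var2 var3) end=eof;
-   retain v1-v50;                    * keep the collected IDs across records;
-   nobs = _N_;
-   array eins(50) v1-v50;            * N = 50;
-   eins(nobs) = id;
-   if eof then output;               * one record holding all N IDs;
-
- data second; set first(keep=v1-v50 nobs);
-   array zwei(50) v1-v50;
-   do i = 1 to nobs;
-     do j = 1 to nobs;
-       id1 = zwei(i);
-       id2 = zwei(j);
-       output;
-     end;
-   end;
-   keep id1 id2;
- proc sort;
-   by id1;
-
- * attach the original data for subject ID1 (from ZERO, not FIRST);
- data third; merge zero(keep=id var1 var2 var3 rename=(id=id1))
-                   second(in=intwo);
-   by id1;
-   if intwo;
-
- proc sort data=third out=fourth;
-   by id2;
-
- * attach the original data for subject ID2 under new names;
- data fifth; merge zero(keep=id var1 var2 var3
-                        rename=(id=id2 var1=avar1 var2=avar2 var3=avar3))
-                   fourth(in=infour);
-   by id2;
-   if infour;
-   if id1=id2 then delete;           * drop self-pairings;
- run;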
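- A one-step alternative, offered as a sketch under stated assumptions rather
- than a tested replacement: reading ZERO twice with SET ... POINT= builds the
- same N(N-1) ordered pairs directly, with no ID array, no N = 50 limit, and no
- sort/merge passes. It assumes ZERO holds ID and VAR1-VAR3 as above; the
- output name PAIRS and the small example data are hypothetical.

```sas
/* Tiny made-up example data; skip this step if ZERO already exists. */
data zero;
  input id var1 var2 var3;
  datalines;
1 5 2 7
2 3 4 1
3 6 1 2
4 2 8 5
;
run;

/* Build all N(N-1) ordered pairs of distinct subjects in one step.  */
/* PAIRS is a hypothetical output name.                               */
data pairs;
  do i = 1 to n;                          /* subject i, read directly   */
    set zero(keep=id var1-var3 rename=(id=id1)) point=i nobs=n;
    do j = 1 to n;                        /* every other subject j      */
      if j ne i then do;
        set zero(keep=id var1-var3
                 rename=(id=id2 var1=avar1 var2=avar2 var3=avar3))
            point=j;
        output;                           /* one record per pair (i,j)  */
      end;
    end;
  end;
  stop;                                   /* POINT= never hits EOF      */
run;
```

- The pair file is still N(N-1) records long, so this saves the sorts and
- merges but not the disk space.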
-
-
- Because we routinely deal with data sets with large numbers of cases,
- we are interested in writing a more efficient program.
- It turns out that we can achieve the final calculations of interest
- to us by pooling results obtained within each of the N matrices
- that consist of all possible pairings of subject i with all other
- subjects j. Our strategy was to loop out of the current data step
- after we had constructed all pairings for each subject i and (1)
- perform the merges required to retrieve the original data for each
- subject, (2) perform the calculations required, (3) save the results
- to a cumulative file, (4) delete the matrix for subject i, and (5)
- return to the next subject in the original data step. Since looping
- in and out of data steps is not possible, we would be interested to
- hear if anyone else has ideas about how to solve this problem.
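- One possible approach, sketched under assumptions: the same POINT= reads let
- a single DATA step carry out steps (1) through (5) without ever storing the
- full pair file. For each subject i it loops over every j not equal to i,
- updates running totals, writes one summary record for subject i, and moves
- on; a final PROC MEANS pools the per-subject results. The statistic used here
- (NGT, how many other subjects have a smaller VAR1) is only a hypothetical
- stand-in for the real nonparametric calculation, and PERSUBJ, NPAIRS and NGT
- are hypothetical names.

```sas
/* Tiny made-up example data; skip this step if ZERO already exists. */
data zero;
  input id var1 var2 var3;
  datalines;
1 5 2 7
2 3 4 1
3 6 1 2
4 2 8 5
;
run;

/* Per-subject pooling without storing the N(N-1) pair file.         */
/* NGT is a hypothetical stand-in statistic: the number of subjects j */
/* whose VAR1 is below that of subject i.                              */
data persubj(keep=id1 npairs ngt);      /* one record per subject i     */
  do i = 1 to n;
    set zero(keep=id var1 rename=(id=id1)) point=i nobs=n;
    npairs = 0;                         /* reset running totals for i   */
    ngt = 0;
    do j = 1 to n;                      /* pair subject i with every j  */
      if j ne i then do;
        set zero(keep=id var1 rename=(id=id2 var1=avar1)) point=j;
        npairs + 1;
        if var1 > avar1 then ngt + 1;
      end;
    end;
    output;                             /* save the results for i       */
  end;
  stop;                                 /* POINT= never hits EOF        */
run;

/* Pool the per-subject results across all N subjects. */
proc means data=persubj sum;
  var npairs ngt;
run;
```

- Only the running totals for the current subject are held at any time, so
- the stored results grow with N rather than N(N-1); the run time is still
- proportional to N(N-1) random reads of ZERO.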
-