NetNews Usenet Archive 1992 #16


Text File | 1992-07-30 | 3.0 KB | 78 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!europa.asd.contel.com!paladin.american.edu!auvm!USCMVSA.BITNET!RROBERT
Message-ID: <SAS-L%92073016291221@UGA.CC.UGA.EDU>
Newsgroups: bit.listserv.sas-l
Date: Thu, 30 Jul 1992 13:28:00 PDT
Reply-To: Bob Roberts <RROBERT@USCMVSA.BITNET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
From: Bob Roberts <RROBERT@USCMVSA.BITNET>
Subject: more on nesting data steps
Lines: 66

I recently wrote to ask if it was possible to nest data steps in SAS. Thanks to those of you who responded. The consensus is that it is not possible. Maybe someone can help me out with the original problem that led me to write in the first place.

The problem stems from having to examine all possible pairs of observations in a data set in order to calculate several non-parametric statistics that are not handled by statistical software packages. A colleague (Michael Stallings) and I are attempting to write a SAS program that will allow us to convert an N x P matrix (N = subjects; P = variables) into a matrix of size N(N - 1) x 2P. The larger matrix contains all possible pairs of records (each record consists of a set of observations (P) for a single person).

We have written a program that will convert the smaller matrix to the larger one. The problem, of course, is that when N gets large, the program requires significant computational resources (e.g., if N = 1000 and P = 20, the larger matrix is of the order 999,000 by 40).
It's a bit clunky now, but the code is:

    * zero must already be sorted by id for the merges below;

    * collect all N ids into a single record (arrays sized for N = 50);
    data first;
      set zero(keep=id var1 var2 var3) end=eof;
      array eins(50) v1-v50;
      retain v1-v50;            * keep ids from earlier records;
      nobs = _N_;
      eins(nobs) = id;
      if eof then output;
    run;

    * write one record for every ordered pair (id1, id2);
    data second;
      set first(keep=v1-v50 nobs);
      array zwei(50) v1-v50;
      do i = 1 to nobs;
        do j = 1 to nobs;
          id1 = zwei(i);
          id2 = zwei(j);
          output;
        end;
      end;
    run;

    proc sort; by id1; run;

    * attach the data for the first member of each pair;
    data third;
      merge zero(keep=id var1 var2 var3 rename=(id=id1))
            second(in=intwo);
      by id1;
      if intwo;
    run;

    data fourth;
      set third;
    run;

    proc sort; by id2; run;

    * attach the data for the second member and drop self-pairs;
    data fifth;
      merge zero(keep=id var1 var2 var3
                 rename=(id=id2 var1=avar1 var2=avar2 var3=avar3))
            fourth(in=infour);
      by id2;
      if infour;
      if id1 = id2 then delete;
    run;

Because we routinely deal with data sets with large numbers of cases, we are interested in writing a more efficient program. It turns out that we can achieve the final calculations of interest to us by pooling results obtained within each of the N matrices that consist of all possible pairings of subject i with all other subjects j.

Our strategy was to loop out of the current data step after we had constructed all pairings for each subject i and (1) perform the merges required to retrieve the original data for each subject, (2) perform the calculations required, (3) save the results to a cumulative file, (4) delete the matrix for subject i, and (5) return to the next subject in the original data step.

Since looping in and out of data steps is not possible, we would be interested to hear if anyone else has ideas about how to solve this problem.
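One possible way to get the effect of steps (1) through (5) inside a single data step is to read subject i with an ordinary SET statement and then read every subject j by direct access using a second SET statement with POINT= and NOBS=. This is only a sketch under assumptions, not a tested solution: it assumes zero is an ordinary disk-based SAS data set that allows direct access, and the statistic shown (a count of pairs concordant on var1 and var2) is a hypothetical stand-in for the actual calculations, which the post does not specify. The names bysubj, npairs, nconc, k and n are made up for illustration.

```sas
/* Sketch only. For each subject i, visit every subject j by row    */
/* number and accumulate a per-subject summary, so the full         */
/* N(N - 1) pair matrix is never written out.                       */
data bysubj(keep=id1 npairs nconc);
  /* subject i: one record per iteration of the data step */
  set zero(keep=id var1 var2 rename=(id=id1));
  npairs = 0;
  nconc  = 0;
  /* subject j: n is filled in from NOBS= when the step is compiled */
  do k = 1 to n;
    set zero(keep=id var1 var2
             rename=(id=id2 var1=avar1 var2=avar2))
        point=k nobs=n;
    if id1 ne id2 then do;
      npairs = npairs + 1;
      /* hypothetical statistic: pair is concordant on var1, var2 */
      if (var1 - avar1) * (var2 - avar2) > 0 then nconc = nconc + 1;
    end;
  end;
run;

/* Pool the per-subject results into overall totals */
proc means data=bysubj sum;
  var npairs nconc;
run;
```

The step writes one record per subject rather than one per pair, so the output has N rows instead of N(N - 1). The number of comparisons is still of order N squared; only the storage and the sort/merge passes are avoided.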