home *** CD-ROM | disk | FTP | other *** search
- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!stanford.edu!bcm!convex!news.oc.com!eff!sol.ctr.columbia.edu!zaphod.mps.ohio-state.edu!darwin.sura.net!paladin.american.edu!auvm!BARUCH.BITNET!TEJERA
- Message-ID: <SAS-L%92110915363768@UGA.CC.UGA.EDU>
- Newsgroups: bit.listserv.sas-l
- Date: Mon, 9 Nov 1992 15:18:28 EST
- Reply-To: Philip Tejera <TEJERA@BARUCH.BITNET>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: Philip Tejera <TEJERA@BARUCH.BITNET>
- Subject: Re: Data Reshaping ?
- In-Reply-To: Message of Wed, 4 Nov 1992 16:56:49 EST from <NY922195@PACEVM>
- Lines: 82
-
- On Wed, 4 Nov 1992 16:56:49 EST Chris said:
- >Hi ! I've got a data reshaping problem, compounded by the fact some
- >of the data is stored in permanent SAS datasets and the rest is con-
- >tained in raw datasets. In both cases, the data looks like this:
- >
- > DEPNO DEPT YR V1 V2 V3 V4 V5
- > 121 MIS 80 10 11 12 13 14
- > 121 MIS 81 15 16 17 18 19
- > 121 MIS 82 20 21 22 23 24
- > 144 COMM 80 41 42 43 44 45
- > 144 COMM 81 46 47 48 49 50
- > 144 COMM 82 51 52 53 54 55
- >
- >The actual datasets have 15 records per DEPNO and 40 variables per
- >record. The data originally came from SQL tables (IBM's SQL/DS dbms)
- >before being dumped to flat files and then converted (in some cases)
- >to permanent SAS datasets.
- >
- >What I want is to fold the individual rows back into one record, with
- >the data values (V1 thru V5 in the above example) forming a 3 x 5 array
- >(15 x 40 for the real data) for each record, such as the following:
- >
- > 121 MIS 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
- > 144 COMM 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
- > ...
- >The datasets/files are fairly large: 10,000 to 25,000 records.
- >We are running SAS 6.07 under VM/ESA...
-
- Although I have seen the previously posted answer on using Proc Transpose
- to do this, I remain dissatisfied. SAS is a very powerful tool, but surely
- we would want a student to learn the old KISS rule (Keep It Simple ...).
- Also, with a "fairly large dataset", efficient techniques should be used.
-
- There seems to be a slight error in the question, namely the statement
- that the resulting one record is "3 x 5", when really it is "1 x 15"
- from 3 arrays each "1 x 5", but I think the meaning is clear.
-
- The solution for the raw datasets is to use the Input statement to read
- more than one record per observation. See SAS Language Reference, Version
- 6, First Edition, page 400, for instance, and the preceding material on
- line pointer controls, p. 398 ff. This is nothing fancy, most packages
- can do this easily, no doubt that is why the data is in this format.
-
- As to the part of the data stored in sas datasets, use of the SET
- statement after using the RENAME dataset option should do the trick.
- We don't know how these raw and SAS datasets are mixed, but the
- Input and Set statements should be able to handle most combinations,
- along with the Infile statement to point to the appropriate raw data
- file. Also, SAS provides many possibilities for embellishments, such
- as checking that there are no missing rows, etc.
-
- A simple example would be something like:
-
- Data from_raw;
- Infile ... ;
- Input deptno dept $ yr1 v1-v5 /
- deptno2 dept2 $ yr2 v6-v10 /
- deptno3 dept3 $ yr3 v11-v15 ;
-
- If the data is already in a SAS dataset, then code something like:
-
- Data from_sas;
- Set in1;
- Set in1 (rename=(deptno=deptno2,dept=dept2,yr=yr2,v1=v6,v2=v7,...));
- Set in1 (rename=(deptno=deptno3,dept=dept3,yr=yr3,v1=v11,v2=v12,...));
-
- In either example, one would probably also use the Drop statement to get
- rid of redundant information, i.e.,
-
- Drop deptno2, dept2, yr2, deptno3, dept3, yr3;
-
- Of course, a wise programmer would first test that the information was in
- fact redundant, but I leave that up to the reader, with the hint that
- the Lostcard statement might be worth looking up.
-
- No, I didn't test this code, since the intent is to show the direction
- for study, and some details are missing. The main point is that it is
- possible to do this reshaping with only one pass of the data.
-
- Other useful references would be the SAS Applications Guide, 1987,
- and SAS Language and Procedures, Usage, Version 6, First Edition,
- especially pp. 57 and following.
-