NetNews Usenet Archive 1993 #3

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #3 / NN_1993_3.iso / spool / bit / listserv / sasl / 5745 < prev next >

Wrap

Text File | 1993-01-25 | 4.3 KB | 124 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU Path: sparky!uunet!gatech!udel!darwin.sura.net!paladin.american.edu!auvm!UNC.BITNET!UNCSM1 Return-Path: <@OHSTVMA.ACS.OHIO-STATE.EDU:SAS-L@VTVM2.BITNET> Message-ID: <SAS-L%93012509284375@VTVM2.CC.VT.EDU> Newsgroups: bit.listserv.sas-l Date: Mon, 25 Jan 1993 09:29:00 EST Reply-To: Sally Muller <UNCSM1@UNC.BITNET> Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET> From: Sally Muller <UNCSM1@UNC.BITNET> Subject: re: Saving time with indices.... Lines: 111 ---------------------------------------------------------------------- SUMMARY: Word from high regarding Simon Dunkley's posting on "saving time with indices." E-ADDR: uncsm1@unc.bitnet NAME: Sally Muller PH/ADDR: 919-962-6501 OIT CB#3455, UNC, Chapel Hill NC 27514 ---------------------------------------------------------------------- As you all know SAS Institute has a strict policy about its employees replying to news groups. However, I occassionally get something to share with y'all..... ==========================begin SAS-L message=========================== > In article <199301221112.AA08459@lamb.sas.com>, > sdd@leicester.ac.uk (Simon Dunkley) writes: >> >> I have a large dataset (76,000 obs, 8 vars = 5.6Mb) that I >> needed to do a many to one linkage from a small dataset (900 >> obs, 2 vars) When I came across the 'create index' thing (new) >> in 'proc datasets' I thought that this might be a more >> economical way of doing the linkage rather than using 'proc >> sort' on both files. I was expecting that 'proc sort' and >> 'create index' would take about the same cpu time and the >> 'merge' would take slightly longer. The saving being that I >> do not need the disk space to hold the sorted verions of the >> datasets. The index (a character string) takes only 75K. >> >> Because the one dataset is tiny compared with the other I >> decided to see what the timing differences were whith the >> various combinations of sorting and indexing. >> >> order order merge >> Big time Tiny time time >> sort 12.6 sort 0.25 10.58 >> sort 12.6 index 0.57 11.89 >> index 22.6 sort 0.25 50.03 >> index 22.6 index 0.57 50.7 >> >> I really didn't expect the difference to be five fold on the >> merge, or three times overall. ] ----------------------------------------- > > The data step merge program is not listed here, but I suspect > that it looked something like this: > > data new; > merge large(in=large) > small(in=small); > by idvar; > if large and small; > > The developer creates an index on the "idvar" variable in the > dataset named "large". The expectation is that the SAS system > will use the index to quickly retrieve the subset of records in > "large" that have a matching value for "idvar" in small. > Unfortunately the index can only be used in this example to > surface the observations in "large" in ascending "idvar" order. > The index cannot be used to do the subsetting. > > If you converted this data step to an sql join then the indexes > would produce a performance gain. To rewrite the merge as an SQL > join: > > proc sql; > create table new as > select * from > small as i, large as j > where i.idvar = j.idvar; > > > This example was lifted from a paper by Beatrous in the SUGI 16 (ed. note ) > procedings pp 605-614. You may want to get a copy of this paper > as a guide to effective use of indexes. In the paper, the > following CPU results were reported: > > CPU CPU > Without With > Method Indexes Indexes > ===================================== > data step > merge 16 26 > > SQL Join 23 0.3 > > Adding indexes and converting the application from a data > step merge to an sql join took the cpu performance from 16 > seconds to .3 seconds. Not bad. ==========================end SAS-L message=========================== -- Stephen Beatrous "Can we buy a hooker?" "What?!" sassmb@unx.sas.com "A hooker, the thing on the back of a (919)677-8000 x7662 car that pulls a boat." A conversation between Lee(5) and his Dad(a lot older than 5)