home *** CD-ROM | disk | FTP | other *** search
- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!gatech!udel!darwin.sura.net!paladin.american.edu!auvm!UNC.BITNET!UNCSM1
- Return-Path: <@OHSTVMA.ACS.OHIO-STATE.EDU:SAS-L@VTVM2.BITNET>
- Message-ID: <SAS-L%93012509284375@VTVM2.CC.VT.EDU>
- Newsgroups: bit.listserv.sas-l
- Date: Mon, 25 Jan 1993 09:29:00 EST
- Reply-To: Sally Muller <UNCSM1@UNC.BITNET>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: Sally Muller <UNCSM1@UNC.BITNET>
- Subject: re: Saving time with indices....
- Lines: 111
-
- ----------------------------------------------------------------------
- SUMMARY: Word from high regarding Simon Dunkley's posting on
- "saving time with indices."
- E-ADDR: uncsm1@unc.bitnet
- NAME: Sally Muller
- PH/ADDR: 919-962-6501 OIT CB#3455, UNC, Chapel Hill NC 27514
- ----------------------------------------------------------------------
- As you all know SAS Institute has a strict policy about its employees
- replying to news groups. However, I occassionally get something to
- share with y'all.....
-
- ==========================begin SAS-L message===========================
-
- > In article <199301221112.AA08459@lamb.sas.com>,
- > sdd@leicester.ac.uk (Simon Dunkley) writes:
- >>
- >> I have a large dataset (76,000 obs, 8 vars = 5.6Mb) that I
- >> needed to do a many to one linkage from a small dataset (900
- >> obs, 2 vars) When I came across the 'create index' thing (new)
- >> in 'proc datasets' I thought that this might be a more
- >> economical way of doing the linkage rather than using 'proc
- >> sort' on both files. I was expecting that 'proc sort' and
- >> 'create index' would take about the same cpu time and the
- >> 'merge' would take slightly longer. The saving being that I
- >> do not need the disk space to hold the sorted verions of the
- >> datasets. The index (a character string) takes only 75K.
- >>
- >> Because the one dataset is tiny compared with the other I
- >> decided to see what the timing differences were whith the
- >> various combinations of sorting and indexing.
- >>
- >> order order merge
- >> Big time Tiny time time
- >> sort 12.6 sort 0.25 10.58
- >> sort 12.6 index 0.57 11.89
- >> index 22.6 sort 0.25 50.03
- >> index 22.6 index 0.57 50.7
- >>
- >> I really didn't expect the difference to be five fold on the
- >> merge, or three times overall. ]
-
- -----------------------------------------
- >
- > The data step merge program is not listed here, but I suspect
- > that it looked something like this:
- >
- > data new;
- > merge large(in=large)
- > small(in=small);
- > by idvar;
- > if large and small;
- >
- > The developer creates an index on the "idvar" variable in the
- > dataset named "large". The expectation is that the SAS system
- > will use the index to quickly retrieve the subset of records in
- > "large" that have a matching value for "idvar" in small.
- > Unfortunately the index can only be used in this example to
- > surface the observations in "large" in ascending "idvar" order.
- > The index cannot be used to do the subsetting.
- >
- > If you converted this data step to an sql join then the indexes
- > would produce a performance gain. To rewrite the merge as an SQL
- > join:
- >
- > proc sql;
- > create table new as
- > select * from
- > small as i, large as j
- > where i.idvar = j.idvar;
- >
- >
- > This example was lifted from a paper by Beatrous in the SUGI 16
- (ed. note )
- > procedings pp 605-614. You may want to get a copy of this paper
- > as a guide to effective use of indexes. In the paper, the
- > following CPU results were reported:
- >
- > CPU CPU
- > Without With
- > Method Indexes Indexes
- > =====================================
- > data step
- > merge 16 26
- >
- > SQL Join 23 0.3
- >
- > Adding indexes and converting the application from a data
- > step merge to an sql join took the cpu performance from 16
- > seconds to .3 seconds. Not bad.
-
- ==========================end SAS-L message===========================
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- --
- Stephen Beatrous "Can we buy a hooker?" "What?!"
- sassmb@unx.sas.com "A hooker, the thing on the back of a
- (919)677-8000 x7662 car that pulls a boat."
- A conversation between Lee(5) and his
- Dad(a lot older than 5)
-