home *** CD-ROM | disk | FTP | other *** search
- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!uvaarpa!darwin.sura.net!paladin.american.edu!auvm!COMPUSERVE.COM!76350.1604
- Message-ID: <920903144834_76350.1604_EHJ71-1@CompuServe.COM>
- Newsgroups: bit.listserv.sas-l
- Date: Thu, 3 Sep 1992 10:48:34 EDT
- Reply-To: Andy Norton <76350.1604@COMPUSERVE.COM>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: Andy Norton <76350.1604@COMPUSERVE.COM>
- Subject: Re: Finding non-dups via SQL (multiple keys)
- Comments: To: SAS-L <SAS-L@AWIIMC12.IMC.UNIVIE.AC.AT>
- Lines: 83
-
- ----------------------------------------------------------------------
- CONTENT: Information
- SUMMARY: Jim Clark's response to Ray Pass is correct for single
- keys. Here are solutions for multiple keys.
- REL/PLTF: 6.07.01/CMS, 6.04/PC-DOS
- E-ADDR: 76350.1604@compuserve.com
- NAME: Andy Norton
- ADDRESS: Trilogy Consulting, 5228 Lovers Lane, Kalamazoo MI 49002
- PHONE: (616) 344-2191
- ----------------------------------------------------------------------
-
- Ray Pass had asked for a method to identify CASENO's in FILE1 that do
- not occur in FILE2.
-
- Jim Clark responded with
- > title 'Eval Courses NOT IN Class Courses';
- > select distinct e.ident, e.ins
- > from work.check e, f.course c
- > where e.ident not in
- > (select c.ident
- > from f.course c)
- > order by e.ident;
-
- The general approach is correct and sufficient for Ray's problem. I
- think you could use simply
- select CHECK.*
- from WORK.CHECK
- where CHECK.IDENT not in
- ( select IDENT
- from F.COURSE)
- order by IDENT;
- to get the same result. In other words, it is not necessary to join E
- with C in order to reference table C in a subquery.
-
- The main purpose of my posting, however, is to point out that this
- method falls apart when you need to match on two keys instead of one.
- Suppose the IDENT's are repeated within different colleges, and you
- need to do a combined analysis. Now the observations are identified
- with {COLLEGE, IDENT}. The IN operator only works for a single
- variable, so can't handle this situation.
-
- This is a widespread weakness in the design of SQL: programs that work
- with single columns often cannot be extended to handle multiple
- columns. If you write the program to handle multiple keys, it will be
- able to handle the single-key situation.
-
- For a general solution which works with multiple keys, use
- select *
- from WORK.CHECK
- where not exists
- ( select *
- from F.COURSE
- where COLLEGE = CHECK.COLLEGE and
- IDENT = CHECK.IDENT)
- order by COLLEGE, IDENT;
-
- This is fairly intelligible but can be inefficient with the current
- optimizer, because the subquery is independently evaluated for each set
- of {COLLEGE, IDENT} values. Defining indexes on COLLEGE and IDENT (in
- the COURSE file) could help.
-
- It may be possible for the optimizer to automatically convert
- subqueries to joins in the future.
-
- This alternative may be more efficient (depending on factors such as
- how many distinct {COLLEGE, IDENT} sets there are). I haven't tested
- enough to determine exactly what the trade-offs are:
-
- select CHECK.*
- from WORK.CHECK,
- ( select COLLEGE, IDENT
- from WORK.CHECK
- except
- select COLLEGE, IDENT
- from F.COURSE) as NONDUP
- where CHECK.COLLEGE = NONDUP.COLLEGE and
- CHECK.IDENT = NONDUP.IDENT
- order by COLLEGE, IDENT;
-
- Unlike the earlier subquery solution, this program processes all of the
- different {COLLEGE, IDENT} pairs together.
-
- Andy
-