NetNews Usenet Archive 1992 #19

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #19 / NN_1992_19.iso / spool / bit / listserv / sasl / 4013 < prev next >

Wrap

Text File | 1992-09-03 | 3.6 KB | 96 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU Path: sparky!uunet!uvaarpa!darwin.sura.net!paladin.american.edu!auvm!COMPUSERVE.COM!76350.1604 Message-ID: <920903144834_76350.1604_EHJ71-1@CompuServe.COM> Newsgroups: bit.listserv.sas-l Date: Thu, 3 Sep 1992 10:48:34 EDT Reply-To: Andy Norton <76350.1604@COMPUSERVE.COM> Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET> From: Andy Norton <76350.1604@COMPUSERVE.COM> Subject: Re: Finding non-dups via SQL (multiple keys) Comments: To: SAS-L <SAS-L@AWIIMC12.IMC.UNIVIE.AC.AT> Lines: 83 ---------------------------------------------------------------------- CONTENT: Information SUMMARY: Jim Clark's response to Ray Pass is correct for single keys. Here are solutions for multiple keys. REL/PLTF: 6.07.01/CMS, 6.04/PC-DOS E-ADDR: 76350.1604@compuserve.com NAME: Andy Norton ADDRESS: Trilogy Consulting, 5228 Lovers Lane, Kalamazoo MI 49002 PHONE: (616) 344-2191 ---------------------------------------------------------------------- Ray Pass had asked for a method to identify CASENO's in FILE1 that do not occur in FILE2. Jim Clark responded with > title 'Eval Courses NOT IN Class Courses'; > select distinct e.ident, e.ins > from work.check e, f.course c > where e.ident not in > (select c.ident > from f.course c) > order by e.ident; The general approach is correct and sufficient for Ray's problem. I think you could use simply select CHECK.* from WORK.CHECK where CHECK.IDENT not in ( select IDENT from F.COURSE) order by IDENT; to get the same result. In other words, it is not necessary to join E with C in order to reference table C in a subquery. The main purpose of my posting, however, is to point out that this method falls apart when you need to match on two keys instead of one. Suppose the IDENT's are repeated within different colleges, and you need to do a combined analysis. Now the observations are identified with {COLLEGE, IDENT}. The IN operator only works for a single variable, so can't handle this situation. This is a widespread weakness in the design of SQL: programs that work with single columns often cannot be extended to handle multiple columns. If you write the program to handle multiple keys, it will be able to handle the single-key situation. For a general solution which works with multiple keys, use select * from WORK.CHECK where not exists ( select * from F.COURSE where COLLEGE = CHECK.COLLEGE and IDENT = CHECK.IDENT) order by COLLEGE, IDENT; This is fairly intelligible but can be inefficient with the current optimizer, because the subquery is independently evaluated for each set of {COLLEGE, IDENT} values. Defining indexes on COLLEGE and IDENT (in the COURSE file) could help. It may be possible for the optimizer to automatically convert subqueries to joins in the future. This alternative may be more efficient (depending on factors such as how many distinct {COLLEGE, IDENT} sets there are). I haven't tested enough to determine exactly what the trade-offs are: select CHECK.* from WORK.CHECK, ( select COLLEGE, IDENT from WORK.CHECK except select COLLEGE, IDENT from F.COURSE) as NONDUP where CHECK.COLLEGE = NONDUP.COLLEGE and CHECK.IDENT = NONDUP.IDENT order by COLLEGE, IDENT; Unlike the earlier subquery solution, this program processes all of the different {COLLEGE, IDENT} pairs together. Andy