NetNews Usenet Archive 1992 #19

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #19 / NN_1992_19.iso / spool / bit / listserv / sasl / 4028 < prev next >

Wrap

Text File | 1992-09-03 | 4.7 KB | 123 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU Path: sparky!uunet!uvaarpa!darwin.sura.net!paladin.american.edu!auvm!LEICESTER.AC.UK!MAIL_SYSTEM Via: uk.ac.leicester; Thu, 3 Sep 1992 15:58:27 +0100 Message-ID: <SAS-L%92090311032958@VTVM2.BITNET> Newsgroups: bit.listserv.sas-l Date: Thu, 3 Sep 1992 15:58:00 BST Reply-To: Mail_System@LEICESTER.AC.UK Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET> From: Mail_System@LEICESTER.AC.UK Subject: %% Undelivered Mail %% Comments: To: SAS-L <SAS-L@VTVM2.CC.VT.EDU> Lines: 109 Your mail was not delivered as follows: %MAIL-E-USERSPEC, invalid user specification '@UK.AC.LEICESTER.IRIX' Your original mail header and message follow. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Via: UK.AC.NSFNET-RELAY; Thu, 3 Sep 92 15:57 BST Received: from vtvm2.cc.vt.edu by sun3.nsfnet-relay.ac.uk with Internet SMTP id <sg.24811-0@sun3.nsfnet-relay.ac.uk>; Thu, 3 Sep 1992 15:57:11 +0100 Received: from vtvm2.cc.vt.edu by VTVM2.CC.VT.EDU (IBM VM SMTP V2R2) with BSMTP id 9451; Thu, 03 Sep 92 10:56:11 EDT Received: from VTVM2.BITNET by vtvm2.cc.vt.edu (Mailer R2.08 R208002) with BSMTP id 0631; Thu, 03 Sep 92 10:56:09 EDT Date: Thu, 3 Sep 1992 10:48:34 EDT Reply-To: Andy Norton <76350.1604@COM.COMPUSERVE> Original-Sender: "SAS(r) Discussion" <SAS-L@EDU.VT.CC.VTVM2> From: Andy Norton <76350.1604@COM.COMPUSERVE> Subject: Re: Finding non-dups via SQL (multiple keys) Comments: To: SAS-L <SAS-L@AWIIMC12.IMC.UNIVIE.AC.AT> To: Multiple recipients of list SAS-L <SAS-L@EDU.VT.CC.VTVM2> Sender: SAS-L@EDU.VT.CC.VTVM2 ---------------------------------------------------------------------- CONTENT: Information SUMMARY: Jim Clark's response to Ray Pass is correct for single keys. Here are solutions for multiple keys. REL/PLTF: 6.07.01/CMS, 6.04/PC-DOS E-ADDR: 76350.1604@compuserve.com NAME: Andy Norton ADDRESS: Trilogy Consulting, 5228 Lovers Lane, Kalamazoo MI 49002 PHONE: (616) 344-2191 ---------------------------------------------------------------------- Ray Pass had asked for a method to identify CASENO's in FILE1 that do not occur in FILE2. Jim Clark responded with > title 'Eval Courses NOT IN Class Courses'; > select distinct e.ident, e.ins > from work.check e, f.course c > where e.ident not in > (select c.ident > from f.course c) > order by e.ident; The general approach is correct and sufficient for Ray's problem. I think you could use simply select CHECK.* from WORK.CHECK where CHECK.IDENT not in ( select IDENT from F.COURSE) order by IDENT; to get the same result. In other words, it is not necessary to join E with C in order to reference table C in a subquery. The main purpose of my posting, however, is to point out that this method falls apart when you need to match on two keys instead of one. Suppose the IDENT's are repeated within different colleges, and you need to do a combined analysis. Now the observations are identified with {COLLEGE, IDENT}. The IN operator only works for a single variable, so can't handle this situation. This is a widespread weakness in the design of SQL: programs that work with single columns often cannot be extended to handle multiple columns. If you write the program to handle multiple keys, it will be able to handle the single-key situation. For a general solution which works with multiple keys, use select * from WORK.CHECK where not exists ( select * from F.COURSE where COLLEGE = CHECK.COLLEGE and IDENT = CHECK.IDENT) order by COLLEGE, IDENT; This is fairly intelligible but can be inefficient with the current optimizer, because the subquery is independently evaluated for each set of {COLLEGE, IDENT} values. Defining indexes on COLLEGE and IDENT (in the COURSE file) could help. It may be possible for the optimizer to automatically convert subqueries to joins in the future. This alternative may be more efficient (depending on factors such as how many distinct {COLLEGE, IDENT} sets there are). I haven't tested enough to determine exactly what the trade-offs are: select CHECK.* from WORK.CHECK, ( select COLLEGE, IDENT from WORK.CHECK except select COLLEGE, IDENT from F.COURSE) as NONDUP where CHECK.COLLEGE = NONDUP.COLLEGE and CHECK.IDENT = NONDUP.IDENT order by COLLEGE, IDENT; Unlike the earlier subquery solution, this program processes all of the different {COLLEGE, IDENT} pairs together. Andy %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% End of returned mail