home *** CD-ROM | disk | FTP | other *** search
- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!uvaarpa!darwin.sura.net!paladin.american.edu!auvm!LEICESTER.AC.UK!MAIL_SYSTEM
- Via: uk.ac.leicester; Thu, 3 Sep 1992 16:05:51 +0100
- Message-ID: <SAS-L%92090311074694@VTVM2.BITNET>
- Newsgroups: bit.listserv.sas-l
- Date: Thu, 3 Sep 1992 16:05:00 BST
- Reply-To: Mail_System@LEICESTER.AC.UK
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: Mail_System@LEICESTER.AC.UK
- Subject: %% Undelivered Mail %%
- Comments: To: SAS-L <SAS-L@VTVM2.CC.VT.EDU>
- Lines: 135
-
- Your mail was not delivered as follows:
- %MAIL-E-USERSPEC, invalid user specification '@UK.AC.LEICESTER.IRIX'
-
- Your original mail header and message follow.
-
- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- Via: UK.AC.NSFNET-RELAY; Thu, 3 Sep 92 16:05 BST
- Received: from vtvm2.cc.vt.edu by sun3.nsfnet-relay.ac.uk with Internet SMTP
- id <sg.25320-0@sun3.nsfnet-relay.ac.uk>;
- Thu, 3 Sep 1992 16:05:12 +0100
- Received: from vtvm2.cc.vt.edu by VTVM2.CC.VT.EDU (IBM VM SMTP V2R2) with BSMTP
- id 9502; Thu, 03 Sep 92 11:04:15 EDT
- Received: from VTVM2.BITNET by vtvm2.cc.vt.edu (Mailer R2.08 R208002)
- with BSMTP id 1170; Thu, 03 Sep 92 11:03:37 EDT
- Date: Thu, 3 Sep 1992 15:58:00 BST
- Reply-To: Mail_System@UK.AC.LEICESTER
- Original-Sender: "SAS(r) Discussion" <SAS-L@EDU.VT.CC.VTVM2>
- From: Mail_System@UK.AC.LEICESTER
- Subject: %% Undelivered Mail %%
- Comments: To: SAS-L <SAS-L@VTVM2.CC.VT.EDU>
- To: Multiple recipients of list SAS-L <SAS-L@EDU.VT.CC.VTVM2>
- Sender: SAS-L@EDU.VT.CC.VTVM2
-
- Your mail was not delivered as follows:
- %MAIL-E-USERSPEC, invalid user specification '@UK.AC.LEICESTER.IRIX'
-
- Your original mail header and message follow.
-
- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- Via: UK.AC.NSFNET-RELAY; Thu, 3 Sep 92 15:57 BST
- Received: from vtvm2.cc.vt.edu by sun3.nsfnet-relay.ac.uk with Internet SMTP
- id <sg.24811-0@sun3.nsfnet-relay.ac.uk>;
- Thu, 3 Sep 1992 15:57:11 +0100
- Received: from vtvm2.cc.vt.edu by VTVM2.CC.VT.EDU (IBM VM SMTP V2R2) with BSMTP
- id 9451; Thu, 03 Sep 92 10:56:11 EDT
- Received: from VTVM2.BITNET by vtvm2.cc.vt.edu (Mailer R2.08 R208002)
- with BSMTP id 0631; Thu, 03 Sep 92 10:56:09 EDT
- Date: Thu, 3 Sep 1992 10:48:34 EDT
- Reply-To: Andy Norton <76350.1604@COM.COMPUSERVE>
- Original-Sender: "SAS(r) Discussion" <SAS-L@EDU.VT.CC.VTVM2>
- From: Andy Norton <76350.1604@COM.COMPUSERVE>
- Subject: Re: Finding non-dups via SQL (multiple keys)
- Comments: To: SAS-L <SAS-L@AWIIMC12.IMC.UNIVIE.AC.AT>
- To: Multiple recipients of list SAS-L <SAS-L@EDU.VT.CC.VTVM2>
- Sender: SAS-L@EDU.VT.CC.VTVM2
-
- ----------------------------------------------------------------------
- CONTENT: Information
- SUMMARY: Jim Clark's response to Ray Pass is correct for single
- keys. Here are solutions for multiple keys.
- REL/PLTF: 6.07.01/CMS, 6.04/PC-DOS
- E-ADDR: 76350.1604@compuserve.com
- NAME: Andy Norton
- ADDRESS: Trilogy Consulting, 5228 Lovers Lane, Kalamazoo MI 49002
- PHONE: (616) 344-2191
- ----------------------------------------------------------------------
-
- Ray Pass had asked for a method to identify CASENO's in FILE1 that do
- not occur in FILE2.
-
- Jim Clark responded with
- > title 'Eval Courses NOT IN Class Courses';
- > select distinct e.ident, e.ins
- > from work.check e, f.course c
- > where e.ident not in
- > (select c.ident
- > from f.course c)
- > order by e.ident;
-
- The general approach is correct and sufficient for Ray's problem. I
- think you could use simply
- select CHECK.*
- from WORK.CHECK
- where CHECK.IDENT not in
- ( select IDENT
- from F.COURSE)
- order by IDENT;
- to get the same result. In other words, it is not necessary to join E
- with C in order to reference table C in a subquery.
-
- The main purpose of my posting, however, is to point out that this
- method falls apart when you need to match on two keys instead of one.
- Suppose the IDENT's are repeated within different colleges, and you
- need to do a combined analysis. Now the observations are identified
- with {COLLEGE, IDENT}. The IN operator only works for a single
- variable, so can't handle this situation.
-
- This is a widespread weakness in the design of SQL: programs that work
- with single columns often cannot be extended to handle multiple
- columns. If you write the program to handle multiple keys, it will be
- able to handle the single-key situation.
-
- For a general solution which works with multiple keys, use
- select *
- from WORK.CHECK
- where not exists
- ( select *
- from F.COURSE
- where COLLEGE = CHECK.COLLEGE and
- IDENT = CHECK.IDENT)
- order by COLLEGE, IDENT;
-
- This is fairly intelligible but can be inefficient with the current
- optimizer, because the subquery is independently evaluated for each set
- of {COLLEGE, IDENT} values. Defining indexes on COLLEGE and IDENT (in
- the COURSE file) could help.
-
- It may be possible for the optimizer to automatically convert
- subqueries to joins in the future.
-
- This alternative may be more efficient (depending on factors such as
- how many distinct {COLLEGE, IDENT} sets there are). I haven't tested
- enough to determine exactly what the trade-offs are:
-
- select CHECK.*
- from WORK.CHECK,
- ( select COLLEGE, IDENT
- from WORK.CHECK
- except
- select COLLEGE, IDENT
- from F.COURSE) as NONDUP
- where CHECK.COLLEGE = NONDUP.COLLEGE and
- CHECK.IDENT = NONDUP.IDENT
- order by COLLEGE, IDENT;
-
- Unlike the earlier subquery solution, this program processes all of the
- different {COLLEGE, IDENT} pairs together.
-
- Andy
-
- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- End of returned mail
-
- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- End of returned mail
-