home *** CD-ROM | disk | FTP | other *** search
- Xref: sparky comp.databases:6321 comp.databases.theory:389 comp.databases.oracle:1375
- Path: sparky!uunet!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!uwm.edu!rutgers!uwvax!hellers
- From: hellers@wisc.edu (Joseph Hellerstein)
- Newsgroups: comp.databases,comp.databases.theory,comp.databases.oracle
- Subject: Re: Is this a bug or a limitation of the SQL language
- Message-ID: <HELLERS.92Aug25181305@cleo.wisc.edu>
- Date: 26 Aug 92 00:13:05 GMT
- References: <1992Aug23.074048.16681@prism.poly.edu> <1992Aug24.100941.24827@dk.oracle.com>
- <JOEY.92Aug24091348@elysium.berkeley.edu>
- <BtJKIG.2K5@watdragon.uwaterloo.ca>
- Sender: news@cs.wisc.edu (The News)
- Organization: UW-Madison
- Lines: 85
- In-Reply-To: gnpaulle@maytag.uwaterloo.ca's message of Tue, 25 Aug 1992 13:37:27 GMT
-
- In article <BtJKIG.2K5@watdragon.uwaterloo.ca> gnpaulle@maytag.uwaterloo.ca (Glenn Paulley) writes:
- > > [I wrote:]
- > > |select * from emp
- > > |where salary between
- > > |(select salary from emp where ename = 'Larry')
- > > |and
- > > |(select salary from emp where ename = 'John')
- >
- > > You want to avoid subqueries whenever possible. To get what you need,
- > > try the following:
- >
- > 2) Sahil's query, if optimized using subqueries for the where predicate,
- > will probably execute much faster than doing a three-way join (the
- > conjuncts in the where clause are single-value lookups; if ename is
- > indexed, for example, the entire result can be found using two indexed
- > retrievals, and a single table scan.... no joins, no sorts).
-
- This is incorrect. Subquery processing is identical to *one kind* of
- join processing, namely nested-loop with the subquery as the inner.
- Hence joins will always do at least as well as subqueries, and often
- better, since the space of paths for the optimizer to choose from is
- larger. To argue that subquery processing is simpler than join
- processing is just wrong. It's a kind of join processing itself, with
- restricted output semantics.
-
- > Avoiding
- > subqueries altogether, or always attempting to transform subqueries to
- > joins, is a dangerous heuristic.
-
- A heuristic, yes, as noted below (duplicate elimination may be
- required). Dangerous? Not really. For exactly the reason I
- outlined above, subquery-to-join is a stable conversion, meaning that
- it will almost always do as well, and often better. To do duplicate
- semantics correctly, a sort or hash is sometimes required after
- transformation (SELECT DISTINCT instead of SELECT), and this may
- result in some slowdown. But the win from conversion can result in
- orders-of-magnitude improvement, while the loss of an extra sort (say)
- usually slows things down by much less (in the < 100% slowdown range).
-
-
- > RDBMS optimizers should also determine
- > if transforming a join into a subquery will result in a better execution
- > plan.
- This can be done, but as noted above, this logic is not essential
- since subquery-to-join conversion is quite stable. Since optimization
- time is already dangerously complex, adding this dimension is probably
- not worth it.
-
- >
- > Having the *user* perform query rewrite for "optimization" is unacceptable
- > (it's what query optimizers in RDBMS are all about). Sahil's query is
- > straightforward and specifies exactly what he wants. It's not his fault
- > if today's query optimizers fail to execute this query in the most
- > optimal way.
-
- Absolutely 100% agreed. This is a serious failing in the existing SQL
- products, namely that different expressions of the same query may have
- wildly different performance. As a result, if you open up a vendor's
- manual, you will usually find such "tips" as: "convert subqueries to
- joins in the case where blah blah blah". Naturally these tips will
- stymie all but the most sophisticated users. And every shop needs
- such a user, or they're in trouble. The Starburst research project at
- IBM Almaden may result in IBM's products having such optimizations be
- automatic some years from now.
-
- > It's also a dangerous practice
- > to generalize (that is, transform subqueries to joins) because not all
- > transformations result in equivalent queries (particularly when the
- > database contains null values). See the paper by Negri et al.
- > in ACM TODS, Vol 17, No. 3, Sept. 1991 for formal semantics of entry-level
- > SQL and query equivalence classes.
-
- And see the paper by Pirahesh, Hellerstein, and Hasan in SIGMOD '92
- which demonstrates that subquery-to-join *can* be done correctly (and
- is done automatically in Starburst) for all existential subqueries
- that are Boolean Factors (i.e. IN, ANY, or EXISTS subqueries that are
- outer conjuncts of the WHERE clause, not nested under NOT or OR, etc.)
-
- You have to be careful, but the rewrite I gave him will work
- correctly, may well speed up his query, and may even sneak by his
- limited parser.
-
- Happy rewriting.
-
- Joe Hellerstein
-