apache > db
Apache DB Project
 
Font size:      

Flattening a Subquery into a Normal Join

Flattening a Subquery into a Normal Join

Subqueries are allowed to return more than one row when used with IN, EXISTS, and ANY. However, for each row returned in the outer row, Derby evaluates the subquery until it returns one row; it does not evaluate the subquery for all rows returned.

For example, given two tables, t1 and t2:

c1
1
2
3


c1
2
2
1

and the following query:

SELECT c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)

the results would be

1
2

Simply selecting t1.c1 when simply joining those tables has different results:

SELECT t1.c1 FROM t1, t2 WHERE t1.c1 = t2.c1
   1
   2
   2

Statements that include such subqueries can be flattened into joins only if the subquery does not introduce any duplicates into the result set (in our example, the subquery introduced a duplicate and so cannot simply be flattened into a join). If this requirement and other requirements (listed below) are met, however, the statement is flattened such that the tables in the subquery's FROM list are treated as if they were inner to the tables in the outer FROM list.

For example, the query could have been flattened into a join if c1 in t2 had a unique index on it. It would not have introduced any duplicate values into the result set.

The requirements for flattening into a normal join are:

  • The subquery is not under an OR.
  • The subquery type is EXISTS, IN, or ANY, or it is an expression subquery on the right side of a comparison operator.
  • The subquery is not in the SELECT list of the outer query block.
  • There are no aggregates in the SELECT list of the subquery.
  • The subquery does not have a GROUP BY clause.
  • There is a uniqueness condition that ensures that the subquery does not introduce any duplicates if it is flattened into the outer query block.
  • Each table in the subquery's FROM list (after any view, derived table, or subquery flattening) must be a base table.
  • If there is a WHERE clause in the subquery, there is at least one table in the subquery whose columns are in equality predicates with expressions that do not include any column references from the subquery block. These columns must be a superset of the key columns for any unique index on the table. For all other tables in the subquery, the columns in equality predicates with expressions that do not include columns from the same table are a superset of the unique columns for any unique index on the table.

Flattening into a normal join gives the optimizer more options for choosing the best query plan. For example, if the following statement:

SELECT huge.* FROM huge
WHERE c1 IN (SELECT c1 FROM tiny)

can be flattened into

SELECT huge.* FROM huge, tiny WHERE huge.c1 = tiny.c1

the optimizer can choose a query plan that will scan tiny and do a few probes into the huge table instead of scanning the huge table and doing a large number of probes into the tiny table.

Here is an expansion of the example used earlier in this section. Given

CREATE TABLE t1 (c1 INT)
CREATE TABLE t2 (c1 INT NOT NULL PRIMARY KEY)
CREATE TABLE t3 (c1 INT NOT NULL PRIMARY KEY)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (1), (2), (3)
INSERT INTO t3 VALUES (2), (3), (4)

this query

SELECT t1.* FROM t1 WHERE t1.c1 IN 
    (SELECT t2.c1 FROM t2, t3 WHERE t2.c1 = t3.c1)

should return the following results:

2
3

The query satisfies all the requirements for flattening into a join, and the statement can be transformed into the following one:

SELECT t1.*
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t2.c1 = t3.c1
AND t1.c1 = t3.c1

The following query:

SELECT t1.*
FROM t1 WHERE EXISTS
(SELECT * FROM t2, t3 WHERE t2.c1 = t3.c1 AND t2.c1 = t1.c1)

can be transformed into

SELECT t1.*
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t2.c1 = t3.c1
AND t1.c1 = t3.c1


Previous Page
Next Page
Table of Contents
Index