Flattening a subquery into a normal join

Subqueries are allowed to return more than one row when used with IN, EXISTS, and ANY. However, for each row returned in the outer row, Derby evaluates the subquery until it returns one row; it does not evaluate the subquery for all rows returned.

For example, given two tables, t1 and t2:

c1
--
 1
 2
 3
 
c1
--
 2
 2
 1

and the following query:

SELECT c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)

the results would be

1
2

Simply selecting t1.c1 when simply joining those tables has different results:

SELECT t1.c1 FROM t1, t2 WHERE t1.c1 = t2.c1
   1
   2
   2

Statements that include such subqueries can be flattened into joins only if the subquery does not introduce any duplicates into the result set (in our example, the subquery introduced a duplicate and so cannot simply be flattened into a join). If this requirement and other requirements (listed below) are met, however, the statement is flattened such that the tables in the subquery's FROM list are treated as if they were inner to the tables in the outer FROM list.

For example, the query could have been flattened into a join if c1 in t2 had a unique index on it. It would not have introduced any duplicate values into the result set.

The requirements for flattening into a normal join are as follows:

Flattening into a normal join gives the optimizer more options for choosing the best query plan. For example, if the following statement:

SELECT huge.* FROM huge
WHERE c1 IN (SELECT c1 FROM tiny)

can be flattened into

SELECT huge.* FROM huge, tiny
WHERE huge.c1 = tiny.c1

the optimizer can choose a query plan that will scan tiny and do a few probes into the huge table instead of scanning the huge table and doing a large number of probes into the tiny table.

Here is an expansion of the example used earlier in this section. Given

CREATE TABLE t1 (c1 INT)
CREATE TABLE t2 (c1 INT NOT NULL PRIMARY KEY)
CREATE TABLE t3 (c1 INT NOT NULL PRIMARY KEY)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (1), (2), (3)
INSERT INTO t3 VALUES (2), (3), (4)

this query

SELECT t1.* FROM t1 WHERE t1.c1 IN 
    (SELECT t2.c1 FROM t2, t3 WHERE t2.c1 = t3.c1)

should return the following results:

2
3

The query satisfies all the requirements for flattening into a join, and the statement can be transformed into the following one:

SELECT t1.*
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t2.c1 = t3.c1
AND t1.c1 = t3.c1

The following query:

SELECT t1.*
FROM t1 WHERE EXISTS
(SELECT * FROM t2, t3 WHERE t2.c1 = t3.c1 AND t2.c1 = t1.c1)

can be transformed into

SELECT t1.*
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t2.c1 = t3.c1
AND t1.c1 = t3.c1
Related concepts
Materialization
Flattening a subquery into an EXISTS join
Flattening VALUES subqueries
DISTINCT elimination in IN, ANY, and EXISTS subqueries
IN/ANY subquery transformation