The Derby
optimizer usually makes a good choice about join order. This section discusses
the performance implications of join order.
In a join operation involving two tables,
Derby scans the tables in a
particular order. Derby
accesses rows in one table first, and this table is now called the outer
table.
Then, for each qualifying row in the outer table,
Derby looks for matching rows
in the second table, which is called the inner table.
Derby accesses the outer
table once, and the inner table probably many times (depending on how many rows
in the outer table qualify).
This leads to a few general rules of thumb about join order:
- If the join has no restrictions in the WHERE clause that would limit the
number of rows returned from one of the tables to just a few, the following
rules apply:
- If only one table has an index on the joined column or columns, it is
much better for that table to be the inner table. This is because for each of
the many inner table lookups,
Derby can use an index
instead of scanning the entire table.
- Since indexes on inner tables are accessed many times, if the index on one
table is smaller than the index on another, the table with the smaller one
should probably be the inner table. That is because smaller indexes (or tables)
can be cached (kept in
Derby's memory, allowing
Derby to avoid expensive I/O
for each iteration).
- On the other hand, if a query has restrictions in the WHERE clause for one
table that would cause it to return only a few rows from that table (for
example, WHERE flight_id = 'AA1111'), it is better for the
restricted table to be the outer table.
Derby will have to go to the
inner table only a few times anyway.
Consider:
SELECT *
FROM huge_table, small_table
WHERE huge_table.unique_column = 1
AND huge_table.other_column = small_table.non_unique_column
In this case, the qualification huge_table.unique_column
= 1 (assuming a unique index on the column) qualifies only one row, so
it is better for huge_table to be the outer table in the
join.