Join order case study

As an example of how the optimizer chooses a join order, consider the following situation.

The Flights table (as you know) stores information about flight segments. It has a primary key on the flight_id and segment_number columns. This primary key constraint is backed up by a unique index on those columns.

The FlightAvailability table, which stores information about the availability of flight segments on particular days, can store several rows for a particular row in the Flights table (one for each date).

You want to see information about all the flights, and you issue the following query:

SELECT *
FROM FlightAvailability AS fa, Flights AS fts
WHERE fa.flight_id = fts.flight_id
AND fa.segment_number = fts.segment_number

First imagine the situation in which there are no useful indexes on the FlightAvailability table.

Using the join order with FlightAvailability as the outer table and Flights as the inner table is cheaper because it allows the flight_id/segment_number columns from FlightAvailability to be used to probe into and find matching rows in Flights, using the primary key index on Flights.flight_id and Flights.segment_number.

This is preferable to the opposite join order (with Flights as the outer table and FlightAvailability as the inner table) because in that case, for each row in Flights, the system would have to scan the entire FlightAvailability table to find the matching rows (because there is no useful index -- an index on the flight_id/segment_number columns).

Second, imagine the situation in which there is a useful index on the FlightAvailability table (this is actually the case in the sample database). FlightAvailability has a primary key index on flight_id, segment_number, and booking_date. In that index, the flight_id/segment_number combination is not unique, since there is a one-to-many correspondence between the Flights table and the FlightAvailability table. However, the index is still very useful for finding rows with particular flight_id/segment_number values.

You issue the same query:

SELECT *
FROM FlightAvailability AS fa, Flights AS fts
WHERE fa.flight_id = fts.flight_id
AND fa.segment_number = fts.segment_number

Although the difference in cost is smaller, it is still cheaper for the Flights table to be the inner table, because its index is unique, whereas FlightAvailability's index is not. That is because it is cheaper for Derby to step through a unique index than through a non-unique index.