When Derby creates statistics for a table's index, it calculates and stores the following in the system tables:
- The number of rows in the table
- The number of unique values for the leading columns of an index key, also known as cardinality. Leading columns refers to the first column, the first and second columns, or the first, second, and third columns of an index (and so on). Derby cannot compute the cardinality of a combination of non-leading columns (for example, the second and third columns of an index without the first).
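If you need to create or refresh these statistics yourself, a minimal sketch, assuming a Derby release that provides the SYSCS_UTIL.SYSCS_UPDATE_STATISTICS system procedure (the APP schema name is an assumption; use the schema that owns the table):
-- Refresh cardinality statistics for every index on FLIGHTAVAILABILITY.
-- Pass an index name as the third argument to update a single index,
-- or NULL to update statistics for all of the table's indexes.
CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'FLIGHTAVAILABILITY', NULL)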
For example, consider the primary key on the table FlightAvailability:
CONSTRAINT FLIGHTAVAILABILITY_PK Primary Key (
FLIGHT_ID,
SEGMENT_NUMBER,
FLIGHT_DATE)
For this index,
Derby keeps the following information:
- The number of rows in the table FlightAvailability
- The number of unique rows for the full key (flight_id, segment_number, flight_date)
- The number of unique rows for the key (flight_id, segment_number)
- The number of unique rows for the key (flight_id)
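To see where these numbers are stored, you can query the SYS.SYSSTATISTICS system catalog. The following query is a sketch that assumes the catalog layout of recent Derby releases, in which each row covers one leading-column prefix and COLCOUNT gives the number of leading columns that the row describes:
SELECT c.CONGLOMERATENAME, s.COLCOUNT, s.STATISTICS
FROM SYS.SYSSTATISTICS s, SYS.SYSCONGLOMERATES c, SYS.SYSTABLES t
WHERE s.TABLEID = t.TABLEID
AND s.REFERENCEID = c.CONGLOMERATEID
AND t.TABLENAME = 'FLIGHTAVAILABILITY'
For the three-column primary key above, you would expect three rows, one each for COLCOUNT 1, 2, and 3.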
How does Derby use these two numbers, the number of rows in a table and the cardinality of a particular key, to determine the selectivity of a query? Take this example:
SELECT * FROM Flights, FlightAvailability
WHERE Flights.flight_id = FlightAvailability.flight_id
If the cardinality for flight_id in Flights is 250, then the selectivity of the predicate is 1/250. The optimizer would estimate the number of rows read to be:
((Rows in Flights) * (Rows in FlightAvailability)) / 250
For example, if Flights held 250 rows and FlightAvailability held 10,000 rows, the estimate would be (250 * 10,000) / 250 = 10,000 rows.
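To see the estimate the optimizer actually produces for a statement, one approach is to turn on runtime statistics with the SYSCS_UTIL procedures and then examine the output for the join (this sketch assumes you run all three statements on the same connection, for example in ij):
-- Collect runtime statistics for subsequent statements.
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)
-- Run the join and fetch all of its rows.
SELECT * FROM Flights, FlightAvailability
WHERE Flights.flight_id = FlightAvailability.flight_id
-- Print the plan, including the optimizer's estimated row count.
VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()
The estimated row count in that output can be compared with the calculation above.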