When Derby creates statistics for a table's index, it calculates and stores two values in the system tables: the number of rows in the table, and the number of unique values (the cardinality) for each leading set of columns in the index key.
For example, consider the primary key on the table FlightAvailability:
CONSTRAINT FLIGHTAVAILABILITY_PK PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE)
For this index, Derby keeps the following information:
- The number of rows in the FlightAvailability table
- The number of unique values for FLIGHT_ID
- The number of unique values for the combination of FLIGHT_ID and SEGMENT_NUMBER
- The number of unique values for the combination of FLIGHT_ID, SEGMENT_NUMBER, and FLIGHT_DATE
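These statistics are stored in the SYS.SYSSTATISTICS system table and can be refreshed with the SYSCS_UTIL.SYSCS_UPDATE_STATISTICS procedure. The JDBC snippet below is a minimal sketch of refreshing and inspecting them; the database name flightsDB, the APP schema, and the use of the embedded driver are assumptions for illustration, and the display format of the statistics object may vary by Derby version.

import java.sql.*;

public class ShowIndexStatistics {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection URL; adjust to your own database location.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:flightsDB")) {

            // Refresh the statistics for FLIGHTAVAILABILITY. Passing NULL as the
            // third argument updates the statistics for all indexes on the table.
            try (CallableStatement cs = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'FLIGHTAVAILABILITY', NULL)")) {
                cs.execute();
            }

            // List the stored statistics rows; each row holds the cardinality
            // for one leading set of columns of an index key.
            try (Statement s = conn.createStatement();
                 ResultSet rs = s.executeQuery(
                         "SELECT COLCOUNT, \"STATISTICS\" FROM SYS.SYSSTATISTICS")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " column(s): " + rs.getObject(2));
                }
            }
        }
    }
}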
How does Derby use these two numbers -- the number of rows in a table and the cardinality of a particular key -- to determine the selectivity of a query? Take this example:
SELECT * FROM Flights, FlightAvailability WHERE Flights.flight_id = FlightAvailability.flight_id
If the cardinality for flight_id in Flights is 250, the selectivity of the predicate is 1/250. The optimizer would estimate the number of rows read to be as follows:
((Rows in Flights) * (Rows in FlightAvailability)) / 250
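To make the arithmetic concrete, here is a small sketch of the same estimate with made-up row counts (the 1,000 and 10,000 figures are illustrative only, not taken from the sample database):

public class JoinEstimate {
    public static void main(String[] args) {
        // Illustrative row counts; the real values come from the system tables.
        long rowsInFlights = 1_000L;
        long rowsInFlightAvailability = 10_000L;
        long cardinalityOfFlightId = 250L;   // unique flight_id values in Flights

        // Selectivity of the equijoin predicate is 1 / cardinality.
        double selectivity = 1.0 / cardinalityOfFlightId;

        // Estimated rows = (rows in Flights * rows in FlightAvailability) * selectivity
        double estimatedRows = rowsInFlights * rowsInFlightAvailability * selectivity;

        // Prints 40000.0, i.e. (1,000 * 10,000) / 250
        System.out.println("Estimated rows read: " + estimatedRows);
    }
}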