Join strategies

The most common join strategy in Derby is called a nested loop. For each qualifying row in the outer table, Derby uses the appropriate access path (index or table) to find the matching rows in the inner table.

Another type of join in Derby is called a hash join. For joins of this type, Derby constructs a hash table representing all the selected columns of the inner table. For each qualifying row in the outer table, Derby does a quick lookup on the hash table to get the inner table data. Derby has to scan the inner table or index only once, to build the hash table.

Nested loop joins are preferable in most situations.

Hash joins are preferable in situations in which the inner table values are unique and there are many qualifying rows in the outer table. Hash joins require that the statement's WHERE clause be an optimizable equijoin:

It must use the = operator to compare column(s) in the outer table to column(s) in the inner table.
References to columns in the inner table must be simple column references. Simple column references are described in Directly optimizable predicates.

The hash table for a hash join is held in memory, and if it gets big enough, it will spill to the disk. The optimizer makes a very rough estimate of the amount of memory required to make the hash table. If it estimates that the amount of memory required would exceed the system-wide limit of memory use for a table, the optimizer chooses a nested loop join instead.

If memory use is not a problem for your environment, set this property to a high number; allowing the optimizer the maximum flexibility in considering a join strategy queries involving large queries leads to better performance. It can also be set to smaller values for more limited environments.

Note: Derby allows multiple columns as hash keys.