apache > db
Apache DB Project
 
Font size:      

Non-Cost-Based Sort Avoidance (Tuple Filtering)

Non-Cost-Based Sort Avoidance (Tuple Filtering)

In most cases, Derby needs to perform two separate steps for statements that use DISTINCT or GROUP BY: first sorting the selected columns, then either discarding duplicate rows or aggregating grouped rows. Sometimes it is able to avoid sorting for these statements with tuple filtering. Tuple filtering means that the rows are already in a useful order. For DISTINCT, Derby can simply filter out duplicate values when they are found and return results to the user sooner. For GROUP BY, Derby can aggregate a group of rows until a new set of rows is detected and return results to the user sooner.

These are non-cost-based optimizations; the optimizer does not yet consider the cost of these optimizations.

The examples in this section refer to the following tables:

CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT)
CREATE INDEX i1 ON t1(c1)
CREATE INDEX i1_2_3_4 ON t1(c1, c2, c3, c4)

DISTINCT

Tuple filtering is applied for a DISTINCT when the following criteria are met:

  • The SELECT list is composed entirely of simple column references and constants.
  • All simple column references come from the same table and the optimizer has chosen the table in question to be the outermost table in the query block.
  • The optimizer has chosen an index as the access path for the table in question.
  • The simple column references in the SELECT list, plus any simple column references from the table that have equality predicates on them, are a prefix of the index that the optimizer selected as the access path for the table.
Note:
The set of column references must be an in-order prefix of the index.

Here is the most common case in which tuple filtering will be applied:

SELECT DISTINCT c1 FROM t1

Equality predicates allow tuple filtering on the following:

SELECT DISTINCT c2
FROM t1
WHERE c1 = 5
 
SELECT DISTINCT c2, c4
FROM t1
WHERE c1 = 5 and c3 = 7
 -- the columns don't have to be in the
-- same order as the index 
SELECT DISTINCT c2, c1
FROM t1

Quick DISTINCT Scans

Derby can use a hash table instead of a sorter to eliminate duplicates when performing a DISTINCT in the following cases:

  • There is a single table in the query block.
  • An ORDER BY clause is not merged into the DISTINCT.
  • All entries in the SELECT list are simple column references.
  • There are no predicates in the query block.

This technique allows for minimal locking when performing the scan at the READ COMMITTED isolation level.

Note:
This technique appears in RunTimeStatistics as a DistinctScanResultSet.

GROUP BY

Tuple filtering is applied for a GROUP BY when the following criteria are met:

  • All grouping columns come from the same table and the optimizer has chosen the table in question to be the outermost table in the query block.
  • The optimizer has chosen an index as the access path for the table in question.
  • The grouping columns, plus any simple column references from the table that have equality predicates on them, are a prefix of the index that the optimizer selected as the access path for the table.

Here is the most common case in which tuple filtering will be applied:

SELECT max(c2) FROM t1 GROUP BY c1

Equality predicates allow tuple filtering on the following:

SELECT c2, SUM(c3)
FROM t1
WHERE c1 = 5 GROUP BY c2
 
SELECT max(c4)
FROM t1
WHERE c1 = 5 AND c3 = 6 GROUP BY c2


Previous Page
Next Page
Table of Contents
Index