apache > db
Apache DB Project
 
Font size:      

Index use and access paths

Index use and access paths

If you define an index on a column or columns, the query optimizer can use the index to find data in the column more quickly. Derby automatically creates indexes to back up primary key, foreign key, and unique constraints, so those indexes are always available to the optimizer, as well as those that you explicitly create with the CREATE INDEX command. The way Derby gets to the data--via an index or directly via the table--is called the access path.

This section covers the following topics:

What Is an Index?

An index is a database structure that provides quick lookup of data in a column or columns of a table. It is probably best described through examples.

For example, the Flights table in the toursDB database has three indexes:

  • an index on the orig_airport column (called OrigIndex)
  • an index on the dest_airport column (called DestIndex)
  • an index enforcing the primary key constraint on the flight_id and segment_number columns (which has a system-generated name)

This means there are three separate structures that provide shortcuts into the Flights table. Let's look at one of those structures, OrigIndex.

OrigIndex stores every value in the orig_airport column, plus information on how to retrieve the entire corresponding row for each value.

  • For every row in Flights, there is an entry in OrigIndex that includes the value of the orig_airport column and the address of the row itself. The entries are stored in ascending order by the orig_airport values.

When an index includes more than one column, the first column is the main one by which the entries are ordered. For example, the index on (flight_id, segment_number) is ordered first by flight_id. If there is more than one flight_id of the same value, those entries are then ordered by segment_number. An excerpt from the entries in the index might look like this:

'AA1111' 1
'AA1111' 2
'AA1112' 1
'AA1113' 1
'AA1113' 2

Indexes are helpful only sometimes. This particular index is useful when a statement's WHERE clause is looking for rows for which the value of orig_airport is some specific value or range of values. SELECTs, UPDATEs, and DELETEs can all have WHERE clauses.

For example, OrigIndex is helpful for statements such as the following:

SELECT *
FROM Flights
WHERE orig_airport = 'SFO'
 
 
SELECT *
FROM Flights
WHERE orig_airport < 'BBB'
 
 
SELECT *
FROM Flights
WHERE orig_airport >= 'MMM'

DestIndex is helpful for statements such as the following:

SELECT *
FROM Flights
WHERE dest_airport = 'SCL'

The primary key index (on flight_id and segment_number) is helpful for statements such as the following:

SELECT *
FROM Flights
WHERE flight_id = 'AA1111'
 
 
SELECT *
FROM Flights
WHERE flight_id BETWEEN 'AA1111' AND 'AA1115'
 
 
SELECT *
FROM FlightAvailability AS fa, Flights AS fts
WHERE flight_date > CURRENT_DATE
AND fts.flight_id = fa.flight_id
AND fts.segment_number = fa.segment_number

The next section discusses why the indexes are helpful for these statements but not for others.

What's Optimizable?

As you learned in the previous section, Derby might be able to use an index on a column to find data more quickly. If Derby can use an index for a statement, that statement is said to be optimizable. The statements shown in the preceding section allow Derby to use the index because their WHERE clauses provide start and stop conditions. That is, they tell Derby the point at which to begin its scan of the index and where to end the scan.

For example, a statement with a WHERE clause looking for rows for which the orig_airport value is less than BBB means that Derby must begin the scan at the beginning of the index; it can end the scan at BBB. This means that it avoids scanning the index for most of the entries.

An index scan that uses start or stop conditions is called a matching index scan.

Note:
A WHERE clause can have more than one part. Parts are linked with the word AND or OR. Each part is called a predicate. WHERE clauses with predicates joined by OR are not optimizable. WHERE clauses with predicates joined by AND are optimizable if at least one of the predicates is optimizable. For example:
SELECT * FROM Flights
WHERE flight_id = 'AA1111' AND
segment_number <> 2

In this example, the first predicate is optimizable; the second predicate is not. Therefore, the statement is optimizable.

Note:
In a few cases, a WHERE clause with predicates joined by OR can be transformed into an optimizable statement. See OR Transformations.

Directly Optimizable Predicates

Some predicates provide clear-cut starting and stopping points. A predicate provides start or stop conditions, and is therefore optimizable, when:

  • It uses a simple column reference to a column (the name of the column, not the name of the column within an expression or method call). For example, the following is a simple column reference:
    WHERE orig_airport = 'SFO'
    
    

    The following is not:

    WHERE lower(orig_airport) = 'sfo'
    
    
  • It refers to a column that is the first or only column in the index.

    References to contiguous columns in other predicates in the statement when there is a multi-column index can further define the starting or stopping points. (If the columns are not contiguous with the first column, they are not optimizable predicates but can be used as qualifiers.) For example, given a composite index on FlightAvailability (flight_id, segment_number, and flight_date), the following predicate satisfies that condition:

    WHERE flight_id = 'AA1200' AND segment_number = 2
    
    

    The following one does not:

    WHERE flight_id = 'AA1200' AND flight_date = CURRENT_DATE
    
    
  • The column is compared to a constant or to an expression that does not include columns in the same table. Examples of valid expressions: other_table.column_a, ? (dynamic parameter), 7+9. The comparison must use the following operators:
    • =
    • <
    • <=
    • >
    • >=
    • IS NULL

Indirectly Optimizable Predicates

Some predicates are transformed internally into ones that provide starting and stopping points and are therefore optimizable.

Predicates that use the following comparison operators can be transformed internally into optimizable predicates:

  • BETWEEN
  • LIKE (in certain situations)
  • IN (in certain situations)

For details on these and other transformations, see Appendix A, Internal Language Transformations.

Joins

Joins specified by the JOIN keyword are optimizable. This means that Derby can use an index on the inner table of the join (start and stop conditions are being supplied implicitly by the rows in the outer table).

Note that joins built using traditional predicates are also optimizable. For example, the following statement is optimizable:

SELECT * FROM Countries, Cities
WHERE Countries.country_ISO_code = Cities.country_ISO_code

Covering Indexes

Even when there is no definite starting or stopping point for an index scan, an index can speed up the execution of a query if the index covers the query. An index covers the query if all the columns specified in the query are part of the index. These are the columns that are all columns referenced in the query, not just columns in a WHERE clause. If so, Derby never has to go to the data pages at all, but can retrieve all data through index access alone. For example, in the following queries, OrigIndex covers the query:

SELECT orig_airport
FROM Flights
 
 
SELECT DISTINCT lower(orig_airport) FROM Flights 
FROM Flights

Derby can get all required data out of the index instead of from the table.

Single-Column Index Examples

The following queries do not provide start and stop conditions for a scan of OrigIndex, the index on the orig_airport column in Flights. However, some of these queries allow Derby to do an index rather than a table scan because the index covers the query.

-- Derby  would scan entire table; comparison is not with a 
-- constant or with a column in another table 
SELECT *
FROM Flights
WHERE orig_airport = dest_airport
 -- Derby  would scan entire table; <> operator is not optimizable 
SELECT *
FROM Flights
WHERE orig_airport <> 'SFO'
 -- not valid operator for matching index scan
-- However, Derby  would do an index 
-- rather than a table scan because
-- index covers query 
SELECT orig_airport
FROM Flights
WHERE orig_airport <> 'SFO'
 -- use of a function is not simple column reference
-- Derby  would scan entire index, but not table
-- (index covers query) 
SELECT orig_airport
FROM Flights
WHERE lower(orig_airport) = 'sfo'

Multiple-Column Index Example

The following queries do provide start and stop conditions for a scan of the primary key index on the flight_id and segment_number columns in Flights:

-- the where clause compares both columns with valid
-- operators to constants 
SELECT *
FROM Flights
WHERE flight_id = 'AA1115'
AND segment_number < 2
 -- the first column is in a valid comparison 
SELECT *
FROM Flights
WHERE flight_id < 'BB'
 -- LIKE is transformed into >= and <=, providing
-- start and stop conditions 
 
SELECT *
FROM Flights
WHERE flight_id LIKE 'AA%'

The following queries do not:

-- segment_number is in the index, but it's not the first column;
-- there's no logical starting and stopping place 
SELECT *
FROM Flights
WHERE segment_number = 2
 -- Derby  would scan entire table; comparison of first column
-- is not with a constant or column in another table
-- and no covering index applies 
SELECT *
FROM Flights
WHERE orig_airport = dest_airport
AND segment_number < 2

Useful Indexes Can Use Qualifiers

Matching index scans can use qualifiers that further restrict the result set. Remember that a WHERE clause that contains at least one optimizable predicate is optimizable. Nonoptimizable predicates can be useful in other ways.

Consider the following query:

SELECT *
FROM FLIGHTS
WHERE orig_airport < 'BBB'
AND orig_airport <> 'AKL'

The second predicate is not optimizable, but the first predicate is. The second predicate becomes a qualification for which Derby evaluates the entries in the index as it traverses it.

  • The following comparisons are valid qualifiers:
    • =
    • <
    • <=
    • >
    • >=
    • IS NULL
    • BETWEEN
    • LIKE
    • <>
    • IS NOT NULL
  • The qualifier's reference to the column does not have to be a simple column reference; you can put the column in an expression.
  • The qualifier's column does not have to be the first column in the index and does not have to be contiguous with the first column in the index.

When a Table Scan Is Better

Sometimes a table scan is the most efficient way to access data, even if a potentially useful index is available. For example, if the statement returns virtually all the data in the table, it is more efficient to go straight to the table instead of looking values up in an index, because then Derby is able to avoid the intermediate step of retrieving the rows from the index lookup values.

For example:

SELECT *
FROM Flights
WHERE dest_airport < 'Z'

In the Flights table, most of the airport codes begin with letters that are less than Z. Depending on the number of rows in the table, it is probably more efficient for Derby to go straight to the table to retrieve the appropriate rows. However, for the following query, Derby uses the index:

SELECT *
FROM Flights
WHERE dest_airport < 'B'

Only a few flights have airport codes that begin with a letter less than B.

Indexes Have a Cost for Inserts, Updates, and Deletes

Derby has to do work to maintain indexes. If you insert into or delete from a table, the system has to insert or delete rows in all the indexes on the table. If you update a table, the system has to maintain those indexes that are on the columns being updated. So having a lot of indexes can speed up select statements, but slow down inserts, updates, and deletes.

Note:
Updates and deletes with WHERE clauses can use indexes for scans, even if the indexed column is being updated.

Previous Page
Next Page
Table of Contents
Index