Writing restricted table functions

Restricted table functions are Derby-style table functions which perform more efficiently because they can be told in advance which columns they will be asked to fetch along with simple limits on those columns. This feature exploits the expressiveness of the Java programming language and does not require any extensions to SQL.

A table function returns a rectangular chunk of data. If you use a restricted table function, Derby can tell the table function to return a shorter and narrower rectangle.

Consider the following scan of a table in a foreign database:

    select id, firstName, lastName
    from table( foreignDatabaseEmployeeTable() ) s
    where lastName = 'Stone'

If foreignDatabaseEmployeeTable is a restricted table function, Derby can tell the table function to fetch only the id, firstName, and lastName columns. In addition, Derby can tell the table function that it does not need to scan the entire foreign table; instead, the table function only needs to retrieve information for employees whose last name is "Stone".

Depending on the table function and query, this feature can support 1000X, 1000000X, or even greater performance improvements.

How to use restricted table functions

Creating and using a restricted table function involves the following steps:

  1. Implement - You must write a class which implements both java.sql.ResultSet and the Derby-specific interface org.apache.derby.vti.RestrictedVTI. This interface defines an initScan() method. When executing a query, Derby uses that method to tell the table function what columns it will have to fetch and what bounds should be applied to those columns in order to reduce the number of rows returned. For the rest of this discussion, this user-written class will be referred to as MyVTIClass.
  2. Publish - You must publish the table function by creating a public static method which returns a MyVTIClass. This is important. The Derby compiler must be able to see that the table function returns an object which implements both java.sql.ResultSet and org.apache.derby.vti.RestrictedVTI.
  3. Declare - You declare the table function to Derby using the same CREATE FUNCTION syntax you are already familiar with. This syntax does not change.
  4. Invoke - You then use the table function in a query. When Derby compiles the query, it sees that the return type of the table function implements org.apache.derby.vti.RestrictedVTI. Armed with this information, at runtime Derby calls the initScan() method once before calling any of the ResultSet methods.

For example, you would declare the function as follows:

public class MyVTIClass implements ResultSet, RestrictedVTI
{
    ...

    public void initScan(java.lang.String[] columnNames, 
        org.apache.derby.vti.Restriction restriction ) 
        throws SQLException {
         ... 
    }
}

Then you publish the table function method:

public static MyVTIClass foreignDatabaseEmployeeTable() 
    throws SQLException {
    ... 
}

Then you declare the table function to Derby:

create function foreignDatabaseEmployeeTable()
returns table
(
    id int,
    birthday date,
    taxPayerID varchar( 50 ),
    firstName varchar( 50 ),
    lastName varchar( 50 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'com.acme.portal.ForeignQueries.foreignDatabaseEmployeeTable'

Finally, you invoke the table function in a query:

select id, firstName, lastName
from table( foreignDatabaseEmployeeTable() ) s
where lastName = 'Stone'

When you invoke this query, Derby does the following:

Contract

Derby calls initScan() before calling any other method on the ResultSet. The call to initScan() merely passes hints, which the restricted table function can exploit in order to perform better. Derby enforces the restriction outside the table function. Therefore, a restricted table function can still fetch extra columns and can ignore part or all of the restriction set by the call to initScan().

Affected Operations

Compared to ordinary table functions, a restricted table function can perform better in queries involving the following comparisons of its columns to constants:

<
<=
=
>
>=
IS NULL
IS NOT NULL

In addition, performance gains can be realized for queries involving the following operators on the columns of the restricted table function:

LIKE
BETWEEN

However, this feature does not boost performance either for the IN operator, or in situations where Derby transforms OR lists into IN lists. See "Or transformations" in Tuning Derby for more information.