Measuring the cost of Derby-style table functions

This topic shows how to measure the cost of a Derby-style table function.

The following formula describes how to estimate the value returned by VTICosting.getEstimatedCostPerInstantiation():

C = I * A

where

C = The estimated Cost for creating and running the table function. That is, the value returned by VTICosting.getEstimatedCostPerInstantiation(). In general, Cost is a measure of time in milliseconds.
I = The optimizer's Imprecision. A measure of how skewed the optimizer's estimates tend to be in your particular environment. See below for instructions on how to estimate this Imprecision.
A = The Actual time in milliseconds which it takes to create and run this table function.

Calculating the optimizer's imprecision

We treat optimizer Imprecision as a constant across the runtime environment. The following formula describes it:

I = O / T

where

O = The Optimizer's estimated cost for a plan.
T = The Total runtime in milliseconds for the plan.

To estimate these values, turn on Derby statistics collection and run the following experiment several times, averaging the results:

Select = Select all of the rows from a big table.
Record = In the statistics output, look for the ResultSet which represents the table scan. That scan has a field labelled "optimizer estimated cost". That's O. Now look for the fields in that ResultSet's statistics labelled "constructor time", "open time", "next time", and "close time". Add up all of those fields. That total is T.

For example:

MAXIMUMDISPLAYWIDTH 7000;

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);

select * from T;

values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();

Calculating the actual runtime cost of a table function

The following formula explains how to compute the Actual runtime cost for the table function:

A = ( P * N ) + E

where

P = The runtime spent Per row (in milliseconds).
N = The Number of rows in the table function.
E = The time spent creating an Empty instance of the table function which has no rows in it. Usually, P * N dwarfs E. That is, the table function instantiation cost is very small compared to the actual cost of looping through the rows. However, for some table functions, E may be significant and may dominate the table function's cost when N is small.

You may know that E is basically 0. If so, you can skip this step. Otherwise, to estimate E, turn on Derby statistics collection and run the following experiment several times, averaging the results:

Short-circuit = Short-circuit the next() method of the ResultSet returned by your Derby-style Table Function so that it returns false the first time it is called. This makes it appear that the ResultSet has no rows.
Select = Select all of the rows from the table function.
Record = In the statistics output, look for the VTIResultSet which represents the table function scan. Add up the values of the fields in that VTIResultSet's statistics labelled "constructor time", "open time", "next time", and "close time". That total is E.

To estimate P, turn on Derby statistics collection and run the following experiment several times, averaging the results:

Select = Select all of the rows from the table function.
Record = In the statistics output, look for the VTIResultSet which represents the table function scan. Add up the values of the fields in that VTIResultSet's statistics labelled "constructor time", "open time", "next time", and "close time". Subtract E from the result. Now divide by the value of the field "Rows seen". The result is P.

Computing the value returned by getEstimatedCostPerInstantiation()

Putting all of this together, the following formula describes the value returned by your table function's VTICosting.getEstimatedCostPerInstantiation() method.

C = O/T * [ ( P * N ) + E ]