Programming user-defined aggregates

Derby allows you to create custom aggregate operators, called user-defined aggregates (UDAs).

A UDA is a Java class that implements the org.apache.derby.agg.Aggregator interface.

The org.apache.derby.agg.Aggregator interface extends java.io.Serializable, so you must make sure that all of the state of your UDA is serializable. A UDA may be serialized to disk when it performs grouped aggregation over a large number of groups. That is, intermediate results may be serialized to disk for a query like the following:

SELECT a, myAggregate( b ) FROM myTable GROUP BY a

The serialization will fail if the UDA contains non-serializable fields.

The following class provides an aggregate that computes the median value from a list of objects. This is a generic class. Its parameter must be a linear (Comparable) type.

import java.util.ArrayList;
import java.util.Collections;
import org.apache.derby.agg.Aggregator;

public class Median<V extends Comparable<V>> 
        implements Aggregator<V,V,Median<V>>
{
    private ArrayList<V> _values;

    public Median() {}

    public void init() { _values = new ArrayList<V>(); }

    public void accumulate( V value ) { _values.add( value ); }

    public void merge( Median<V> other )
    { 
        _values.addAll( other._values ); 
    }

    public V terminate()
    {
        Collections.sort( _values );

        int count = _values.size();
        
        if ( count == 0 ) { return null; }
        else { return _values.get( count/2 ); }
    }
}

Using this generic class, we can declare UDAs for all of the sortable Derby data types. For example:

create derby aggregate intMedian for int external name 'Median';
create derby aggregate varcharMedian for varchar( 32672 ) external name
  'Median';

We can then use these UDAs just like built-in Derby aggregates:

create table intValues( a int, b int );
create table varcharValues( a int, b varchar( 32672 ) );
insert into intValues values ( 1, 1 ), ( 1, 10 ), ( 1, 100 ), 
  ( 1, 1000 ), ( 2, 5 ), ( 2, 50 ), ( 2, 500 ), ( 2, 5000 );
insert into varcharValues values ( 1, 'a' ), ( 1, 'ab' ), ( 1, 'abc' ), 
  ( 2, 'a' ), ( 2, 'aa' ), ( 2, 'aaa' );

select a, intMedian( b ) from intValues group by a;
A          |2
-----------------------
1          |100
2          |500

select varcharMedian( b ) from varcharValues;
1
---
aaa

See "CREATE DERBY AGGREGATE statement" in the Derby Reference Manual for more information.