Using the luceneSupport optional tool

The luceneSupport plugin is an optional tool that lets you use Apache Lucene to perform full-text indexing and searching of the contents of Derby text columns.

The mainline API documentation for Apache Lucene at https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/ is a useful starting point for understanding Lucene's capabilities.

Note: The luceneSupport plugin can be used only after a database has been fully upgraded to Derby Release 10.11 or higher. (See "Upgrading a database" in the Derby Developer's Guide for more information.) The plugin cannot be used on a database that is at Release 10.10 or lower.
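One way to confirm the database level before loading the plugin is to query the DataDictionaryVersion database property. The statement below is a minimal sketch that assumes you are already connected to the database in question (for example, from ij); it only reports the level and does not perform an upgrade.

```sql
-- Report the version level of the database's system catalogs.
-- The luceneSupport plugin needs this value to be 10.11 or higher.
values syscs_util.syscs_get_database_property( 'DataDictionaryVersion' );
```

If the value reported is lower than 10.11, upgrade the database as described in the Derby Developer's Guide before registering the tool.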

Terminology

The following concepts are important to an understanding of the luceneSupport plugin.

Analyzer: An analyzer is an implementation of org.apache.lucene.analysis.Analyzer. It extracts indexable terms from a block of text. The same analyzer should be used to index the text and to query it. An analyzer may perform language-specific tasks such as stemming and filtering. More information on analyzers can be found in the Lucene API documentation. Users can extend the existing Lucene analyzers or write their own custom analyzers.
Filtering: Filtering is the language-specific task of throwing away insignificant words such as articles and conjunctions.
Query-parsing: Query-parsing is the process of interpreting a Lucene query string. Lucene has its own query language. By extending the default Lucene QueryParser class, users can enhance the Lucene query language or replace it with some other query language.
Score: The score measures how well a query matches a block of text (a text column value). The higher the score, the better the match. The score is a float value. There is no minimum or maximum value.
Stemming: Stemming is the language-specific task of reducing related words to their common root. For instance, an English stemmer might map all of the following words onto the common root "house": "house", "houses", "housed", and "housing".

Classpath for running the luceneSupport optional tool

Before you run the luceneSupport optional tool, make sure that your classpath contains the following jar files:

derby.jar
derbyoptionaltools.jar
core: The core Lucene machinery. For Lucene 4.5.0, this is lucene-core-4.5.0.jar.
analyzers-common: The common Lucene analyzers. For Lucene 4.5.0, this is lucene-analyzers-common-4.5.0.jar.
queryparser: The basic Lucene logic for query-parsing. For Lucene 4.5.0, this is lucene-queryparser-4.5.0.jar.

The Lucene jar files are included in the Derby source tree; alternatively, you can download them from http://lucene.apache.org/.

Loading and unloading the luceneSupport optional tool

In a database protected by SQL authorization, only the database owner can issue the commands that load and unload the Lucene plugin. (See "Database Owner" in the Derby Security Guide for more information.)

Loading the plugin looks very much like loading any other optional tool. You call the SYSCS_UTIL.SYSCS_REGISTER_TOOL system procedure in a statement like the following:

call syscs_util.syscs_register_tool( 'luceneSupport', true );

This command creates the LUCENESUPPORT schema, which contains the following objects:

CREATEINDEX: A database procedure for indexing Derby text columns. See "Creating an index" for details.
UPDATEINDEX: A database procedure for refreshing an index built by CREATEINDEX. See "Updating an index" for details.
DROPINDEX: A database procedure for dropping an index built by CREATEINDEX. See "Dropping an index" for details.
LISTINDEXES: A table function for listing the indexes created by CREATEINDEX. See "Listing indexes" for details.
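To give a feel for how these four objects fit together, the following sketch walks one index through its life cycle. The schema RUTH, the table POEMS, and the column POEMTEXT are hypothetical names chosen for illustration, and the argument lists shown are assumptions that may differ between Derby releases; the sections on each object give the authoritative signatures.

```sql
-- Hypothetical names: schema RUTH, table POEMS, text column POEMTEXT.
-- POEMS is assumed to have a primary key, which identifies the indexed rows.
-- The trailing null argument is assumed to request the default behavior.

-- Build a Lucene index on the text column.
call LuceneSupport.createIndex( 'ruth', 'poems', 'poemText', null );

-- List the indexes that have been created so far.
select * from table( LuceneSupport.listIndexes() ) t;

-- Refresh the index after the underlying rows have changed.
call LuceneSupport.updateIndex( 'ruth', 'poems', 'poemText', null );

-- Drop the index when it is no longer needed.
call LuceneSupport.dropIndex( 'ruth', 'poems', 'poemText' );
```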

Removing the plugin also looks much like unloading other optional tools. Call the SYSCS_UTIL.SYSCS_REGISTER_TOOL system procedure in a statement like the following:

call syscs_util.syscs_register_tool( 'luceneSupport', false );

This command does the following:

Drops Lucene directories: Deletes the directories that were created to hold the Lucene indexes.
Drops schema objects: Drops all schema objects created by CREATEINDEX commands.
Drops LUCENESUPPORT: Drops the LUCENESUPPORT schema and all schema objects that it contains.
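A simple way to check whether the plugin is currently loaded is to look for its schema in the system catalogs. This is only a sketch of one possible check, not a procedure prescribed by Derby; it relies on the standard SYS.SYSSCHEMAS catalog and on the LUCENESUPPORT schema name given above.

```sql
-- Expect one row while the plugin is loaded,
-- and no rows before it is loaded or after it is unloaded.
select schemaname
from sys.sysschemas
where schemaname = 'LUCENESUPPORT';
```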

See the Derby Reference Manual for information about the SYSCS_UTIL.SYSCS_REGISTER_TOOL system procedure.

Encryption and the luceneSupport tool

The luceneSupport tool cannot be used on an encrypted database. Users who need full-text indexing of encrypted data should instead store an unencrypted database in an encrypted directory or on an encrypted device.

Lucene versions

The Derby community has tested the luceneSupport tool against the following versions of Lucene. Other versions of Lucene may or may not work.

4.5.0
4.7.1
4.8.1
4.9.0

Derby cannot make any guarantees about the compatibility of two different versions of Lucene. Users should bear the following in mind:

No time travel: Derby will raise an error if you try to use an earlier version of Lucene to read an index created by a later version of Lucene.
Bounce your indexes: When you change versions of Lucene, it is always safest to call LUCENESUPPORT.UPDATEINDEX on all of your existing Lucene indexes (see "Updating an index").
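Bouncing your indexes after a Lucene version change might look like the sketch below. The index names (RUTH.POEMS.POEMTEXT and RUTH.ESSAYS.BODY) are hypothetical, and the four-argument form of UPDATEINDEX with a trailing null is an assumption; check it against the documentation for your Derby release.

```sql
-- Step 1: find every existing Lucene index.
select * from table( LuceneSupport.listIndexes() ) t;

-- Step 2: refresh each index reported by step 1, one call per index.
-- The schema, table, and column names below are placeholders.
call LuceneSupport.updateIndex( 'ruth', 'poems', 'poemText', null );
call LuceneSupport.updateIndex( 'ruth', 'essays', 'body', null );
```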