The luceneSupport plugin is an optional tool that lets you use Apache Lucene to perform full-text indexing and searching of the contents of Derby text columns.
The mainline API documentation for Apache Lucene at https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/ is a useful starting point for understanding Lucene's capabilities.
Note: The luceneSupport plugin can be used only after a database has been fully upgraded to Derby Release 10.11 or higher. (See "Upgrading a database" in the Derby Developer's Guide for more information.) The plugin cannot be used on a database that is at Release 10.10 or lower.
Terminology
The following concepts are important for understanding the luceneSupport plugin.
- Analyzer: An analyzer is an implementation of
org.apache.lucene.analysis.Analyzer. It extracts indexable
terms from a block of text. The same analyzer should be used to index the text
and to query it. An analyzer may perform language-specific tasks such as
stemming and filtering. More information on analyzers can be found
in the Lucene API documentation. Users can extend the existing Lucene analyzers
or write their own custom analyzers.
- Filtering: Filtering is the language-specific task of throwing away
insignificant words such as articles and conjunctions.
- Query-parsing: Query-parsing is the process of interpreting a Lucene
query string. Lucene has its own query language. By extending the default Lucene
QueryParser class, users can enhance the Lucene query language
or replace it with some other query language.
- Score: The score measures how well a query matches a block of text
(a text column value). The higher the score, the better the match. The score is
a float value. There is no minimum or maximum value.
- Stemming: Stemming is the language-specific task of reducing related
words to their common root. For instance, an English stemmer might map all of
the following words onto the common root "house": "house", "houses", "housed",
and "housing".
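To make filtering and stemming concrete, here is a toy Java sketch of both steps. It is not a Lucene Analyzer and uses none of the Lucene API; the stop-word list, the suffix rules, and the class name ToyAnalyzer are hypothetical choices made only for illustration. Real analyzers, such as those in the analyzers-common jar, use far more careful, language-aware rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration of filtering and stemming; NOT a Lucene Analyzer.
class ToyAnalyzer {
    // Hypothetical stop-word list: a few English articles and conjunctions.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "and", "or", "but");

    // Hypothetical suffix rules, tried in this order (longer suffixes first).
    private static final List<String> SUFFIXES = List.of("ing", "es", "ed", "e", "s");

    // Stemming: strip the first matching suffix, keeping a root of at least 3 letters.
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Lower-case the text, split on non-letters, drop stop words, then stem each term.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // filtering: throw away insignificant words
            }
            terms.add(stem(token)); // stemming: reduce to a common root
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("house houses housed housing"));
        System.out.println(analyze("The house and the houses"));
    }
}
```

The shared root this toy produces is "hous" rather than "house": a stemmer's root only has to be the same for related words, not a dictionary word. Because the same analysis must be applied to the indexed text and to the query, a query for "housing" would match a column value containing "houses".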
Classpath for running the luceneSupport optional tool
Before you run the luceneSupport optional tool, make sure
that your classpath contains the following jar files:
- derby.jar
- derbyoptionaltools.jar
- core: The core Lucene machinery. For Lucene 4.5.0, this is
lucene-core-4.5.0.jar.
- analyzers-common: The common Lucene analyzers. For
Lucene 4.5.0, this is lucene-analyzers-common-4.5.0.jar.
- queryparser: The basic Lucene logic for query-parsing. For
Lucene 4.5.0, this is lucene-queryparser-4.5.0.jar.
The Lucene jar files are included in the Derby source tree; alternatively, you can download them from http://lucene.apache.org/.
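A quick check before launching can confirm that the jars listed above are present. The sketch below compares the file names on the java.class.path system property against that list; it assumes the Lucene 4.5.0 file names, and the class name LuceneClasspathCheck is hypothetical. It compares file names only, so a jar that has been renamed is reported as missing even if its classes are available.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LuceneClasspathCheck {
    // Jar names from the list above, assuming Lucene 4.5.0.
    static final List<String> REQUIRED = List.of(
        "derby.jar",
        "derbyoptionaltools.jar",
        "lucene-core-4.5.0.jar",
        "lucene-analyzers-common-4.5.0.jar",
        "lucene-queryparser-4.5.0.jar");

    // Return the required jars whose file names do not appear on the given classpath.
    static List<String> missingJars(String classPath) {
        Set<String> present = new HashSet<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            present.add(new File(entry).getName()); // keep only the file name
        }
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED) {
            if (!present.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingJars(System.getProperty("java.class.path"));
        System.out.println(missing.isEmpty() ? "classpath OK" : "missing: " + missing);
    }
}
```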
Loading and unloading the luceneSupport optional tool
In a database protected by SQL authorization, only the database owner can
issue the commands that load and unload the Lucene plugin. (See
"Database Owner" in the Derby Security Guide for more
information.)
Loading the plugin works like loading any other optional tool. Call the
SYSCS_UTIL.SYSCS_REGISTER_TOOL system procedure in a statement like the
following:
call syscs_util.syscs_register_tool( 'luceneSupport', true );
This command creates the LUCENESUPPORT schema, which
contains the following objects:
- CREATEINDEX: A database procedure for indexing
Derby text columns. See
Creating an index for details.
- UPDATEINDEX: A database procedure for refreshing an index
built by CREATEINDEX. See
Updating an index for details.
- DROPINDEX: A database procedure for dropping an index built
by CREATEINDEX. See Dropping an index for
details.
- LISTINDEXES: A table function for listing the indexes
created by CREATEINDEX. See
Listing indexes for details.
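One way to confirm that loading succeeded is to look for the three procedures through standard JDBC metadata. The sketch below uses DatabaseMetaData.getProcedures and assumes the procedure names are stored in upper case, as Derby does for unquoted identifiers; the database URL jdbc:derby:myDB and the class name are hypothetical. LISTINDEXES is a table function rather than a procedure, so it is not part of this check.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LuceneSchemaCheck {
    // Procedures that loading luceneSupport creates in the LUCENESUPPORT schema.
    static final List<String> EXPECTED = List.of("CREATEINDEX", "UPDATEINDEX", "DROPINDEX");

    // Return the expected procedure names that are absent from the given set.
    static List<String> missing(Set<String> found) {
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!found.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // Collect the names of all procedures in the LUCENESUPPORT schema.
    static Set<String> procedureNames(Connection conn) throws SQLException {
        Set<String> names = new HashSet<>();
        DatabaseMetaData md = conn.getMetaData();
        try (ResultSet rs = md.getProcedures(null, "LUCENESUPPORT", "%")) {
            while (rs.next()) {
                names.add(rs.getString("PROCEDURE_NAME"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical embedded database URL.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB")) {
            List<String> absent = missing(procedureNames(conn));
            System.out.println(absent.isEmpty() ? "luceneSupport loaded" : "missing: " + absent);
        }
    }
}
```

After the plugin is unloaded, the same check should report all three procedures as missing, since the LUCENESUPPORT schema is dropped.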
Unloading the plugin also works like unloading any other optional tool. Call
the SYSCS_UTIL.SYSCS_REGISTER_TOOL system procedure in a
statement like the following:
call syscs_util.syscs_register_tool( 'luceneSupport', false );
This command does the following:
- Drops Lucene directories: Deletes the directories that were created
to hold the Lucene indexes.
- Drops schema objects: Drops all schema objects created by
CREATEINDEX commands.
- Drops LUCENESUPPORT: Drops the
LUCENESUPPORT schema and all schema objects that it
contains.
See the Derby Reference Manual for information about
the SYSCS_UTIL.SYSCS_REGISTER_TOOL system procedure.
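Both statements can also be issued from a Java application over JDBC. This is a minimal sketch, assuming an embedded database at the hypothetical URL jdbc:derby:myDB and, in a database with SQL authorization, a connection that belongs to the database owner; the class and method names are hypothetical too. The boolean argument selects between the loading and unloading forms of the call shown above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class LuceneSupportLoader {
    // Build the SYSCS_REGISTER_TOOL call: true loads the plugin, false unloads it.
    static String registerToolSql(boolean load) {
        return "call syscs_util.syscs_register_tool( 'luceneSupport', " + load + " )";
    }

    // Execute the call on an open connection.
    static void setLuceneSupport(Connection conn, boolean load) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(registerToolSql(load));
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical embedded database; the jars listed earlier must be on the classpath.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB")) {
            setLuceneSupport(conn, true);      // load the plugin
            // setLuceneSupport(conn, false);  // unload it when no longer needed
        }
    }
}
```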
Encryption and the luceneSupport tool
The luceneSupport tool cannot be used on an encrypted
database. Users who need full-text indexing of encrypted data should store the
database in an encrypted directory or on an encrypted device.
Lucene versions
The Derby community has tested the luceneSupport tool against the following versions of Lucene. Other versions of Lucene may or may not work.
Derby cannot guarantee that two different versions of Lucene are compatible with each other. Users should bear the following in mind:
- No time travel: Derby will raise an error if you try to use an earlier
version of Lucene to read an index created by a later version of Lucene.
- Bounce your indexes: When you change versions of Lucene, it is always
safest to call LUCENESUPPORT.UPDATEINDEX on all of your
existing Lucene indexes (see Updating an index).
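Following that advice for every index can be scripted. The sketch below reads LUCENESUPPORT.LISTINDEXES and calls LUCENESUPPORT.UPDATEINDEX once per row, passing each index's own descriptor maker back so the index is rebuilt the way it was created. Treat the result column names (SCHEMANAME, TABLENAME, COLUMNNAME, INDEXDESCRIPTORMAKER) and the four-argument order of UPDATEINDEX as assumptions to check against "Listing indexes" and "Updating an index"; the database URL and class name are hypothetical. Values are trimmed in case the listing returns blank-padded CHAR columns.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

class LuceneIndexRefresher {
    // Assumed LISTINDEXES column names; Derby table functions need a correlation name (l).
    static final String LIST_SQL =
        "select schemaName, tableName, columnName, indexDescriptorMaker"
        + " from table( lucenesupport.listIndexes() ) l";

    // Assumed argument order: schema, table, text column, index descriptor maker.
    static final String UPDATE_SQL = "call lucenesupport.updateIndex( ?, ?, ?, ? )";

    // Strip blank padding from CHAR values; keep SQL NULL as null.
    static String trimOrNull(String value) {
        return value == null ? null : value.trim();
    }

    // Refresh every listed index; returns the number of UPDATEINDEX calls made.
    static int refreshAll(Connection conn) throws SQLException {
        // Read the whole listing first so no result set is open while indexes change.
        List<String[]> indexes = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(LIST_SQL)) {
            while (rs.next()) {
                String[] row = new String[4];
                for (int i = 0; i < 4; i++) {
                    row[i] = trimOrNull(rs.getString(i + 1));
                }
                indexes.add(row);
            }
        }
        try (CallableStatement call = conn.prepareCall(UPDATE_SQL)) {
            for (String[] row : indexes) {
                for (int i = 0; i < 4; i++) {
                    if (row[i] == null) {
                        call.setNull(i + 1, Types.VARCHAR);
                    } else {
                        call.setString(i + 1, row[i]);
                    }
                }
                call.execute();
            }
        }
        return indexes.size();
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical database URL; run after switching to the new Lucene jars.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB")) {
            System.out.println("refreshed " + refreshAll(conn) + " index(es)");
        }
    }
}
```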