Derby Engine Architecture Overview
This paper summarizes a discussion about the Derby engine architecture that started on the Derby mailing list on August 14, 2004. Please post any corrections or additional information to that list.
A running system consists of a monitor and a collection of modules.
The monitor is code that maps module requests, which are described in the next section, to implementations based upon the request and the environment. For example, under JDK 1.3 an internal request for a JDBC driver causes the monitor to select Derby's JDBC 2.0 implementation, while under JDK 1.4 it selects the JDBC 3.0 implementation. This allows Derby to present a single JDBC driver to the application regardless of the JDK; internally the correct driver is loaded.
A module is a set of discrete functionality, such as a lock manager, JDBC driver, indexing method etc. A module's interface is typically defined by a set of Java interfaces, e.g. the java.sql interfaces define an interface for a JDBC driver. All callers of a module do so purely through its interface, separating API from implementation. A module's implementation is a set of classes that implement the required behavior and interfaces. Thus a module implementation can change, or be replaced with a different implementation, without affecting the callers' code.
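The monitor/module pattern described above can be sketched in a few lines of Java. Everything here is illustrative: the names ModuleMonitor, LockFactory and SimpleLockFactory are hypothetical stand-ins, not Derby's actual internal classes.

```java
import java.util.Map;

// A module's interface: callers program against this, never the implementation.
interface LockFactory {
    boolean lock(Object compatibilitySpace, Object ref);
}

// One possible implementation; it could be swapped for another without
// touching any caller, since callers only see the LockFactory interface.
class SimpleLockFactory implements LockFactory {
    public boolean lock(Object compatibilitySpace, Object ref) {
        return true; // trivial stand-in for real lock-manager logic
    }
}

// A toy "monitor" that maps a requested module interface to an
// implementation chosen for the current environment. A real monitor would
// also consult the environment (JDK version, configuration) when choosing.
class ModuleMonitor {
    private final Map<Class<?>, Object> modules =
            Map.of(LockFactory.class, new SimpleLockFactory());

    @SuppressWarnings("unchecked")
    <T> T findModule(Class<T> moduleInterface) {
        return (T) modules.get(moduleInterface);
    }
}
```

A caller would then obtain the module via `new ModuleMonitor().findModule(LockFactory.class)` and never name `SimpleLockFactory` directly.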
Modules are either system-wide (shared), e.g. error logging, or per-service, where a service corresponds to a database; e.g. a lock manager would be a module within a service (database).
This architecture allows different modules to be loaded depending on the environment and in the past also supported different product configurations out of the same code base.
There are four main code areas: JDBC, SQL, Store and Services.
JDBC presents the only API that Derby exposes to applications and consists of implementations of the java.sql and javax.sql classes for JDBC 2.0 and 3.0. Applications use Derby solely through its implementations of the top-level JDBC interfaces (Driver, DataSource, ConnectionPoolDataSource and XADataSource) and the remaining JDBC interfaces. For example, applications can only use a Derby prepared statement through java.sql.PreparedStatement and not through some Derby-specific class with additional methods.
The JDBC layer sits on top of the SQL layer.
The SQL layer is split into two main logical areas, compilation and execution.
SQL compilation is a five step process:
- parse, using a parser generated by JavaCC, resulting in a tree of query nodes
- bind to resolve all objects (e.g. table names)
- optimize to determine the best access path
- generation of a Java class (directly to byte code) to represent the statement plan
- loading of the class and creation of an instance to represent that connection's state of the query
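The five steps above can be pictured as a chain of transformations. The sketch below is purely schematic: the types and method names are hypothetical and Derby's real compiler classes differ in every detail.

```java
// Hypothetical sketch of the five compilation phases.
class CompilationPipelineSketch {
    record Tree(String sql, String phase) {}

    Tree parse(String sql)  { return new Tree(sql, "parsed"); }        // 1. JavaCC-generated parser
    Tree bind(Tree t)       { return new Tree(t.sql(), "bound"); }     // 2. resolve tables, columns
    Tree optimize(Tree t)   { return new Tree(t.sql(), "optimized"); } // 3. choose best access path
    byte[] generate(Tree t) {                                          // 4. emit byte code directly
        return new byte[] {(byte) 0xCA, (byte) 0xFE};                  //    (placeholder bytes)
    }
    Object loadAndInstantiate(byte[] classBytes) {                     // 5. load class, create the
        return new Object();                                           //    per-connection instance
    }
}
```

The point of the sketch is the separation: steps 1–4 produce a shareable plan, while step 5 produces the per-connection state.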
The generated statement plan is cached and can be shared by multiple connections. DDL statements (e.g. CREATE TABLE) use a common statement plan to avoid generation of a Java class file.
This implementation was driven by the original goal of a small footprint. Using the JVM's interpreter was thought to require less code than implementing an internal one. It does mean that the first few times a statement plan is executed, it is interpreted; after a number of executions, the Just-In-Time (JIT) compiler will decide to compile it into native code. Thus, running performance tests will see a boost after a number of iterations. In addition, calls into user-supplied Java methods (functions and procedures) are direct, rather than through reflection.
SQL execution consists of calling execute methods on the instance of the generated class, which return a result set object. This result set is a Derby ResultSet class, not a JDBC one. The JDBC layer presents the Derby ResultSet as a JDBC one to the application. For a simple table scan the query would consist of a single result set object representing the table scan. For a more complex query the top-level result set "hides" a tree of result sets that corresponds to the complete query, e.g. a project-restrict result set on top of a join result set that is joining a table scan result set on T1 with an index scan on table T2. DML statements (INSERT/UPDATE/DELETE) are handled the same way, with a ResultSet that performs all of its work in its open method and returns an update count.
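The tree-of-result-sets idea can be shown with two toy classes. These are not Derby's internal interfaces (which are much richer); they only illustrate how an interior node, here a project-restrict, pulls rows from its child when opened.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Illustrative stand-in for Derby's internal result set interface.
interface RowSource {
    Iterator<List<Object>> open();
}

// Leaf node: a "table scan" over in-memory rows.
class TableScan implements RowSource {
    private final List<List<Object>> rows;
    TableScan(List<List<Object>> rows) { this.rows = rows; }
    public Iterator<List<Object>> open() { return rows.iterator(); }
}

// Interior node: project-restrict passes through only rows that
// satisfy its restriction, pulling from the child result set.
class ProjectRestrict implements RowSource {
    private final RowSource child;
    private final Predicate<List<Object>> restriction;
    ProjectRestrict(RowSource child, Predicate<List<Object>> restriction) {
        this.child = child;
        this.restriction = restriction;
    }
    public Iterator<List<Object>> open() {
        Iterator<List<Object>> it = child.open();
        return new Iterator<>() {
            List<Object> next = advance();
            List<Object> advance() {
                while (it.hasNext()) {
                    List<Object> r = it.next();
                    if (restriction.test(r)) return r;
                }
                return null;
            }
            public boolean hasNext() { return next != null; }
            public List<Object> next() {
                List<Object> r = next;
                next = advance();
                return r;
            }
        };
    }
}
```

A join node would be a further RowSource with two children; opening the top of the tree lazily drives the whole query.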
These result set objects interface with the Store layer to fetch rows from tables, indexes or perform sorts.
The Store layer is split into two main areas, access and raw. The access layer presents a conglomerate (table or index)/row-based interface to the SQL layer. It handles table scans, index scans, index lookups, indexing, sorting, locking policies, transactions and isolation levels.
The access layer sits on top of the raw store, which provides the raw storage of rows in pages in files, transaction logging and transaction management. JCE encryption is plugged in here at the page level. The raw store works with a pluggable file system API that allows the data files to be stored in the Java file system, in jar files, in jar files on the classpath, or via any other mechanism.
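The shape of such a pluggable file system API can be sketched as follows. This is a minimal hypothetical version, not Derby's actual storage interfaces; it just shows how one plug-in (in-memory storage) satisfies an interface that a file-based or jar-based plug-in could equally satisfy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal pluggable storage API.
interface StorageFile {
    OutputStream getOutputStream() throws IOException;
    InputStream getInputStream() throws IOException;
}

interface StorageFactory {
    StorageFile newStorageFile(String path);
}

// One plug-in: keep "files" in memory. Another implementation could map
// paths to java.io.File, to jar entries, or to anything else, without the
// raw store above this interface noticing the difference.
class MemoryStorageFactory implements StorageFactory {
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    public StorageFile newStorageFile(String path) {
        return new StorageFile() {
            public OutputStream getOutputStream() {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                files.put(path, out);
                return out;
            }
            public InputStream getInputStream() {
                return new ByteArrayInputStream(files.get(path).toByteArray());
            }
        };
    }
}
```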
Services are utility modules such as lock management, cache management (single cache module used to cache many different types from pages to string translations), error logging etc.
The services/cache component is a general purpose caching mechanism that simply caches objects that implement the Cacheable interface. It is used by the store for buffer caching, but it is also used to cache compiled plans for SQL statements, open containers and their file descriptors, string conversions, table descriptors and maybe more.
The interface/implementation split ensures that the caching algorithm is not known to the users of the cache and can be changed without affecting them. The current algorithm is a "clock" or "ring" based algorithm.
Last Updated: September 10, 2004