Chapter 8. System Management and Deployment Issues

Fred Toussi

The HSQL Development Group

$Revision: 2959 $

Copyright 2002-2009 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQLDB Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.

$Date: 2009-04-01 13:43:36 -0400 (Wed, 01 Apr 2009) $

Table of Contents

Mode of Operation and Tables
Mode of Operation
Tables
Large Objects
Deployment context
Memory and Disk Use
Cache Memory Allocation
Managing Database Connections
Upgrading Databases
Upgrading From Older Versions
Manual Changes to the *.script File
Backing Up Database Catalogs
Making Online Backups
Making Offline Backups
Examining Backups
Restoring a Backup
Statements

Mode of Operation and Tables

HyperSQL has many modes of operation and features that allow it to be used in very different scenarios. Levels of memory usage, speed and accessibility by different applications are influenced by how HyperSQL is deployed.

Mode of Operation

The decision to run HyperSQL as a separate server process or as an in-process database should be based on the following:

  • When HyperSQL is run as a server on a separate machine, it is isolated from hardware failures and crashes on the hosts running the application.

  • When HyperSQL is run as a server on the same machine, it is isolated from application crashes and memory leaks.

  • Server connections are slower than in-process connections due to the overhead of streaming the data for each JDBC call.

Tables

TEXT tables are designed for special applications where the data has to be in an interchangeable format, such as CSV. TEXT tables should not be used for routine storage of data.

MEMORY tables and CACHED tables are generally used for data storage. The difference between the two is as follows:

  • The data for all MEMORY tables is read from the *.script file when the database is started and stored in memory. In contrast the data for cached tables is not read into memory until the table is accessed. Furthermore, only part of the data for each CACHED table is held in memory, allowing tables with more data than can be held in memory.

  • When the database is shutdown in the normal way, all the data for MEMORY tables is written out to the disk. In comparison, the data in CACHED tables that has changed is written out at shutdown, plus a compressed backup of all the data in all cached tables.

  • The size and capacity of the data cache for all the CACHED tables is configurable. This makes it possible to allow all the data in CACHED tables to be cached in memory. In this case, speed of access is good, but slightly slower than MEMORY tables.

  • For normal applications it is recommended that MEMORY tables are used for small amounts of data, leaving CACHED tables for large data sets. For special applications in which speed is paramount and a large amount of free memory is available, MEMORY tables can be used for large tables as well.

Large Objects

HyperSQL 1.9.0 supports decicated storage and access to BLOB and CLOB objects. These objects can have huge sizes. BLOB or CLOB is specified as the type of a column of the table. Afterwards, rows can be inserted into the table using a PreparedStatement for efficient transfer of large BLOB of CLOB data to the database. In mem: catalogs, CLOB and BLOB data is stored in memory. In file: catalogs, this data is stored in a single separate file which has the extension *.lobs. The size of this file can grow to huge, terabyte figures.

Deployment context

The files used for storing HSQLDB database data are all in the same directory. New files are always created and deleted by the database engine. Two simple principles must be observed:

  • The Java process running HSQLDB must have full privileges on the directory where the files are stored. This include create and delete privileges.

  • The file system must have enough spare room both for the 'permanent' and 'temporary' files. The default maximum size of the *.log file is 200MB. The *.data file can grow to up to 16GB. The .backup file can be up to the size of the *.data file. The temporary files created at the time of a SHUTDOWN can be equal in size to the *.script file and the .data file.

Memory and Disk Use

Memory used by the program can be thought of as two distinct pools: memory used for table data, and memory used for building result sets and other internal operations. In addition, when transactions are used, memory is utilised for storing the information needed for a rollback.

The memory used for a MEMORY table is the sum of memory used by each row. Each MEMORY table row is a Java object that has 2 int or reference variables. It contains an array of objects for the fields in the row. Each field is an object such as Integer, Long, String, etc. In addition each index on the table adds a node object to the row. Each node object has 6 int or reference variables. As a result, a table with just one column of type INTEGER will have four objects per row, with a total of 10 variables of 4 bytes each - currently taking up 80 bytes per row. Beyond this, each extra column in the table adds at least a few bytes to the size of each row.

All the rows in the result set are built in memory, so very large result sets may not be possible. In server mode databases, the result set memory is released from the server once the database server has returned the result set. in-process databases release the memory when the application program releases the java.sql.ResultSet object. Server modes require additional memory for returning result sets, as they convert the full result set into an array of bytes which is then transmitted to the client.

When UPDATE and DELETE queries are performed on CACHED tables, the full set of rows that are affected, including those affected due to ON UPDATE actions, is held in memory for the duration of the operation. This means it may not be possible to perform deletes or updates involving very large numbers of rows of CACHED tables. Such operations should be performed in smaller sets.

When transactions support is enabled with SET AUTOCOMMIT FALSE, lists of all insert, delete or update operations are stored in memory so that they can be undone when ROLLBACK is issued. Transactions that span hundreds of modification to data will take up a lot of memory until the next COMMIT or ROLLBACK clears the list.

Most JVM implementations allocate up to a maximum amount of memory (usually 64 MB by default). This amount is generally not adequate when large memory tables are used, or when the average size of rows in cached tables is larger than a few hundred bytes. The maximum amount of allocated memory can be set on the java command line that is used for running HyperSQL. For example, with Sun JVM version 1.3.0 the parameter -Xmx256m increases the amount to 256 MB.

By default, all the rows in the result set are built in memory, so very large result sets may not be possible. In server mode databases, the result set memory is released from the server once the database server has returned the result set. in-process databases release the memory when the application program releases the java.sql.ResultSet object. Server modes require additional memory for returning result sets, as they convert the full result set into an array of bytes which is then transmitted to the client.

**todo** updates to cover new large result support

HyperSQL uses a fast cache for immutable objects such as Integer or String that are stored in the database. In most circumstances, this reduces the memory footprint still further as fewer copies of the most frequently-used objects are kept in memory.

Cache Memory Allocation

With CACHED tables, the data is stored on disk and only up to a maximum number of rows are held in memory at any time. The default is up to 3*16384 rows. The hsqldb.cache_scale database property can be set to alter this amount. As any random subset of the rows in any of the CACHED tables can be held in the cache, the amount of memory needed by cached rows can reach the sum of the rows containing the largest field data. For example if a table with 100,000 rows contains 40,000 rows with 1,000 bytes of data in each row and 60,000 rows with 100 bytes in each, the cache can grow to contain nearly 50,000 rows, including all the 40,000 larger rows.

An additional property, hsqldb.cache_size_scale can be used in conjunction with the hsqldb.cache_scale property. This puts a limit in bytes on the total size of rows that are cached. When the default values is used for both properties, the limit on the total size of rows is approximately 50MB. (This is the size of binary images of the rows and indexes. It translates to more actual memory, typically 2-4 times, used for the cache because the data is represented by Java objects.)

If memory is limited, the hsqldb.cache_scale or hsqldb.cache_size_scale database properties can be reduced. In the example above, if the hsqldb.cache_size_scale is reduced from 10 to 8, then the total binary size limit is reduced from 50MB to 12.5 MB. This will allow the number of cached rows to reach 50,000 small rows, but only 12,500 of the larger rows.

Managing Database Connections

In all running modes (server or in-process) multiple connections to the database engine are supported. in-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients.

Connection pooling software can be used to connect to the database but it is not generally necessary. With other database engines, connection pools are used for reasons that may not apply to HyperSQL.

  • To allow new queries to be performed while a time-consuming query is being performed in the background. In HyperSQL, if a session is in readonly mode, its queries are never blocked. If the session is in read-write mode, blocking depends on the transaction model and the current activity by other sessions.

  • To limit the maximum number of simultaneous connections to the database for performance reasons. With HSQLDB this can be useful if your application is designed in a way that opens and closes connections for each small task.

  • To control transactions in a multi-threaded application. This can be useful with HSQLDB as well. For example, in a web application, a transaction may involve some processing between the queries or user action across web pages. A separate connection should be used for each HTTP session so that the work can be committed when completed or rolled back otherwise. Although this usage cannot be applied to most other database engines, HyperSQL is perfectly capable of handling over 100 simultaneous HTTP sessions as individual JDBC connections.

An application that is not both multi-threaded and transactional, such as an application for recording user login and logout actions, does not need more than one connection. The connection can stay open indefinitely and reopened only when it is dropped due to network problems.

When using an in-process database, when the last connection to the database is closed, the database still remains open. An explicit SHUTDOWN command, with or without an argument, is required to close the database. A connection property can be used to shutdown the database when the last connection is closed.

When using a server database (and to some extent, an in-process database), care must be taken to avoid creating and dropping JDBC Connections too frequently. Failure to observe this will result in unsuccessful connection attempts when the application is under heavy load.

Upgrading Databases

Any database that is not produced with the release version of HSQLDB 2.0 must be upgraded to this version. Some catalogs created with 1.8.0 can be upgraded simply by opening with HyperSQL 2.0. When this is not possible due to errors, the rest of the procedures below should be followed.

Once a database is upgraded to 2.0, it can no longer be used with previous versions of HyperSQL.

Upgrading From Older Versions

To upgrade from the more recent versions, 1.7.2, 1.7.3 or 1.8.0 simply issue the SET SCRIPTFORMAT TEXT and SHUTDOWN SCRIPT commands with the old version, then open with the new version of the engine. The upgrade is then complete.

To upgrade from older version database files (1.7.1 and older) that do not contain CACHED tables, simply SHUTDOWN with the older version and open with the new version. If there is any error in the *.logs or *.script file, try again after editing the *.logs or *.script file.

To upgrade from older version database files (1.7.1 and older) that contain CACHED tables, use the SCRIPT procedure below. In all versions of HSQLDB, the SCRIPT 'filename' command (used as an SQL statement) allows you to save a full record of your database, including database object definitions and data, to a file of your choice. You can export a script file using the old version of the database engine and open the script as a database with 2.0.

Procedure 8.1. Upgrade Using the SCRIPT Procedure for Old Versions

  1. Open the original database in the old version of DatabaseManager

  2. Issue the SCRIPT command, for example SCRIPT 'newversion.script' to create a script file containing a copy of the database.

  3. SHUTDOWN this database.

  4. Copy the original *.properties file into newversion.properties in the same directory as newversion.script

  5. Try to open the new database newversion using DatabaseManager.

  6. If there is any inconsistency in the data, the script line number is reported on the console and the opening process is aborted. Edit and correct any problems in the newversion.script before attempting to open again. Use the guidelines in the next section (Manual Changes to the .script File). Use a programming editor that is capable of handling very large files and does not wrap long lines of text.

Manual Changes to the *.script File

In 2.0 the full range of ALTER TABLE commands is available to change the data structures and their names. However, if an old database cannot be opened due to data inconsistencies, or the use of index or column names that are not compatible with 2.0, manual editing of the *.script file can be performed.

  • Version 2.0 does not accept duplicate names for indexes that were allowed before 1.7.2.

  • Version 2.0 does not accept table or column names that are SQL reserved keywords without double quoting.

Note that the *.script file must be the result of a SHUTDOWN SCRIPT and must contain the full data for the database.The following changes can be applied so long as they do not affect the integrity of existing data.

  • Names of tables, columns and indexes can be changed.

  • CREATE UNIQUE INDEX ... to CREATE INDEX ... and vice versa

    A unique index can always be converted into a normal index. A non-unique index can only be converted into a unique index if the table data for the column(s) is unique in each row.

  • NOT NULL

    A not-null constraint can always be removed.

  • PRIMARY KEY

    A primary key constraint can be removed. It cannot be removed if there is a foreign key referencing the column(s).

  • UNIQUE

    A UNIQUE constraint can be removed if there is no foreign key referencing the column(s).

  • FOREIGN KEY

    A FOREIGN KEY constraint can always be removed.

  • COLUMN TYPES

    Some changes to column types are possible. For example an INTEGER column can be changed to BIGINT, or DATE, TIME and TIMESTAMP columns can be changed to VARCHAR.

After completing the changes and saving the modified .script file, you can open the database as normal.

Backing Up Database Catalogs

The database engine saves the files containing all the data in a file catalog when a shutdown takes place. It automatically recovers from an abnormal termination and preserves the data when the catalog is opened next time. In an ideal operating environment, where there is no OS crash, disk failure, bugs in code, etc. there would be no need regularly to backup a database. This is meant to say, the engine performs the routine shutdown procedure internally, therefore backing up catalogs is an insurance policy against all sorts of misadventure that are not under the control of the database engine.

The data for each catalog consists of up to 5 files in the same directory with the endings such as *.properties, *.script, etc., as detailed in previous chapters **todo**

HyperSQL 1.9 and later includes commands to backup the database files into a single .tar or .tar.gz file archive. The backup can be performed by a command given in a JDBC session if the target database catalog is running, or on the command-line if the target catalog has been shutdown.

Making Online Backups

To back up a running catalog, obtain a JDBC connection and issue a BACKUP DATABASE command in SQL. In its most simple form, the command format below will backup the database as a single .tar.gz file to the given directory.

BACKUP DATABASE TO <directory name> BLOCKING

See the next section under Statements for details about the command and its options. See the sections below about restoring a backup.

Making Offline Backups

To back up an offline catalog, the catalog must be in shut down state. You will run a Java command like this

Example 8.1. Offline Backup Example

    java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --save  \
    tar/path.tar db/base/path

where tar/path.tar is a file path to the *.tar or *.tar.gz file to be created, and db/base/path is the file path to the catalog file base name (in same fashion as in server.database.* settings and JDBC URLs with catalog type file:.

Examining Backups

You can list the contents of backup tar files with DbBackup on your operating system command line, or with any Pax-compliant tar or pax client (this includes GNU tar),

Example 8.2. Listing a Backup with DbBackup

    java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --list tar/path.tar

You can also give regular expressions at the end of the command line if you are only interested in some of the file entries in the backup. Note that these are real regular expressions, not shell globbing patterns, so you would use .+script to match entries ending in "script", not *script.

You can examine the contents of the backup in their entirety by restoring the backup, as explained in the following section, to a temporary directory.

Restoring a Backup

You use DbBackup on your operating system command line to restore a catalog from a backup.

Example 8.3. Restoring a Backup with DbBackup

    java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --extract  \
    tar/path.tar db/dir

where tar/path.tar is a file path to the *.tar or *.tar.gz file to be read, and db/dir is the target directory to extract the catalog files into. Note that db/dir specifies a directory path, without the catalog file base name. The files will be created with the names stored in the tar file (and which you can see as described in the preceding section).

Statements

BACKUP DATABASE

backup database statement

<backup database statement> ::= BACKUP DATABASE TO <file path> {SCRIPT | [NOT] COMPRESSED} BLOCKING

Backup the database to specified <file path> for archiving purposes.

The <file path> can be in two forms. If the <file path> ends with a forward slash, it specifies a directory. In this case, an automatic name for the archive is generated that includes the date, time and the base name of the database. The database is backed up to this archive file in the specified directory. If the <file path> does not end with a forward slash, it specifies a user-defined file name for the backup archive. The archive is in tar, gzip format depending on whether it is compressed or not.

The SCRIPT option is not currently supported. If SCRIPT is specified, the backup will consist of two files, a *.properties file and a *.script file, which contain all the data and settings of the database. These files are not compressed.

If COMPRESSED or NOT COMPRESSED is specified, the backup consists of the current snapshot of database files. During backup, a CHECKPOINT command is silently executed.

The qualifier, BLOCKING, means all database operations are suspended during backup.

The HSQLDB jar also contains a program that creates an archive of an offline database. It also contains a program to expand an archive into database files. These programs are documented in ***todo Backing up Database Catalog section of Deployment Chapter.

Only a user with the DBA role can execute this statement.

CHECKPOINT

checkpoint statement

<checkpoint statement> ::= CHECKPOINT [DEFRAG]

Closes the database files, rewrites the script file, deletes the log file and opens the database. If DEFRAG is specified, also shrinks the *.data file to its minumum size. Only a user with the DBA role can execute this statement.

SCRIPT

script statement

<script statement> ::= SCRIPT [<file name>]

Returns a script containing SQL statments that define the database, its users, and its schema objects. If <file name> is not specified, the statements are returned in a Result, with each row containing an SQL statement.No data statements are included in this form. The optional file name is a single-quoted string. If <file name> is specified, then the script is written to the named file. In this case, all the data in all tables of the database is included in the script as INSERT statements.

Only a user with the DBA role can execute this statement.

SET CHECKPOINT DEFRAG

set checkpoint defrag statement

<set checkpoint defrag statement> ::= SET CHECKPOINT DEFRAG <unsigned integer literal>

Sets the threshold for perfoming a DEFRAG during a checkpoint. The <unsigned integer literal> is the megabytes of abandoned space in the *.data file. When a CHECKPOINT is performed either as a result of the .log file reaching the limit set by SET LOGSIZE m, or by the user issuing a CHECKPOINT command, the amount of space abandoned since the database was opened is checked and if it is larger than n, a CHECKPOINT DEFRAG is performed instead of a CHECKPOINT.

Only a user with the DBA role can execute this statement.

SET LOGSIZE

set log size statement

<log size statement> ::= SET LOGSIZE <unsigned integer literal>

Sets the maximum size in MB of the *.log file to the specified value. The default maximum size is 200 MB. If the value is zero, no limit is used for the size of the file. When the size of the file reaches this value, a CHECKPOINT is performed and the the *.log file is cleared to size 0. Only a user with the DBA role can execute this statement.

SET PROPERTY

set property statement

<set property statement> ::= SET PROPERTY <double quoted property name> <boolean or integer literal>;

Sets a database property. Properties that can be set using this command are either boolean or integral and are listed in the chapter.**. Only a user with the DBA role can execute this statement.

SET SCRIPTFORMAT

set scriptformat statement

<set script format statement> ::= SET SCRIPTFORMAT {TEXT | BINARY | COMPRESSED};

Changes the format of the *.script file. The default format of the *.script file is TEXT. This has the advantage that the SQL statements can be viewed, and in some cases, edited . BINARY and COMPRESSED formats are slightly faster and more compact than the default TEXT. Recommended only for very large script files. This command performs a CHECKPOINT. Only a user with the DBA role can execute this statement.

SET WRITE DELAY

set write delay statement

<set write delay statement> ::= SET WRITE_DELAY {{ TRUE | FALSE } | <seconds value> | <milliseconds value> MILLIS};

Set the WRITE_DELAY property of the database. The WRITE_DELAY controls the frequency of file sync for the log file. When WRITE_DELAY is set to FALSE or 0, the sync takes place immediately at each COMMIT. WRITE_DELAY TRUE performs the sync once every 10 seconds (which is the default). A numeric value can be specified instead.

The purpose of this command is to control the amount of data loss in case of a total system crash. A delay of 1 second means at most the data written to disk during the last second before the crash is lost. All data written prior to this has been synced and should be recoverable.

A write delay of 0 impacts performance in high load situations, as the engine has to wait for the file system to catch up.

To avoid this, you can set write delay down to 10 milliseconds.

Each time the SET WRITE_DELAY statement is executed with any value, a sync is immediately performed. Only a user with the DBA role can execute this statement.

SET DATABASE COLLATION

set database collation statement

<set database collation statement> ::= SET DATABASE COLLATION <collation name>

Each database can have its own collation. Sets the collation from the set of collations supported by HyperSQL. Once this command has been issued, the database can be opened in any JVM and will retain its collation. Only a user with the DBA role can execute this statement.

SET IGNORECASE

set ignore case statement

<set ignore case statement> ::= SET IGNORECASE {TRUE | FALSE}

Set the type used for VARCHAR table columns. By default, character columns in new databases are case sensitive. If SET IGNORECASE TRUE is specified, all VARCHAR columns in new table are set to VARCHAR_IGNORECASE. It is possible to specify the VARCHAR_IGNORECASE type for the definition of individual columns. So it is possible to have some columns case sensitive and some not, even in the same table. This statement must be switched before creating tables. Existing tables and their data are not affected.

Only a user with the DBA role can execute this statement.

SET DATABASE BACKUP INCREMENT

set database backup increment statement

<set database backup increment statement> ::= SET DATABASE BACKUP INCREMENT { TRUE | FALSE }

HyperSQL performs a backup of the .data file before its contents are modified. The setting is FALSE by default and the whole .data file is saved in a compressed form when a CHECKPOINT or SHUTDOWN is performed. This takes a long time when the size of the database exceeds 100 MB or so.

The alternative is backup in increments, just before some part of the .data file is modified. This mode can be set by specifying TRUE with this command. In this this mode, no backup is performed at CHECKPIONT or SHUTDOWN. This mode is prefered for large databases which are opened and closed frequently.

SET DATABASE REFERENTIAL INTEGRITY

set database referential integrity statement

<set database referential integrity statement> ::= SET DATABASE REFERENTIAL INTEGRITY {TRUE | FALSE}

This commands enables or disables the enforcement of referential integrity constraints (foreign key constraints). By default, referential integrity constraints are checked.

The only legitimate use of this statement is before importing large amounts of external data into tables that have existing FOREIGN KEY constraints. After importing the data, queries must be run to verify all rows conform to the FOREIGN KEY constraints and take appropriate actions for the rows that do not conform. After verification, the statement must be used again to enable constraint enforcement.

Only a user with the DBA role can execute this statement.

SET DATABASE TRANSACTION CONTROL

set database transaction control

<set database transaction control statement> ::= SET DATABASE TRANSACTION CONTROL { LOCKS | MVCC }

Set the concurrency control system for the database. It can be issued only when all sessions have been committed or rolled back.

Only a user with the DBA role can execute this statement.

SHUTDOWN

shutdown statement

<shutdown statement> ::= SHUTDOWN [IMMEDIATELY | COMPACT | SCRIPT]

Shutdown the database. If the optional qualifier is not used, a normal SHUTDOWN is performed.

SHUTDOWN IMMEDIATELY

Saves the *.log file and closes the database files. This is the quickest form of shutdown. This command should not be used as the routine method of closing the database, because when the database is accessed next time, it may take a long time to start.

SHUTDOWN COMPACT

This is similar to normal SHUTDOWN, but reduces the *.data file to its minimum size. It takes longer than normal SHUTDOWN.

SHUTDOWN SCRIPT

This is similar to SHUTDOWN COMPACT, but it does not rewrite the *.data and text table files. After SHUTDOWN SCRIPT, only the *.script and *.properties files remain. At the next startup, these files are processed and the *.data and *.backup files are created. This command in effect performs part of the job of SHUTDOWN COMPACT, leaving the other part to be performed automatically at the next startup.

This command produces a full script of the database which can be edited for special purposes prior to the next startup.

Only a user with the DBA role can execute this statement.

*

*

<* statement> ::=

**.

*

*

**.


$Revision: 2959 $