$Revision: 2959 $
Copyright 2002-2009 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQLDB Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.
$Date: 2009-04-01 13:43:36 -0400 (Wed, 01 Apr 2009) $
Table of Contents
HyperSQL has many modes of operation and features that allow it to be used in very different scenarios. Levels of memory usage, speed and accessibility by different applications are influenced by how HyperSQL is deployed.
The decision to run HyperSQL as a separate server process or as an in-process database should be based on the following:
When HyperSQL is run as a server on a separate machine, it is isolated from hardware failures and crashes on the hosts running the application.
When HyperSQL is run as a server on the same machine, it is isolated from application crashes and memory leaks.
Server connections are slower than in-process connections due to the overhead of streaming the data for each JDBC call.
TEXT tables are designed for special applications where the data has to be in an interchangeable format, such as CSV. TEXT tables should not be used for routine storage of data.
MEMORY tables and CACHED tables are generally used for data storage. The difference between the two is as follows:
The data for all MEMORY tables is read from the *.script file when the database is started and stored in memory. In contrast the data for cached tables is not read into memory until the table is accessed. Furthermore, only part of the data for each CACHED table is held in memory, allowing tables with more data than can be held in memory.
When the database is shutdown in the normal way, all the data for MEMORY tables is written out to the disk. In comparison, the data in CACHED tables that has changed is written out at shutdown, plus a compressed backup of all the data in all cached tables.
The size and capacity of the data cache for all the CACHED tables is configurable. This makes it possible to allow all the data in CACHED tables to be cached in memory. In this case, speed of access is good, but slightly slower than MEMORY tables.
For normal applications it is recommended that MEMORY tables are used for small amounts of data, leaving CACHED tables for large data sets. For special applications in which speed is paramount and a large amount of free memory is available, MEMORY tables can be used for large tables as well.
HyperSQL 1.9.0 supports decicated storage and access to BLOB and CLOB objects. These objects can have huge sizes. BLOB or CLOB is specified as the type of a column of the table. Afterwards, rows can be inserted into the table using a PreparedStatement for efficient transfer of large BLOB of CLOB data to the database. In mem: catalogs, CLOB and BLOB data is stored in memory. In file: catalogs, this data is stored in a single separate file which has the extension *.lobs. The size of this file can grow to huge, terabyte figures.
The files used for storing HSQLDB database data are all in the same directory. New files are always created and deleted by the database engine. Two simple principles must be observed:
The Java process running HSQLDB must have full privileges on the directory where the files are stored. This include create and delete privileges.
The file system must have enough spare room both for the 'permanent' and 'temporary' files. The default maximum size of the *.log file is 200MB. The *.data file can grow to up to 16GB. The .backup file can be up to the size of the *.data file. The temporary files created at the time of a SHUTDOWN can be equal in size to the *.script file and the .data file.
Memory used by the program can be thought of as two distinct pools: memory used for table data, and memory used for building result sets and other internal operations. In addition, when transactions are used, memory is utilised for storing the information needed for a rollback.
The memory used for a MEMORY table is the sum of memory used by each
row. Each MEMORY table row is a Java object that has 2 int or reference
variables. It contains an array of objects for the fields in the row. Each
field is an object such as Integer,
Long, String, etc. In
addition each index on the table adds a node object to the row. Each node
object has 6 int or reference variables. As a result, a table with just
one column of type INTEGER will have four objects per row, with a total of
10 variables of 4 bytes each - currently taking up 80 bytes per row.
Beyond this, each extra column in the table adds at least a few bytes to
the size of each row.
All the rows in the result set are built in memory, so very large
result sets may not be possible. In server mode databases, the result set
memory is released from the server once the database server has returned
the result set. in-process databases release the
memory when the application program releases the
java.sql.ResultSet object. Server modes require
additional memory for returning result sets, as they convert the full
result set into an array of bytes which is then transmitted to the
client.
When UPDATE and DELETE queries are performed on CACHED tables, the full set of rows that are affected, including those affected due to ON UPDATE actions, is held in memory for the duration of the operation. This means it may not be possible to perform deletes or updates involving very large numbers of rows of CACHED tables. Such operations should be performed in smaller sets.
When transactions support is enabled with SET AUTOCOMMIT FALSE, lists of all insert, delete or update operations are stored in memory so that they can be undone when ROLLBACK is issued. Transactions that span hundreds of modification to data will take up a lot of memory until the next COMMIT or ROLLBACK clears the list.
Most JVM implementations allocate up to a maximum amount of memory (usually 64 MB by default). This amount is generally not adequate when large memory tables are used, or when the average size of rows in cached tables is larger than a few hundred bytes. The maximum amount of allocated memory can be set on the java command line that is used for running HyperSQL. For example, with Sun JVM version 1.3.0 the parameter -Xmx256m increases the amount to 256 MB.
By default, all the rows in the result set are built in memory, so
very large result sets may not be possible. In server mode databases, the
result set memory is released from the server once the database server has
returned the result set. in-process databases
release the memory when the application program releases the
java.sql.ResultSet object. Server modes require
additional memory for returning result sets, as they convert the full
result set into an array of bytes which is then transmitted to the
client.
**todo** updates to cover new large result support
HyperSQL uses a fast cache for immutable objects such as Integer or String that are stored in the database. In most circumstances, this reduces the memory footprint still further as fewer copies of the most frequently-used objects are kept in memory.
With CACHED tables, the data is stored on disk and only up to a maximum number of rows are held in memory at any time. The default is up to 3*16384 rows. The hsqldb.cache_scale database property can be set to alter this amount. As any random subset of the rows in any of the CACHED tables can be held in the cache, the amount of memory needed by cached rows can reach the sum of the rows containing the largest field data. For example if a table with 100,000 rows contains 40,000 rows with 1,000 bytes of data in each row and 60,000 rows with 100 bytes in each, the cache can grow to contain nearly 50,000 rows, including all the 40,000 larger rows.
An additional property, hsqldb.cache_size_scale can be used in conjunction with the hsqldb.cache_scale property. This puts a limit in bytes on the total size of rows that are cached. When the default values is used for both properties, the limit on the total size of rows is approximately 50MB. (This is the size of binary images of the rows and indexes. It translates to more actual memory, typically 2-4 times, used for the cache because the data is represented by Java objects.)
If memory is limited, the hsqldb.cache_scale or hsqldb.cache_size_scale database properties can be reduced. In the example above, if the hsqldb.cache_size_scale is reduced from 10 to 8, then the total binary size limit is reduced from 50MB to 12.5 MB. This will allow the number of cached rows to reach 50,000 small rows, but only 12,500 of the larger rows.
In all running modes (server or in-process) multiple connections to the database engine are supported. in-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients.
Connection pooling software can be used to connect to the database but it is not generally necessary. With other database engines, connection pools are used for reasons that may not apply to HyperSQL.
To allow new queries to be performed while a time-consuming query is being performed in the background. In HyperSQL, if a session is in readonly mode, its queries are never blocked. If the session is in read-write mode, blocking depends on the transaction model and the current activity by other sessions.
To limit the maximum number of simultaneous connections to the database for performance reasons. With HSQLDB this can be useful if your application is designed in a way that opens and closes connections for each small task.
To control transactions in a multi-threaded application. This can be useful with HSQLDB as well. For example, in a web application, a transaction may involve some processing between the queries or user action across web pages. A separate connection should be used for each HTTP session so that the work can be committed when completed or rolled back otherwise. Although this usage cannot be applied to most other database engines, HyperSQL is perfectly capable of handling over 100 simultaneous HTTP sessions as individual JDBC connections.
An application that is not both multi-threaded and transactional, such as an application for recording user login and logout actions, does not need more than one connection. The connection can stay open indefinitely and reopened only when it is dropped due to network problems.
When using an in-process database, when the last connection to the database is closed, the database still remains open. An explicit SHUTDOWN command, with or without an argument, is required to close the database. A connection property can be used to shutdown the database when the last connection is closed.
When using a server database (and to some extent, an in-process database), care must be taken to avoid creating and dropping JDBC Connections too frequently. Failure to observe this will result in unsuccessful connection attempts when the application is under heavy load.
Any database that is not produced with the release version of HSQLDB 2.0 must be upgraded to this version. Some catalogs created with 1.8.0 can be upgraded simply by opening with HyperSQL 2.0. When this is not possible due to errors, the rest of the procedures below should be followed.
Once a database is upgraded to 2.0, it can no longer be used with previous versions of HyperSQL.
To upgrade from the more recent versions, 1.7.2, 1.7.3 or 1.8.0 simply issue the SET SCRIPTFORMAT TEXT and SHUTDOWN SCRIPT commands with the old version, then open with the new version of the engine. The upgrade is then complete.
To upgrade from older version database files (1.7.1 and older)
that do not contain CACHED tables, simply SHUTDOWN with the older
version and open with the new version. If there is any error in the
*.logs or *.script file, try again
after editing the *.logs or
*.script file.
To upgrade from older version database files (1.7.1 and older)
that contain CACHED tables, use the SCRIPT procedure below. In all
versions of HSQLDB, the SCRIPT 'filename' command
(used as an SQL statement) allows you to save a full record of your
database, including database object definitions and data, to a file of
your choice. You can export a script file using the old version of the
database engine and open the script as a database with 2.0.
Procedure 8.1. Upgrade Using the SCRIPT Procedure for Old Versions
Open the original database in the old version of DatabaseManager
Issue the SCRIPT command, for example SCRIPT
'newversion.script' to create a script file containing a
copy of the database.
SHUTDOWN this database.
Copy the original *.properties file into
newversion.properties in the same directory as
newversion.script
Try to open the new database newversion
using DatabaseManager.
If there is any inconsistency in the data, the script line
number is reported on the console and the opening process is
aborted. Edit and correct any problems in the
newversion.script before attempting to open
again. Use the guidelines in the next section (Manual Changes to the
.script File). Use a programming editor that is
capable of handling very large files and does not wrap long lines of
text.
In 2.0 the full range of ALTER TABLE commands is available to
change the data structures and their names. However, if an old database
cannot be opened due to data inconsistencies, or the use of index or
column names that are not compatible with 2.0, manual editing of the
*.script file can be performed.
Version 2.0 does not accept duplicate names for indexes that were allowed before 1.7.2.
Version 2.0 does not accept table or column names that are SQL reserved keywords without double quoting.
Note that the *.script file must be the result
of a SHUTDOWN SCRIPT and must contain the full data for the database.The
following changes can be applied so long as they do not affect the
integrity of existing data.
Names of tables, columns and indexes can be changed.
CREATE UNIQUE INDEX ... to CREATE
INDEX ... and vice versa
A unique index can always be converted into a normal index. A non-unique index can only be converted into a unique index if the table data for the column(s) is unique in each row.
NOT NULL
A not-null constraint can always be removed.
PRIMARY KEY
A primary key constraint can be removed. It cannot be removed if there is a foreign key referencing the column(s).
UNIQUE
A UNIQUE constraint can be removed if there is no foreign key referencing the column(s).
FOREIGN KEY
A FOREIGN KEY constraint can always be removed.
COLUMN TYPES
Some changes to column types are possible. For example an INTEGER column can be changed to BIGINT, or DATE, TIME and TIMESTAMP columns can be changed to VARCHAR.
After completing the changes and saving the modified
.script file, you can open the database as
normal.
The database engine saves the files containing all the data in a file catalog when a shutdown takes place. It automatically recovers from an abnormal termination and preserves the data when the catalog is opened next time. In an ideal operating environment, where there is no OS crash, disk failure, bugs in code, etc. there would be no need regularly to backup a database. This is meant to say, the engine performs the routine shutdown procedure internally, therefore backing up catalogs is an insurance policy against all sorts of misadventure that are not under the control of the database engine.
The data for each catalog consists of up to 5 files in the same
directory with the endings such as *.properties,
*.script, etc., as detailed in previous chapters
**todo**
HyperSQL 1.9 and later includes commands to backup the database
files into a single .tar or .tar.gz
file archive. The backup can be performed by a command given in a JDBC
session if the target database catalog is running, or on the command-line
if the target catalog has been shutdown.
To back up a running catalog, obtain a JDBC connection and
issue a BACKUP DATABASE command in SQL. In its most
simple form, the command format below will backup the database as a
single .tar.gz file to the given directory.
BACKUP DATABASE TO <directory name> BLOCKING
See the next section under Statements for details about the command and its options. See the sections below about restoring a backup.
To back up an offline catalog, the catalog must be in shut down state. You will run a Java command like this
Example 8.1. Offline Backup Example
java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --save \
tar/path.tar db/base/path
tar/path.tar is a file path to
the *.tar or *.tar.gz file to be
created, and db/base/path is the file path to the
catalog file base name (in same fashion as in
server.database.* settings and JDBC URLs with catalog
type file:.
You can list the contents of backup tar files with
DbBackup on your operating system command line,
or with any Pax-compliant tar or pax client (this includes GNU tar),
Example 8.2. Listing a Backup with DbBackup
java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --list tar/path.tar
.+script to match
entries ending in "script", not *script.
You can examine the contents of the backup in their entirety by restoring the backup, as explained in the following section, to a temporary directory.
You use DbBackup on your operating system
command line to restore a catalog from a backup.
Example 8.3. Restoring a Backup with DbBackup
java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --extract \
tar/path.tar db/dir
tar/path.tar is a file path to
the *.tar or *.tar.gz file to be read, and db/dir
is the target directory to extract the catalog files into. Note that
db/dir specifies a directory path, without the
catalog file base name. The files will be created with the names stored
in the tar file (and which you can see as described in the preceding
section).
BACKUP DATABASE
backup database statement
<backup database statement> ::= BACKUP DATABASE TO
<file path> {SCRIPT | [NOT] COMPRESSED} BLOCKING
Backup the database to specified <file
path> for archiving purposes.
The <file path> can be in two forms. If
the <file path> ends with a forward slash, it
specifies a directory. In this case, an automatic name for the archive is
generated that includes the date, time and the base name of the database.
The database is backed up to this archive file in the specified directory.
If the <file path> does not end with a forward
slash, it specifies a user-defined file name for the backup archive. The
archive is in tar, gzip format depending on whether it is compressed or
not.
The SCRIPT option is not currently supported. If SCRIPT is
specified, the backup will consist of two files, a
*.properties file and a *.script
file, which contain all the data and settings of the database. These files
are not compressed.
If COMPRESSED or NOT COMPRESSED is specified, the backup consists of the current snapshot of database files. During backup, a CHECKPOINT command is silently executed.
The qualifier, BLOCKING, means all database operations are suspended during backup.
The HSQLDB jar also contains a program that creates an archive of an offline database. It also contains a program to expand an archive into database files. These programs are documented in ***todo Backing up Database Catalog section of Deployment Chapter.
Only a user with the DBA role can execute this statement.
CHECKPOINT
checkpoint statement
<checkpoint statement> ::= CHECKPOINT
[DEFRAG]
Closes the database files, rewrites the script file, deletes the
log file and opens the database. If DEFRAG is
specified, also shrinks the *.data file to its minumum
size. Only a user with the DBA role can execute this statement.
SCRIPT
script statement
<script statement> ::= SCRIPT [<file
name>]
Returns a script containing SQL statments that define the
database, its users, and its schema objects. If <file
name> is not specified, the statements are returned in a
Result, with each row containing an SQL statement.No data statements are
included in this form. The optional file name is a single-quoted string.
If <file name> is specified, then the script is
written to the named file. In this case, all the data in all tables of the
database is included in the script as INSERT statements.
Only a user with the DBA role can execute this statement.
SET CHECKPOINT DEFRAG
set checkpoint defrag statement
<set checkpoint defrag statement> ::= SET
CHECKPOINT DEFRAG <unsigned integer literal>
Sets the threshold for perfoming a DEFRAG during a checkpoint.
The <unsigned integer literal> is the megabytes
of abandoned space in the *.data file. When a
CHECKPOINT is performed either as a result of the .log
file reaching the limit set by SET LOGSIZE m, or by the
user issuing a CHECKPOINT command, the amount of space abandoned since the
database was opened is checked and if it is larger than n, a CHECKPOINT
DEFRAG is performed instead of a CHECKPOINT.
Only a user with the DBA role can execute this statement.
SET LOGSIZE
set log size statement
<log size statement> ::= SET LOGSIZE <unsigned
integer literal>
Sets the maximum size in MB of the *.log file
to the specified value. The default maximum size is 200 MB. If the value
is zero, no limit is used for the size of the file. When the size of the
file reaches this value, a CHECKPOINT is performed and the the
*.log file is cleared to size 0. Only a user with the
DBA role can execute this statement.
SET PROPERTY
set property statement
<set property statement> ::= SET PROPERTY
<double quoted property name> <boolean or integer
literal>;
Sets a database property. Properties that can be set using this command are either boolean or integral and are listed in the chapter.**. Only a user with the DBA role can execute this statement.
SET SCRIPTFORMAT
set scriptformat statement
<set script format statement> ::= SET SCRIPTFORMAT
{TEXT | BINARY | COMPRESSED};
Changes the format of the *.script file. The
default format of the *.script file is TEXT. This has
the advantage that the SQL statements can be viewed, and in some cases,
edited . BINARY and COMPRESSED formats are slightly faster and more
compact than the default TEXT. Recommended only for very large script
files. This command performs a CHECKPOINT. Only a user with the DBA role
can execute this statement.
SET WRITE DELAY
set write delay statement
<set write delay statement> ::= SET WRITE_DELAY {{
TRUE | FALSE } | <seconds value> | <milliseconds value>
MILLIS};
Set the WRITE_DELAY property of the database. The WRITE_DELAY controls the frequency of file sync for the log file. When WRITE_DELAY is set to FALSE or 0, the sync takes place immediately at each COMMIT. WRITE_DELAY TRUE performs the sync once every 10 seconds (which is the default). A numeric value can be specified instead.
The purpose of this command is to control the amount of data loss in case of a total system crash. A delay of 1 second means at most the data written to disk during the last second before the crash is lost. All data written prior to this has been synced and should be recoverable.
A write delay of 0 impacts performance in high load situations, as the engine has to wait for the file system to catch up.
To avoid this, you can set write delay down to 10 milliseconds.
Each time the SET WRITE_DELAY statement is executed with any value, a sync is immediately performed. Only a user with the DBA role can execute this statement.
SET DATABASE COLLATION
set database collation statement
<set database collation statement> ::= SET
DATABASE COLLATION <collation name>
Each database can have its own collation. Sets the collation from the set of collations supported by HyperSQL. Once this command has been issued, the database can be opened in any JVM and will retain its collation. Only a user with the DBA role can execute this statement.
SET IGNORECASE
set ignore case statement
<set ignore case statement> ::= SET IGNORECASE
{TRUE | FALSE}
Set the type used for VARCHAR table columns. By default,
character columns in new databases are case sensitive. If SET
IGNORECASE TRUE is specified, all VARCHAR columns in new table
are set to VARCHAR_IGNORECASE. It is possible to
specify the VARCHAR_IGNORECASE type for the definition
of individual columns. So it is possible to have some columns case
sensitive and some not, even in the same table. This statement must be
switched before creating tables. Existing tables and their data are not
affected.
Only a user with the DBA role can execute this statement.
SET DATABASE BACKUP INCREMENT
set database backup increment statement
<set database backup increment statement> ::= SET
DATABASE BACKUP INCREMENT { TRUE | FALSE }
HyperSQL performs a backup of the .data file before its contents are modified. The setting is FALSE by default and the whole .data file is saved in a compressed form when a CHECKPOINT or SHUTDOWN is performed. This takes a long time when the size of the database exceeds 100 MB or so.
The alternative is backup in increments, just before some part of the .data file is modified. This mode can be set by specifying TRUE with this command. In this this mode, no backup is performed at CHECKPIONT or SHUTDOWN. This mode is prefered for large databases which are opened and closed frequently.
SET DATABASE REFERENTIAL INTEGRITY
set database referential integrity statement
<set database referential integrity statement> ::=
SET DATABASE REFERENTIAL INTEGRITY {TRUE | FALSE}
This commands enables or disables the enforcement of referential integrity constraints (foreign key constraints). By default, referential integrity constraints are checked.
The only legitimate use of this statement is before importing large amounts of external data into tables that have existing FOREIGN KEY constraints. After importing the data, queries must be run to verify all rows conform to the FOREIGN KEY constraints and take appropriate actions for the rows that do not conform. After verification, the statement must be used again to enable constraint enforcement.
Only a user with the DBA role can execute this statement.
SET DATABASE TRANSACTION CONTROL
set database transaction control
<set database transaction control statement> ::=
SET DATABASE TRANSACTION CONTROL { LOCKS | MVCC }
Set the concurrency control system for the database. It can be issued only when all sessions have been committed or rolled back.
Only a user with the DBA role can execute this statement.
SHUTDOWN
shutdown statement
<shutdown statement> ::= SHUTDOWN [IMMEDIATELY |
COMPACT | SCRIPT]
Shutdown the database. If the optional qualifier is not used, a normal SHUTDOWN is performed.
|
SHUTDOWN IMMEDIATELY |
Saves the *.log file and closes the database files. This is the quickest form of shutdown. This command should not be used as the routine method of closing the database, because when the database is accessed next time, it may take a long time to start. |
|
SHUTDOWN COMPACT |
This is similar to normal SHUTDOWN, but reduces the *.data file to its minimum size. It takes longer than normal SHUTDOWN. |
|
SHUTDOWN SCRIPT |
This is similar to SHUTDOWN COMPACT, but it does not rewrite
the This command produces a full script of the database which can be edited for special purposes prior to the next startup. |
Only a user with the DBA role can execute this statement.
*
*
<* statement> ::=
**.
*
*
**.
$Revision: 2959 $