Chapter 2. SQL Issues

Fred Toussi

HSQLDB Development Group

<ft@cluedup.com>

Copyright 2002-2005 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQLDB Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.

$Date: 2005/07/01 17:06:32 $

Purpose

Many questions repeatedly asked in Forums and mailing lists are answered in this guide. If you want to use HSQLDB with your application, you should read this guide.

SQL Standard Support

HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported and there is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported so no claim is made for full support of any level of the standards.

The SQL Syntax chapter of this guide SQL Syntax lists all the keywords and syntax that is supported. When writing or converting existing SQL DDL (Data Definition Language) and DML (Data Manipulation Language) statements for HSQLDB, you should consult the supported syntax and modify the statements accordingly.

Several words are reserved by the standard and cannot be used as table or column names. For example, the word POSITION is reserved as it is a function defined by the Standards with a similar role as String.indexOf() in Java. HSQLDB does not currently prevent you from using a reserved word if it does not support its use or can distinguish it. For example BEGIN is a reserved words that is not currently supported by HSQLDB and is allowed as a table or column name. You should avoid the use of such words as future versions of HSQLDB are likely to support the words and will reject your table definitions or queries. The full list of SQL reserved words is in the source of the org.hsqldb.Token class.

HSQLDB also supports some keywords and expressions that are not part of the SQL standard as enhancements. Expressions such as SELECT TOP 5 FROM .., SELECT LIMIT 0 10 FROM ... or DROP TABLE mytable IF EXISTS are among such constructs.

All keywords, can be used for database objects if they are double quoted.

Constraints and Indexes

Primary Key Constraints

Before 1.7.0, aCONSTRAINT <name> PRIMARY KEY was translated internally to a unique index and, in addition, a hidden column was added to the table with an extra unique index. From 1.7.0 both single-column and multi-column PRIMARY KEY constraints are supported. They are supported by a unique index on the primary key column(s) specified and no extra hidden column is maintained for these indexes.

Unique Constraints

According to the SQL standards, a unique constraint on a single column means no two values are equal unless one of them is NULL. This means you can have one or more rows where the column value is NULL.

A unique constraint on multiple columns (c1, c2, c3, ..) means that no two sets of values for the columns are equal unless at lease one of them is NULL. Each single column taken by itself can have repeat values. The following example satisfies a UNIQUE constraint on the two columns:

Example 2.1. Column values which satisfy a 2-column UNIQUE constraint

1,	2
2,	1
2,	2
NULL,	1
NULL,	1
1,	NULL
NULL,	NULL
NULL,	NULL

Since version 1.7.2 the behaviour of UNIQUE constraints and indexes with respect to NULL values has changed to conform to SQL standards. A row, in which the value for any of the UNIQUE constraint columns is NULL, can always be added to the table. So multiple rows can contain the same values for the UNIQUE columns if one of the values is NULL.

Unique Indexes

In 1.8.0, user defined UNIQUE indexes can still be declared but they are deprecated. You should use a UNIQUE constraint instead.

CONSTRAINT <name> UNIQUE always creates internally a unique index on the columns, as with previous versions, so it has exactly the same effect as the deprecated UNIQUE index declaration.

FOREIGN KEYS

From version 1.7.0, HSQLDB features single and multiple column foreign keys. A foreign key can also be specified to reference a target table without naming the target column(s). In this case the primary key column(s) of the target table is used as the referenced column(s). Each pair of referencing and referenced columns in any foreign key should be of identical type. When a foreign key is declared, a unique constraint (or primary key) must exist on the referenced columns in the primary key table. A non-unique index is automatically created on the referencing columns. For example:

    CREATE TABLE child(c1 INTEGER, c2 VARCHAR, FOREIGN KEY (c1, c2) REFERENCES parent(p1, p2));

There must be a UNIQUE constraint on columns (p1,p2) in the table named "parent". A non-unique index is automatically created on columns (c1, c2) in the table named "child". Columns p1 and c1 must be of the same type (INTEGER). Columns p2 and c2 must be of the same type (VARCHAR).

Indexes and Query Speed

HSQLDB does not use indexes to improve sorting of query results. But indexes have a crucial role in improving query speed. If no index is used in a query on a single table, such as a DELETE query, then all the rows of the table must be examined. With an index on one of the columns that is in the WHERE clause, it is often possible to start directly from the first candidate row and reduce the number of rows that are examined.

Indexes are even more important in joins between multiple tables. SELECT ... FROM t1 JOIN t2 ON t1.c1 = t2.c2 is performed by taking rows of t1 one by one and finding a matching row in t2. If there is no index index on t2.c2 then for each row of t1, all the rows of t2 must be checked. Whereas with an index, a matching row can be found in a fraction of the time. If the query also has a condition on t1, e.g., SELECT ... FROM t1 JOIN t2 ON t1.c1 = t2.c2 WHERE t1.c3 = 4 then an index on t1.c3 would eliminate the need for checking all the rows of t1 one by one, and will reduce query time to less than a millisecond per returned row. So if t1 and t2 each contain 10,000 rows, the query without indexes involves checking 100,000,000 row combinations. With an index on t2.c2, this is reduced to 10,000 row checks and index lookups. With the additional index on t2.c2, only about 4 rows are checked to get the first result row.

Indexes are automatically created for primary key and unique columns. Otherwise you should define an index using the CREATE INDEX command.

Note that in HSQLDB a unique index on multiple columns can be used internally as a non-unique index on the first column in the list. For example: CONSTRAINT name1 UNIQUE (c1, c2, c3); means there is the equivalent of CREATE INDEX name2 ON atable(c1);. So you do not need to specify an extra index if you require one on the first column of the list.

In 1.8.0, a multi-column index will speed up queries that contain joins or values on ALL the columns. You need NOT declare additional individual indexes on those columns unless you use queries that search only on a subset of the columns. For example, rows of a table that has a PRIMARY KEY or UNIQUE constraint on three columns or simply an ordinary index on those columns can be found efficiently when values for all three columns are specified in the WHERE clause. For example, SELECT ... FROM t1 WHERE t1.c1 = 4 AND t1.c2 = 6 AND t1.c3 = 8will use an index on t1(c1,c2,c3) if it exists.

As a result of the improvements to multiple key indexes, the order of declared columns of the index or constraint has less affect on the speed of searches than before. If the column that contains more diverse values appears first, the searches will be slightly faster.

A multi-column index will not speed up queries on the second or third column only. The first column must be specified in the JOIN .. ON or WHERE conditions.

Query speed depends a lot on the order of the tables in the JOIN .. ON or FROM clauses. For example the second query below should be faster with large tables (provided there is an index on TB.COL3). The reason is that TB.COL3 can be evaluated very quickly if it applies to the first table (and there is an index on TB.COL3):

    (TB is a very large table with only a few rows where TB.COL3 = 4)

    SELECT * FROM TA JOIN TB ON TA.COL1 = TB.COL2 AND TB.COL3 = 4;

    SELECT * FROM TB JOIN TA ON TA.COL1 = TB.COL2 AND TB.COL3 = 4;

The general rule is to put first the table that has a narrowing condition on one of its columns.

1.7.3 features automatic, on-the-fly indexes for views and subselects that are used in a query. An index is added to a view when it is joined to a table or another view.

Where Condition or Join

Using WHERE conditions to join tables is likely to reduce execution speed. For example the following query will generally be slow, even with indexes:

    SELECT ... FROM TA, TB, TC WHERE TC.COL3 = TA.COL1 AND TC.COL3=TB.COL2 AND TC.COL4 = 1

The query implies TA.COL1 = TB.COL2 but does not explicitly set this condition. If TA and TB each contain 100 rows, 10000 combinations will be joined with TC to apply the column conditions, even though there may be indexes on the joined columns. With the JOIN keyword, the TA.COL1 = TB.COL2 condition has to be explicit and will narrow down the combination of TA and TB rows before they are joined with TC, resulting in much faster execution with larger tables:

    SELECT ... FROM TA JOIN TB ON TA.COL1 = TB.COL2 JOIN TC ON TB.COL2 = TC.COL3 WHERE TC.COL4 = 1

The query can be speeded up a lot more if the order of tables in joins are changed, so that TC.COL1 = 1 is applied first and a smaller set of rows are joined together:

    SELECT ... FROM TC JOIN TB ON TC.COL3 = TB.COL2 JOIN TA ON TC.COL3 = TA.COL1 WHERE TC.COL4 = 1

In the above example the engine automatically applies TC.COL4 = 1 to TC and joins only the set of rows that satisfy this condition with other tables. Indexes on TC.COL4, TB.COL2 and TA.COL1 will be used if present and will speed up the query.

Subqueries and Joins

Using joins and setting up the order of tables for maximum performance applies to all areas. For example, the second query below should generally be much faster if there are indexes on TA.COL1 and TB.COL3:

Example 2.2. Query comparison

    SELECT ... FROM TA WHERE TA.COL1 = (SELECT MAX(TB.COL2) FROM TB WHERE TB.COL3 = 4)

    SELECT ... FROM (SELECT MAX(TB.COL2) C1 FROM TB WHERE TB.COL3 = 4) T2 JOIN TA ON TA.COL1 = T2.C1

The second query turns MAX(TB.COL2) into a single row table then joins it with TA. With an index on TA.COL1, this will be very fast. The first query will test each row in TA and evaluate MAX(TB.COL2) again and again.

Types and Arithmetic Operations

Table columns of all types supported by HSQLDB can be indexed and can feature in comparisons. Types can be explicitly converted using the CONVERT() library function, but in most cases they are converted automatically. It is recommended not to use indexes on LONGVARBINARY, LONGVARCHAR and OTHER columns, as these indexes will probably not be allowed in future versions.

Previous versions of HSQLDB featured poor handling of arithmetic operations. For example, it was not possible to insert 10/2.5 into any DOUBLE or DECIMAL column. Since 1.7.0, full operations are possible with the following rules:

TINYINT, SMALLINT, INTEGER, BIGINT, NUMERIC and DECIMAL (without a decimal point) are supported integral types and map to byte, short, int, long and BigDecimal in Java. The SQL type dictates the maximum and minimum values that can be held in a field of each type. For example the value range for TINYINT is -128 to +127, although the actual Java type used for handling TINYINT is java.lang.Integer.

REAL, FLOAT, DOUBLE are all mapped to double in Java.

DECIMAL and NUMERIC are mapped to java.math.BigDecimal and can have very large numbers of digits.

Integral Types

TINYINT, SMALLINT, INTEGER, BIGINT, NUMERIC and DECIMAL (without a decimal point) are fully interchangeable internally, and no data narrowing takes place. Depending on the types of the operands, the result of the operations is returned in a JDBC ResultSet in any of related Java types: Integer, Long or BigDecimal. The ResultSet.getXXXX() methods can be used to retrieve the values so long as the returned value can be represented by the resulting type. This type is deterministically based on the query, not on the actual rows returned. The type does not change when the same query that returned one row, returns many rows as a result of adding more data to the tables.

If the SELECT statement refers to a simple column or function, then the return type is the type corresponding to the column or the return type of the function. For example:

    CREATE TABLE t(a INTEGER, b BIGINT); SELECT MAX(a), MAX(b) FROM t;

would return a result set where the type of the first column is java.lang.Integer and the second column is java.lang.Long. However,

    SELECT MAX(a) + 1, MAX(b) + 1 FROM t;

would return java.lang.Long and BigDecimal values, generated as a result of uniform type promotion for all the return values.

There is no built-in limit on the size of intermediate integral values in expressions. As a result, you should check for the type of the ResultSet column and choose an appropriate getXXXX() method to retrieve it. Alternatively, you can use the getObject() method, then cast the result to java.lang.Number and use the intValue() or longValue() methods on the result.

When the result of an expression is stored in a column of a database table, it has to fit in the target column, otherwise an error is returned. For example when 1234567890123456789012 / 12345687901234567890 is evaluated, the result can be stored in any integral type column, even a TINYINT column, as it is a small value.

Other Numeric Types

In SQL statements, numbers with a decimal point are treated as DECIMAL unless they are written with an exponent. Thus 0.2 is considered a DECIMAL value but 0.2E0 is considered a DOUBLE value.

When PreparedStatement.setDouble() or setFloat() is used, the value is treated as a DOUBLE automatically.

When a REAL, FLOAT or DOUBLE (all synonymous) is part of an expression, the type of the result is DOUBLE.

Otherwise, when no DOUBLE value exists, if a DECIMAL or NUMERIC value is part an expression, the type of the result is DECIMAL. The result can be retrieved from a ResultSet in the required type so long as it can be represented. This means DECIMAL values can be converted to DOUBLE unless they are beyond the Double.MIN_VALUE - Double.MAX_VALUE range. Similar to integral values, when the result of an expression is stored in a table column, it has to fit in the target column, otherwise an error is returned.

The distinction between DOUBLE and DECIMAL is important when a division takes place. When the terms are DECIMAL, the result is a value with a scale (number of digits to the right of the decimal point) equal to the larger of the scales of the two terms. With a DOUBLE term, the scale will reflect the actual result of the operation. For example, 10.0/8.0 (DECIMAL) equals 1.2 but 10.0E0/8.0E0 (DOUBLE) equals 1.25. Without division operations, DECIMAL values represent exact arithmetic; the resulting scale is the sum of the scales of the two terms when multiplication is performed.

REAL, FLOAT and DOUBLE values are all stored in the database as java.lang.Double objects. Special values such as NaN and +-Infinity are also stored and supported. These values can be submitted to the database via JDBC PreparedStatement methods and are returned in ResultSet objects.

Bit and Boolean Types

Since 1.7.2, BIT is simply an alias for BOOLEAN. The primary representation of BOOLEAN column is 'true' or 'false' either as the boolean type or as strings when used from JDBC. This type of column can also be initialised using values of any numeric type. In this case 0 is translated to false and any other value such as 1 is translated to true.

Since 1.7.3 the BOOLEAN type conforms to the SQL standards and supports the UNDEFINED state in addition to TRUE or FALSE. NULL values are treated as undefined. This improvement affects queries that contain NOT IN. See the test text file, TestSelfNot.txt, for examples of the queries.

Storage and Handling of Java Objects

Since version 1.7.2 this support has improved and any serializable JAVA Object can be inserted directly into a column of type OTHER using any variation of PreparedStatement.setObject() methods.

For comparison purposes and in indexes, any two Java Objects are considered equal unless one of them is NULL. You cannot search for a specific object or perform a join on a column of type OTHER.

Please note that HSQLDB is not an object-relational database. Java Objects can simply be stored internally and no operations should be performed on them other than assignment between columns of type OTHER or tests for NULL. Tests such as WHERE object1 = object2, or WHERE object1 = ? do not mean what you might expect, as any non-null object would satisfy such a tests. But WHERE object1 IS NOT NULL is perfectly acceptable.

The engine does not return errors when normal column values are assigned to Java Object columns (for example assigning an INTEGER or STRING to such a column with an SQL statement such as UPDATE mytable SET objectcol = intcol WHERE ...) but this is highly likely to be disallowed in future. So please use columns of type OTHER only to store your objects and nothing else.

Type Size, Precision and Scale

Prior to 1.7.2, all table column type definitions with a column size, precision or scale qualifier were accepted and ignored.

In 1.8.0, such qualifiers must conform to the SQL standards. For example INTEGER(8) is no longer acceptable. The qualifiers are still ignored unless you set a database property. SET PROPERTY "sql.enforce_strict_size" TRUEwill enforce sizes for CHARACTER or VARCHAR columns and pad any strings when inserting or updating a CHARACTER column. The precision and scale qualifiers are also enforced for DECIMAL and NUMERIC types. TIMESTAMP can be used with a precision of 0 or 6 only.

Casting a value to a qualified CHARACTER type will result in truncation or padding as you would expect. So a test such as CAST (mycol AS VARCHAR(2)) = 'xy' will find the values beginning with 'xy'. This is the equivalent of SUBSTRING(mycol FROM 1 FOR 2) = 'xy'.

Sequences and Identity

The SEQUENCE keyword was introduced in 1.7.2 with a subset of the SQL 200n standard syntax. Corresponding SQL 200n syntax for IDENTITY columns has also been introduced.

Identity Auto-Increment Columns

Each table can contain one auto-increment column, known as the IDENTITY column. An IDENTITY column is always treated as the primary key for the table (as a result, multi-column primary keys are not possible with an IDENTITY column present). Support has been added for CREATE TABLE <tablename>(<colname> IDENTITY, ...) as a shortcut.

Since 1.7.2, the SQL standard syntax is used by default, which allows the initial value to be specified. The supported form is(<colname> INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH n, [INCREMENT BY m])PRIMARY KEY, ...). Support has also been added for BIGINT identity columns. As a result, an IDENTITY column is simply an INTEGER or BIGINT column with its default value generated by a sequence generator.

When you add a new row to such a table using an INSERT INTO <tablename> ...;statement, you can use the NULL value for the IDENTITY column, which results in an auto-generated value for the column. The IDENTITY() function returns the last value inserted into any IDENTITY column by this connection. Use CALL IDENTITY(); as an SQL statement to retrieve this value. If you want to use the value for a field in a child table, you can use INSERT INTO <childtable> VALUES (...,IDENTITY(),...);. Both types of call toIDENTITY() must be made before any additional update or insert statements are issued on the database.

The next IDENTITY value to be used can be set with the

ALTER TABLE ALTER COLUMN <column name> RESTART WITH <new value>;

Sequences

The SQL 200n syntax and usage is different from what is supported by many existing database engines. Sequences are created with the CREATE SEQUENCE command and their current value can be modified at any time with ALTER SEQUENCE. The next value for a sequence is retrieved with the NEXT VALUE FOR <name> expression. This expression can be used for inserting and updating table rows. You can also use it in select statements. For example, if you want to number the returned rows of a SELECT in sequential order, you can use:

Example 2.3. Numbering returned rows of a SELECT in sequential order

    SELECT NEXT VALUE FOR mysequence, col1, col2 FROM mytable WHERE ...

Please note that the semantics of sequences is not exactly the same as defined by SQL 200n. For example if you use the same sequence twice in the same row insert query, you will get two different values, not the same value as required by the standard.

You can query the SYSTEM_SEQUENCES table for the next value that will be returned from any of the defined sequences. The SEQUENCE_NAME column contains the name and the NEXT_VALUE column contains the next value to be returned.

Issues with Transactions

HSQLDB supports transactions at the READ_UNCOMMITTED level, also known as level 0 transaction isolation. This means that during the lifetime of a transaction, other connections to the database can see the changes made to the data. Transaction support works well in general. Reported bugs concerning transactions being committed if the database is abruptly closed have been fixed. However, the following issues may be encountered only with multiple connections to a database using transactions:

If two transactions modify the same row, no exception is raised when both transactions are committed. This can be avoided by designing your database in such a way that application data consistency does not depend on exclusive modification of data by one transaction. You can set a database property to cause an exception when this happens.

SET PROPERTY "sql.tx_no_multi_rewrite" TRUE

When an ALTER TABLE .. INSERT COLUMN or DROP COLUMN command results in changes to the table structure, the current session is committed. If an uncommitted transaction started by another connections has changed the data in the affected table, it may not be possible to roll it back after the ALTER TABLE command. This may also apply to ADD INDEX or ADD CONSTRAINT commands. It is recommended to use these ALTER commands only when it is known that other connections are not using transactions.

After a CHECKPOINT command is issued, uncommitted transactions can be continued, committed, or rolled back. However, if the database is not subsequently closed properly with the SHUTDOWN command, any such transaction that still remains uncommitted at the time of shutdown, is part committed (to the state at CHECKPOINT) at the next startup. It is recommended to use the CHECKPOINT command either when no uncommitted transactions is in progress, or it is known that any such transaction is not likely to last for such a long time that an abnormal shutdown might affect its data.

New Features and Changes

In recent versions leading to 1.8.0 many enhancements were made for better SQL support. These are listed in the SQL Syntax chapter, in ../changelog_1_8_0.txt and ../changelog_1_7_2.txt. Functions and expressions such as POSITION(), SUBSTRING(), NULLIF(), COALESCE(), CASE ... WHEN .. ELSE, ANY, ALL etc. are among them. Other enhancements may not be very obvious in the documentation but can result in changes of behaviour from previous versions. Most significant among these are handling of NULL values in joins (null columns are no longer joined) and OUTER joins (the results are now correct). You should test your applications with the new version to ensure they do not rely on past incorrect behaviour of the engine. The engine will continue to evolve in future versions towards full SQL standard support, so it is best not to rely on any non-standard feature of the current version.


Chapter 1. Running and Using Hsqldb		Chapter 3. UNIX Quick Start