
CREATE TABLE

CREATE TABLE — Creates a table in the database.

Synopsis

CREATE TABLE table-name

[ export-definition | topic-definition | migrate-target-definition | migrate-topic-definition ]

( column-definition [,...]
[, constraint-definition [,...]]
) [ttl-definition] ;

export-definition: EXPORT TO TARGET target-name [ON action [,...]]

topic-definition: EXPORT TO TOPIC topic-name [ON action [,...]]
[ WITH [KEY (column-name [,...])] [VALUE (column-name [,...])]]

migrate-target-definition: MIGRATE TO TARGET target-name

migrate-topic-definition: MIGRATE TO TOPIC topic-name
[ WITH [KEY (column-name [,...])] [VALUE (column-name [,...])]]

column-definition: column-name datatype [DEFAULT value ] [ NOT NULL ] [index-type]

constraint-definition: [CONSTRAINT constraint-name] { index-definition }

index-definition: {index-type} (column-name [,...])

index-type: PRIMARY KEY | UNIQUE | ASSUMEUNIQUE

ttl-definition: USING TTL value [time-unit] ON COLUMN column-name
[BATCH_SIZE number-of-rows] [MAX_FREQUENCY value]

time-unit: SECONDS | MINUTES | HOURS | DAYS

Description

The CREATE TABLE statement creates a table and its associated columns in the database. The supported datatypes are described in Table A.1, “Supported SQL Datatypes”.

Table A.1. Supported SQL Datatypes

SQL Datatype               Equivalent Java Datatype     Description
TINYINT                    byte                         1-byte signed integer, -127 to 127 [a]
SMALLINT                   short                        2-byte signed integer, -32,767 to 32,767
INTEGER                    int                          4-byte signed integer, -2,147,483,647 to 2,147,483,647
BIGINT                     long                         8-byte signed integer, -9,223,372,036,854,775,807 to 9,223,372,036,854,775,807
FLOAT                      double                       8-byte numeric, -(2-2^-52)·2^1023 to (2-2^-52)·2^1023 (Note that values less than or equal to -1.7E+308 are interpreted as null.)
DECIMAL                    BigDecimal                   16-byte fixed point number with a scale of 12 and a precision of 38, -99999999999999999999999999.999999999999 to 99999999999999999999999999.999999999999
GEOGRAPHY or GEOGRAPHY()                                A geospatial region. The storage requirement for geospatial data varies depending on the geometry. The default maximum size in memory is 32768 bytes. However, you can specify a different value by specifying the maximum size (in bytes) in the declaration. For example: GEOGRAPHY(80000). See the section on entering geospatial data in the VoltDB Guide to Performance and Customization for details.
GEOGRAPHY_POINT                                         A geospatial location identified by its latitude and longitude. Requires 16 bytes of storage.
VARCHAR()                  String                       Variable length text string, with a maximum length specified in either characters (the default) or bytes. To specify the length in bytes, use the BYTES keyword after the length value. For example: VARCHAR(28 BYTES).
VARBINARY()                byte array                   Variable length binary string (sometimes referred to as a "blob") with a maximum length specified in bytes.
TIMESTAMP                  long, VoltDB TimestampType   Time in microseconds

[a] For integer and floating-point datatypes, VoltDB reserves the largest possible negative value to denote a null value. For example, -128 is interpreted as null for TINYINT, -32,768 for SMALLINT, and so on.


The following limitations are important to note when using the CREATE TABLE statement in VoltDB:

  • CHECK and FOREIGN KEY constraints are not supported.

  • VoltDB does not support AUTO_INCREMENT, the automatic incrementing of column values.

  • A table can have up to 1024 columns. Each column has a maximum size of 1 megabyte, and the total declared size of all of the columns in a table cannot exceed 2 megabytes. For VARCHAR columns where the length is specified in characters, the declared size is calculated as 4 bytes per character to allow for the longest potential UTF-8 string. For example, a column declared as VARCHAR(262144) reaches the 1 megabyte column limit, because 262,144 characters × 4 bytes per character equals 1,048,576 bytes.

  • If you intend to use a column to partition a table, that column cannot contain null values. You must specify NOT NULL in the definition of the column or VoltDB issues an error when compiling the schema.

  • To specify an index — either for an individual column or as a table constraint — that is globally unique across the database, use the standard SQL keywords UNIQUE and PRIMARY KEY. However, for partitioned tables, VoltDB can only ensure uniqueness if the index includes the partitioning column. Otherwise, these keywords are not allowed.

    It can be a performance advantage to define indexes or constraints on non-partitioning columns that you, as the developer, know are going to contain unique values. Although VoltDB cannot ensure uniqueness across the entire database, it does allow you to define indexes that are assumed to be unique by using the ASSUMEUNIQUE keyword.

    When you define an index on a partitioned table as ASSUMEUNIQUE, VoltDB verifies uniqueness within the current partition when creating an index entry. However, it is your responsibility as developer or administrator to ensure that the values are actually globally unique. If the database is repartitioned due to adding new nodes or restoring a snapshot to a different cluster configuration, non-unique ASSUMEUNIQUE index entries may collide. When this occurs it results in a constraint violation error and the database will not be able to complete its current action.

    Therefore, ASSUMEUNIQUE should be used with caution. Also, it is not necessary and should not be used with replicated tables or indexes that contain the partitioning column, which can be defined as UNIQUE.
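
    For illustration, here is a minimal sketch of ASSUMEUNIQUE on a partitioned table; the table and column names are hypothetical, and the application is assumed to guarantee that badge numbers are unique across all companies even though the table is partitioned on the Company column:

    CREATE TABLE Employee (
        Company VARCHAR(32) NOT NULL,
        Badge_Number BIGINT NOT NULL ASSUMEUNIQUE,
        First_Name VARCHAR(32),
        Last_Name VARCHAR(32)
    );
    PARTITION TABLE Employee ON COLUMN Company;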

  • EXPORT TO TARGET and EXPORT TO TOPIC allow you to connect a table to either an export target or a topic, so that data written into the table is also sent to the topic or the export connector for delivery to the specified target. By default, only insert operations (INSERT, and UPSERT when it inserts a new row) initiate export records. However, you can use the ON clause to specify which actions trigger export. For example, the following table declaration generates export records whenever rows are created or modified.

    CREATE TABLE RESERVATION
       EXPORT TO TARGET airlines ON INSERT, UPDATE_NEW
         (reserv_id INT NOT NULL,
          flight_id INT NOT NULL,
          . . . );

    The following table defines the actions that you can specify in the ON clause.

    Keyword       Description
    INSERT        Contents of the new record from INSERT, or from UPSERT when it creates a new record
    DELETE        Contents of a record that is deleted
    UPDATE_OLD    Contents of a record before it is updated
    UPDATE_NEW    Contents of a record after it is updated
    UPDATE        Two records are exported, the contents before and after a record is updated (shorthand equivalent for specifying both UPDATE_OLD and UPDATE_NEW)
  • EXPORT TO TOPIC (and MIGRATE TO TOPIC) also allow you to specify which column values are used as the topic key and which as the body of the topic message using the WITH KEY/VALUE clause. See the description of topic streams in the description of CREATE STREAM for more information.
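
    For example, the following hypothetical declaration (the table, topic, and column names are illustrative only) exports inserts and deletes to a topic, using the session ID as the topic key and two other columns as the message value:

    CREATE TABLE session_events
      EXPORT TO TOPIC session_topic ON INSERT, DELETE
        WITH KEY (session_id) VALUE (event_time, event_type)
     (session_id BIGINT NOT NULL,
      event_time TIMESTAMP DEFAULT NOW,
      event_type VARCHAR(16));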

  • The length of VARCHAR columns can be specified in either characters (the default) or bytes. To specify the length in bytes, include the BYTES keyword after the length value; for example VARCHAR(16 BYTES).

    Specifying the VARCHAR length in characters is recommended because UTF-8 characters can require a variable number of bytes to store. By specifying the length in characters you can be sure the column has sufficient space to store any string of the specified length. Specifying the length in bytes is only recommended when all values contain only single byte (ASCII) characters or when conserving space is required and the strings are less than 64 bytes in length.
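
    As a minimal sketch (the table and column names are hypothetical), the following declaration uses bytes for a column that only ever holds two-letter ASCII language codes, and characters for a column that may hold arbitrary UTF-8 text:

    CREATE TABLE Translations (
        Lang_Code VARCHAR(2 BYTES) NOT NULL,
        Phrase VARCHAR(256)
    );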

  • The VARBINARY datatype provides variable storage for arbitrary strings of binary data and operates similarly to VARCHAR(n BYTES) strings. You assign byte arrays to a VARBINARY column when passing in variables, or you can use a hexadecimal string for assigning literal values in the SQL statement.
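
    For example, assuming a hypothetical table with a VARBINARY column, a literal value can be assigned as a string of hexadecimal digits:

    CREATE TABLE Payloads (
        Payload_Id BIGINT NOT NULL,
        Payload VARBINARY(1024)
    );
    -- The literal is a string of hexadecimal digits
    INSERT INTO Payloads (Payload_Id, Payload) VALUES (1, 'DEADBEEF');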

  • The VoltDB TIMESTAMP datatype is a long integer representing the number of microseconds since the epoch. Two important points to note about this timestamp:

    • The VoltDB TIMESTAMP is not the same as the Java Timestamp datatype or traditional Linux time measurements, which are measured in milliseconds rather than microseconds. Appropriate conversion is needed when casting values between a VoltDB TIMESTAMP and other timestamp datatypes.

    • The VoltDB TIMESTAMP is interpreted as a Greenwich Mean Time (GMT) value. Depending on how time values are created, their value may or may not account for the local machine's default time zone. Mixing timestamps from different time zones (for example, in WHERE clause comparisons) can result in unexpected behavior.
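
    As a sketch of the conversions involved (the events table and its columns are hypothetical), the VoltDB SQL functions TO_TIMESTAMP and SINCE_EPOCH translate between integer epoch values and TIMESTAMP values:

    -- Store a millisecond epoch value (such as Java System.currentTimeMillis())
    -- in a VoltDB TIMESTAMP column
    UPDATE events SET event_time = TO_TIMESTAMP(MILLIS, ms_value);
    -- Read the TIMESTAMP back as milliseconds for use in Java
    SELECT SINCE_EPOCH(MILLIS, event_time) FROM events;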

  • For TIMESTAMP columns, you can define a default value using the NOW or CURRENT_TIMESTAMP keywords in place of a specific value. For example:

    CREATE TABLE Event (
        Event_Id INTEGER UNIQUE NOT NULL,
        Event_Timestamp TIMESTAMP DEFAULT NOW,
        Event_Description VARCHAR(128)
    );

    The default value is evaluated at runtime as an approximation, in milliseconds, of when the transaction begins execution.

Automatic Aging and Data Migration

When you define a database table you can also define a "time to live" (TTL) after which records in the table expire and are automatically deleted. The USING TTL clause specifies a lifetime for each record, based on the difference between the specified TTL value, the value of the specified column, and the current time (in GMT microseconds). In the simplest case, you can define a time to live based on a TIMESTAMP column defined as DEFAULT NOW, so the record expires the specified amount of time after it is inserted. For example, the records in the following table are deleted five minutes after they are inserted into the database (assuming the default value is used for the created column):

CREATE TABLE current_alerts (
   id BIGINT NOT NULL,
   message VARCHAR(128),
   created TIMESTAMP DEFAULT NOW NOT NULL
) USING TTL 5 MINUTES ON COLUMN created;

You specify the time to live value as an integer number of seconds, minutes, hours, or days. (The default, if you do not specify a time unit, is seconds.) The TTL column must be declared as a TIMESTAMP and NOT NULL.

TTL records are evaluated and deleted by a parallel process within the database. As a result, records are deleted shortly after the specified time to live arrives, rather than at the exact time specified. However, the deletion of records is handled as a proper database transaction, guaranteeing consistency with any user-invoked transactions. One consequence of automating the expiration of database records is that the evaluation and deletion of records produces additional transactions that may impact database performance.

When you define an expiration time for database records, you can also specify an export target or a topic using MIGRATE TO TARGET or MIGRATE TO TOPIC. If you specify both USING TTL and MIGRATE TO TARGET, before the data is deleted by the TTL process, the data is migrated — through the specified export connector or topic — to the target location. The combination of TTL and data migration creates an automated archiving process, where aged data is moved to another repository or stream while VoltDB continues to operate on current data. VoltDB does not delete the records until after the target system acknowledges their receipt; that is, the export connector successfully sends the data to the remote target or the data is inserted into the specified output topic. In this way you are assured that the data is always present in at least one of the participating systems.

For example, the following table definition establishes an automatic archiving policy that removes sessions with no activity for an hour, migrating old records to a historical repository:

CREATE TABLE sessions  
    MIGRATE TO TARGET oldsessions
 (
   login TIMESTAMP DEFAULT NOW,
   last_update TIMESTAMP NOT NULL,
   user_id BIGINT NOT NULL
 ) USING TTL 1 HOURS ON COLUMN last_update;

It is also possible to migrate data manually. If you add the MIGRATE TO TARGET or MIGRATE TO TOPIC clause by itself, without USING TTL, no data is migrated automatically. However, you can explicitly initiate migration by invoking the MIGRATE SQL statement with a WHERE clause that specifies which rows are migrated. Using MIGRATE TO TARGET or MIGRATE TO TOPIC without USING TTL is useful when the application logic for selecting which data to migrate requires multiple or non-numeric variables. For example, if the schedule for archiving a record varies based on which user created it:

CREATE TABLE messages
  MIGRATE TO TARGET oldmessages
 (
   posted TIMESTAMP DEFAULT NOW,
   message_text VARCHAR(128),
   user_id BIGINT NOT NULL,
   user_type VARCHAR(5) NOT NULL
 );

In this case, no data is migrated until you explicitly initiate migration with the MIGRATE statement:

MIGRATE FROM messages
  WHERE 
    ( (posted < DATEADD(DAY,-3,NOW()) AND user_type='USER') 
      OR  (posted < DATEADD(DAY,-14,NOW()) AND user_type='ADMIN')
    ) AND NOT MIGRATING;

You can also migrate data manually, even if the table declaration includes the USING TTL clause. In this case you can use MIGRATE to preemptively migrate data before the TTL column expires. For example, using the sessions table defined above, you might want to migrate all sessions for a user when their account is deleted:

MIGRATE FROM sessions WHERE user_id=? AND NOT MIGRATING;

Note that use of the MIGRATING function is not required to filter on rows that are not already migrating, because the MIGRATE statement will not initiate export if rows are already migrating. However, explicitly including AND NOT MIGRATING in your MIGRATE statement can improve performance.

The MIGRATING function is also useful so you can avoid accidentally modifying records that are already marked for deletion, especially since any changes to migrating records will cancel the delete operation but not the export. For example, if you want to update the last_update column of a user's records but only if they are not already being migrated, your UPDATE statement should include NOT MIGRATING:

UPDATE sessions SET last_update=NOW() WHERE user_id=? AND NOT MIGRATING;

Time to live and data migration are powerful concepts. However, there are some important details to consider when using these features:

  • There must be a usable index on the TTL column for the table. VoltDB uses that index to optimize the evaluation of the TTL values. Without such an index, the USING TTL clause is accepted, but no automated deletion occurs at runtime until a usable index is defined.

  • The CREATE TABLE... USING TTL statement is not rejected if the index is missing. This way you can define the index in a subsequent DDL statement. However, a warning message is issued if the USING TTL clause has no supporting index available. A similar warning is issued if you delete the last usable index.
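
    For example, the current_alerts table shown earlier needs an index on its TTL column, such as the following (the index name is illustrative):

    CREATE INDEX current_alerts_ttl_idx ON current_alerts (created);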

  • When the table definition includes both USING TTL and MIGRATE TO TARGET or MIGRATE TO TOPIC, there must be an index including the TTL column to support the USING TTL clause, plus a separate index containing only the TTL column and a WHERE NOT MIGRATING clause. This second index is required to efficiently find and schedule the migration of expired records. For example, the sessions table in the previous example requires the following index. If the index is not present, records for the table are neither deleted nor migrated, and a warning is logged on the server:

    CREATE INDEX sessions_migrate_index ON sessions 
       (last_update) WHERE NOT MIGRATING;
  • TTL clauses are most effective when used on partitioned tables. Defining TTL for a replicated table, especially a large replicated table, can have a significant impact on database performance because the TTL delete actions must be processed as multi-partition transactions.
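
    For example, the sessions table shown earlier declares user_id as NOT NULL, so it could be partitioned on that column, allowing the TTL delete actions to run as single-partition transactions:

    PARTITION TABLE sessions ON COLUMN user_id;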

  • You can specify the frequency and maximum size of the TTL processing cycle.

    • The BATCH_SIZE argument specifies the maximum number of records that will be deleted during each processing cycle. Specify the batch size as a positive integer. The default is 1000 rows.

    • The MAX_FREQUENCY argument specifies how often the TTL clause is evaluated, expressed as the maximum number of times it is processed per second. For example, a MAX_FREQUENCY of 10 means that the table's TTL value is processed at most 10 times per second. Specify the frequency as a positive integer. The default frequency is once per second (1).

    Under extreme loads or sudden bursts of inserts, it is possible for TTL processing to fall behind. Alternatively, if the records are extremely large, attempting to delete too many records at one time can cause the TTL process to exceed the temporary table limit. The BATCH_SIZE and MAX_FREQUENCY clauses let you customize the TTL processing per table to meet the specific requirements of your application. The TTL selector for the @Statistics system procedure can help you evaluate TTL performance against your application workload to determine what settings you need.
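
    For example, the following hypothetical declaration limits TTL processing to batches of at most 500 rows, evaluated at most twice per second, with the required index on the TTL column:

    CREATE TABLE telemetry (
       device_id BIGINT NOT NULL,
       reading FLOAT,
       recorded TIMESTAMP DEFAULT NOW NOT NULL
    ) USING TTL 30 MINUTES ON COLUMN recorded
      BATCH_SIZE 500 MAX_FREQUENCY 2;
    CREATE INDEX telemetry_ttl_idx ON telemetry (recorded);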

  • Evaluation of the time to live is made against the current value of the TTL column, not its initial value. So if a subsequent transaction alters the column value (either increasing or decreasing it), that modification impacts the subsequent lifetime of the record.
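
    For example, using the current_alerts table shown earlier, resetting the TTL column to the current time gives the record another five minutes before it expires:

    UPDATE current_alerts SET created = NOW() WHERE id = ?;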

  • When using database replication (DR), it is possible for the TTL transaction to exceed the 50MB limit on the DR binary log. If this happens, a warning is issued and TTL processing is suspended.

  • When using MIGRATE TO TARGET or MIGRATE TO TOPIC, there is an interval after the TTL value is triggered and before the record is successfully migrated and deleted from the VoltDB database. During this interval, the record is available for read access from SELECT queries. You can also update or delete the record; but modifying the record cancels the pending delete. So if, for example, you update the record to extend the TTL column, the record remains in the database until the new TTL column value is reached. However, the update does not cancel the export of the original data to the specified target or topic, which had already been triggered. So two records will eventually be migrated.

  • In most cases, you can ignore whether a record is currently being migrated and scheduled for deletion or not. For example, if you delete a record that is currently being migrated, you cancel the pending delete but delete the record anyway, so the results end up the same. However, if you do want to distinguish between currently active and currently migrating records, you can use the MIGRATING function, which identifies records that are currently "in flight". For example, to select records for a specific user ID, returning only those records that are not being migrated, you can use the following query:

    SELECT user_id, login FROM sessions WHERE user_id = ? AND NOT MIGRATING;

Example

The following example defines a table with five columns. The first column, Company, is not allowed to be null, which is important since it is used as the partitioning column in the following PARTITION TABLE statement. That column is also contained in the PRIMARY KEY constraint. Again, it is important to include the partitioning column in any fully unique indexes for partitioned tables.

CREATE TABLE Inventory (
    Company VARCHAR(32) NOT NULL,
    ProductID BIGINT NOT NULL, 
    Price DECIMAL,
    Category VARCHAR(32),
    Description VARCHAR(256),
    PRIMARY KEY (Company, ProductID)
);
PARTITION TABLE Inventory ON COLUMN Company;