Using VoltDB

Documentation

VoltDB Home » Documentation » Using VoltDB

Using VoltDB

V9.3.2

The text and illustrations in this document are licensed under the terms of the GNU Affero General Public License Version 3 as published by the Free Software Foundation. See the GNU Affero General Public License (http://www.gnu.org/licenses/) for more details.

Many of the core VoltDB database features described herein are part of the VoltDB Community Edition, which is licensed under the GNU Affero Public License 3 as published by the Free Software Foundation. Other features are specific to the VoltDB Enterprise Edition and VoltDB Pro, which are distributed by VoltDB, Inc. under a commercial license.

The VoltDB client libraries, for accessing VoltDB databases programmatically, are licensed separately under the MIT license.

Your rights to access and use VoltDB features described herein are defined by the license you received when you acquired the software.

VoltDB is a trademark of VoltDB, Inc.

VoltDB software is protected by U.S. Patent Nos. 9,600,514, 9,639,571, 10,067,999, 10,176,240, and 10,268,707. Other patents pending.

Abstract

This book explains how to use VoltDB to design, build, and run high performance applications.


Table of Contents

About This Book
1. Overview
1.1. What is VoltDB?
1.2. Who Should Use VoltDB
1.3. How VoltDB Works
1.3.1. Partitioning
1.3.2. Serialized (Single-Threaded) Processing
1.3.3. Partitioned vs. Replicated Tables
1.3.4. Ease of Scaling to Meet Application Needs
1.4. Working with VoltDB Effectively
2. Installing VoltDB
2.1. Operating System and Software Requirements
2.2. Installing VoltDB
2.2.1. Upgrading From Older Versions
2.2.2. Building a New VoltDB Distribution Kit
2.3. Setting Up Your Environment
2.4. What is Included in the VoltDB Distribution
2.5. VoltDB in Action: Running the Sample Applications
3. Starting the Database
3.1. Initializing and Starting a VoltDB Database
3.2. Initializing and Starting a VoltDB Database on a Cluster
3.3. Stopping a VoltDB Database
3.4. Saving the Data
3.5. Restarting a VoltDB Database
3.6. Updating Nodes on the Cluster
3.7. Defining the Cluster Configuration
3.7.1. Determining How Many Sites per Host
3.7.2. Configuring Paths for Runtime Features
3.7.3. Verifying your Hardware Configuration
4. Designing the Database Schema
4.1. How to Enter DDL Statements
4.2. Creating Tables and Primary Keys
4.3. Analyzing Data Volume and Workload
4.4. Partitioning Database Tables
4.4.1. Choosing a Column on which to Partition Table Rows
4.4.2. Specifying Partitioned Tables
4.4.3. Design Rules for Partitioning Tables
4.5. Replicating Database Tables
4.5.1. Choosing Replicated Tables
4.5.2. Specifying Replicated Tables
4.6. Modifying the Schema
4.6.1. Effects of Schema Changes on Data and Clients
4.6.2. Viewing the Schema
4.6.3. Modifying Tables
4.6.4. Adding and Dropping Indexes
4.6.5. Modifying Partitioning for Tables and Stored Procedures
5. Designing Stored Procedures to Access the Database
5.1. How Stored Procedures Work
5.1.1. VoltDB Stored Procedures are Transactional
5.1.2. VoltDB Stored Procedures are Deterministic
5.2. The Anatomy of a VoltDB Stored Procedure
5.2.1. The Structure of the Stored Procedure
5.2.2. Passing Arguments to a Stored Procedure
5.2.3. Creating and Executing SQL Queries in Stored Procedures
5.2.4. Interpreting the Results of SQL Queries
5.2.5. Returning Results from a Stored Procedure
5.2.6. Rolling Back a Transaction
5.3. Installing Stored Procedures into the Database
5.3.1. Compiling, Packaging, and Loading Stored Procedures
5.3.2. Declaring Stored Procedures in the Schema
5.3.3. Partitioning Stored Procedures in the Schema
6. Designing VoltDB Client Applications
6.1. Connecting to the VoltDB Database
6.1.1. Connecting to Multiple Servers
6.1.2. Using the Auto-Connecting Client
6.2. Invoking Stored Procedures
6.3. Invoking Stored Procedures Asynchronously
6.4. Closing the Connection
6.5. Handling Errors
6.5.1. Interpreting Execution Errors
6.5.2. Handling Timeouts
6.5.3. Writing a Status Listener to Interpret Other Errors
6.6. Compiling and Running Client Applications
6.6.1. Starting the Client Application
6.6.2. Running Clients from Outside the Cluster
7. Simplifying Application Development
7.1. Using Default Procedures
7.2. Shortcut for Defining Simple Stored Procedures
7.3. Verifying Expected Query Results
7.4. Scheduling Stored Procedures as Tasks
7.5. Directed Procedures: Distributing Transactions to Every Partition
8. Using VoltDB with Other Programming Languages
8.1. C++ Client Interface
8.1.1. Writing VoltDB Client Applications in C++
8.1.2. Creating a Connection to the Database Cluster
8.1.3. Invoking Stored Procedures
8.1.4. Invoking Stored Procedures Asynchronously
8.1.5. Interpreting the Results
8.2. JSON HTTP Interface
8.2.1. How the JSON Interface Works
8.2.2. Using the JSON Interface from Client Applications
8.2.3. How Parameters Are Interpreted
8.2.4. Interpreting the JSON Results
8.2.5. Error Handling using the JSON Interface
8.3. JDBC Interface
8.3.1. Using JDBC to Connect to a VoltDB Database
8.3.2. Using JDBC to Query a VoltDB Database
9. Using VoltDB in a Cluster
9.1. Starting a Database Cluster
9.2. Updating the Cluster Configuration
9.3. Elastic Scaling to Resize the Cluster
9.3.1. Adding Nodes with Elastic Scaling
9.3.2. Removing Nodes with Elastic Scaling
9.3.3. Configuring How VoltDB Rebalances Nodes During Elastic Scaling
10. Availability
10.1. How K-Safety Works
10.2. Enabling K-Safety
10.2.1. What Happens When You Enable K-Safety
10.2.2. Calculating the Appropriate Number of Nodes for K-Safety
10.3. Recovering from System Failures
10.3.1. What Happens When a Node Rejoins the Cluster
10.3.2. Where and When Recovery May Fail
10.4. Avoiding Network Partitions
10.4.1. K-Safety and Network Partitions
10.4.2. Using Network Fault Protection
11. Database Replication
11.1. How Database Replication Works
11.1.1. Starting Database Replication
11.1.2. Database Replication, Availability, and Disaster Recovery
11.1.3. Database Replication and Completeness
11.2. Using Passive Database Replication
11.2.1. Specifying the DR Tables in the Schema
11.2.2. Configuring the Clusters
11.2.3. Starting the Clusters
11.2.4. Loading the Schema and Starting Replication
11.2.5. Updating the Schema During Replication
11.2.6. Stopping Replication
11.2.7. Database Replication and Read-only Clients
11.3. Using Cross Datacenter Replication
11.3.1. Designing Your Schema for Active Replication
11.3.2. Configuring the Database Clusters
11.3.3. Starting the Database Clusters
11.3.4. Loading a Matching Schema and Starting Replication
11.3.5. Updating the Schema During Active Replication
11.3.6. Stopping Replication
11.3.7. Example XDCR Configurations
11.3.8. Understanding Conflict Resolution
11.4. Monitoring Database Replication
12. Security
12.1. How Security Works in VoltDB
12.2. Enabling Authentication and Authorization
12.3. Defining Users and Roles
12.4. Assigning Access to Stored Procedures
12.5. Assigning Access by Function (System Procedures, SQL Queries, and Default Procedures)
12.6. Using Built-in Roles
12.7. Encrypting VoltDB Communication Using TLS/SSL
12.7.1. Configuring TLS/SSL on the VoltDB Server
12.7.2. Choosing What Ports to Encrypt with TLS/SSL
12.7.3. Using the VoltDB Command Line Utilities with TLS/SSL
12.7.4. Implementing TLS/SSL in the Java Client Applications
12.7.5. Configuring Database Replication (DR) With TLS/SSL
12.8. Integrating Kerberos Security with VoltDB
12.8.1. Installing and Configuring Kerberos
12.8.2. Installing and Configuring the Java Security Extensions
12.8.3. Configuring the VoltDB Servers and Clients
12.8.4. Accessing the Database from the Command Line and the Web
13. Saving & Restoring a VoltDB Database
13.1. Performing a Manual Save and Restore of a VoltDB Cluster
13.1.1. How to Save the Contents of a VoltDB Database
13.1.2. How to Restore the Contents of a VoltDB Database Manually
13.1.3. Changing the Cluster Configuration Using Save and Restore
13.2. Scheduling Automated Snapshots
13.3. Managing Snapshots
13.4. Special Notes Concerning Save and Restore
14. Command Logging and Recovery
14.1. How Command Logging Works
14.2. Controlling Command Logging
14.3. Configuring Command Logging for Optimal Performance
14.3.1. Log Size
14.3.2. Log Frequency
14.3.3. Synchronous vs. Asynchronous Logging
14.3.4. Hardware Considerations
15. Streaming Data: Import, Export, and Migration
15.1. How Data Streaming Works in VoltDB
15.1.1. Understanding Import
15.1.2. Understanding Export
15.1.3. Understanding Migration
15.2. The Business Case for Streaming Data
15.2.1. Extract, Transform, Load (ETL)
15.2.2. Change Data Capture
15.2.3. Streaming Data Validation
15.2.4. Caching
15.2.5. Archiving
15.3. VoltDB Export Connectors
15.3.1. How Export Works
15.3.2. The File Export Connector
15.3.3. The HTTP Export Connector
15.3.4. The JDBC Export Connector
15.3.5. The Kafka Export Connector
15.3.6. The Elasticsearch Export Connector
15.4. VoltDB Import Connectors
15.4.1. Bulk Loading Data Using VoltDB Standalone Utilities
15.4.2. Streaming Import Using Built-in Import Features
15.4.3. The Kafka Importer
15.4.4. The Kinesis Importer
15.5. VoltDB Import Formatters
A. Supported SQL DDL Statements
ALTER STREAM — Modifies an existing stream definition.
ALTER TABLE — Modifies an existing table definition.
ALTER TASK — Modifies an existing task schedule.
CREATE AGGREGATE FUNCTION — Defines an aggregate SQL function and associates it with a Java class.
CREATE FUNCTION — Defines a SQL scalar function and associates it with a Java method.
CREATE INDEX — Creates an index for faster access to a table.
CREATE PROCEDURE AS — Defines a stored procedure composed of one or more SQL statements.
CREATE PROCEDURE FROM CLASS — Defines a stored procedure associated with a Java class.
CREATE ROLE — Defines a role and the permissions associated with that role.
CREATE STREAM — Creates an output stream in the database.
CREATE TABLE — Creates a table in the database.
CREATE TASK — Schedules a procedure to run periodically.
CREATE VIEW — Creates a view into one or more tables, optimizing access to a summary of their contents.
DR TABLE — Identifies a table as a participant in database replication (DR)
DROP FUNCTION — Removes the definition of a SQL function.
DROP INDEX — Removes an index.
DROP PROCEDURE — Removes the definition of a stored procedure.
DROP ROLE — Removes a role.
DROP STREAM — Removes a stream and, optionally, any views associated with it.
DROP TABLE — Removes a table and any data associated with it.
DROP TASK — Removes a task and cancels any future execution.
DROP VIEW — Removes a view and any data associated with it.
PARTITION PROCEDURE — Specifies that a stored procedure is partitioned.
PARTITION TABLE — Specifies that a table is partitioned and which is the partitioning column.
B. Supported SQL Statements
DELETE — Deletes one or more records from the database.
INSERT — Creates new rows in the database, using the specified values for the columns.
MIGRATE — queues table rows for migration to an export target.
SELECT — Fetches the specified rows and columns from the database.
TRUNCATE TABLE — Deletes all records from the specified table.
UPDATE — Updates the values within the specified columns and rows of the database.
UPSERT — Either inserts new rows or updates existing rows depending on the primary key value.
C. SQL Functions
ABS() — Returns the absolute value of a numeric expression.
APPROX_COUNT_DISTINCT() — Returns an approximate count of the number of distinct values for the specified column expression.
AREA() — Returns the area of a polygon in square meters.
ARRAY_ELEMENT() — Returns the element at the specified location in a JSON array.
ARRAY_LENGTH() — Returns the number of elements in a JSON array.
ASTEXT() — Returns the Well Known Text (WKT) representation of a GEOGRAPHY or GEOGRAPHY_POINT value.
AVG() — Returns the average of a range of numeric column values.
BIN() — Returns the binary representation of a BIGINT value as a string.
BIT_SHIFT_LEFT() — Shifts the bits of a BIGINT value to the left a specified number of places.
BIT_SHIFT_RIGHT() — Shifts the bits of a BIGINT value to the right a specified number of places.
BITAND() — Returns the mask of bits set in both of two BIGINT values
BITNOT() — Returns the mask reversing every bit of a BIGINT value.
BITOR() — Returns the mask of bits set in either of two BIGINT values
BITXOR() — Returns the mask of bits set in one but not both of two BIGINT values
CAST() — Explicitly converts an expression to the specified datatype.
CEILING() — Returns the smallest integer value greater than or equal to a numeric expression.
CENTROID() — Returns the central point of a polygon.
CHAR() — Returns a string with a single UTF-8 character associated with the specified character code.
CHAR_LENGTH() — Returns the number of characters in a string.
COALESCE() — Returns the first non-null argument, or null.
CONCAT() — Concatenates two or more strings and returns the result.
CONTAINS() — Returns true or false depending if a point falls within the specified polygon.
COS() — Returns the cosine of an angle specified in radians.
COT() — Returns the cotangent of an angle specified in radians.
COUNT() — Returns the number of rows selected containing the specified column.
CSC() — Returns the cosecant of an angle specified in radians.
CURRENT_TIMESTAMP() — Returns the current time as a timestamp value.
DATEADD() — Returns a new timestamp value by adding a specified time interval to an existing timestamp value.
DAY(), DAYOFMONTH() — Returns the day of the month as an integer value.
DAYOFWEEK() — Returns the day of the week as an integer between 1 and 7.
DAYOFYEAR() — Returns the day of the year as an integer between 1 and 366.
DECODE() — Evaluates an expression against one or more alternatives and returns the matching response.
DEGREES() — Converts an angle in radians to degrees
DISTANCE() — Returns the distance between two points or a point and a polygon.
DWITHIN() — Returns true or false depending whether two geospatial entities are within a specified distance of each other.
EXP() — Returns the exponential of the specified numeric expression.
EXTRACT() — Returns the value of a selected portion of a timestamp.
FIELD() — Extracts a field value from a JSON-encoded string column.
FLOOR() — Returns the largest integer value less than or equal to a numeric expression.
FORMAT_CURRENCY() — Converts a DECIMAL to a text string as a monetary value.
FORMAT_TIMESTAMP() — Takes a timestamp as input and returns a formatted string in the specified timezone.
FROM_UNIXTIME() — Converts a UNIX time value to a VoltDB timestamp.
HEX() — Returns the hexadecimal representation of a BIGINT value as a string.
HOUR() — Returns the hour of the day as an integer value.
INET6_ATON() — Converts an IPv6 internet address from a string to a VARBINARY(16) value
INET6_NTOA() — Converts an IPv6 internet address from a VARBINARY(16) value to a string
INET_ATON() — Converts an IPv4 internet address from a string to a numeric value
INET_NTOA() — Converts an IPv4 internet address from a numeric value to a string
ISINVALIDREASON() — Explains why a GEOGRAPHY polygon is invalid
ISVALID() — Determines if the specified GEOGRAPHY value is a valid polygon.
IS_VALID_TIMESTAMP() — Identifies whether a given value is a valid timestamp.
LATITUDE() — Returns the latitude of a GEOGRAPHY_POINT value.
LEFT() — Returns a substring from the beginning of a string.
LN(), LOG() — Returns the natural logarithm of a numeric value.
LOG10() — Returns the base-10 logarithm of a numeric value.
LONGITUDE() — Returns the longitude of a GEOGRAPHY_POINT value.
LOWER() — Returns a string converted to all lowercase characters.
MAKEVALIDPOLYGON() — Attempts to return a valid GEOGRAPHY value from a GEOGRAPHY polygon
MAX() — Returns the maximum value from a range of column values.
MAX_VALID_TIMESTAMP() — Returns the maximum valid timestamp.
MIGRATING() — Identifies table rows currently migrating to an export target.
MIN() — Returns the minimum value from a range of column values.
MIN_VALID_TIMESTAMP() — Returns the minimum valid timestamp.
MINUTE() — Returns the minute of the hour as an integer value.
MOD() — Returns the result of a modulo operation.
MONTH() — Returns the month of the year as an integer value.
NOW() — Returns the current time as a timestamp value.
NUMINTERIORRINGS() — Returns the number of interior rings within a polygon GEOGRAPHY value.
NUMPOINTS() — Returns the number of points within a polygon GEOGRAPHY value.
OCTET_LENGTH() — Returns the number of bytes in a string.
OVERLAY() — Returns a string overwriting a portion of the original string with the specified replacement.
PI() — Returns the value of the mathematical constant pi (π) as a FLOAT value.
POINTFROMTEXT() — Returns a GEOGRAPHY_POINT value from the corresponding WKT
POLYGONFROMTEXT() — Returns a GEOGRAPHY value from the corresponding WKT
POSITION() — Returns the starting position of a substring in another string.
POWER() — Returns the value of the first argument raised to the power of the second argument.
QUARTER() — Returns the quarter of the year as an integer value
RADIANS() — Converts an angle in degrees to radians
REGEXP_POSITION() — Returns the starting position of a regular expression within a text string.
REPEAT() — Returns a string composed of a substring repeated the specified number of times.
REPLACE() — Returns a string replacing the specified substring of the original string with new text.
RIGHT() — Returns a substring from the end of a string.
ROUND() — Returns a numeric value rounded to the specified decimal place
SEC() — Returns the secant of an angle specified in radians.
SECOND() — Returns the seconds of the minute as a floating point value.
SET_FIELD() — Returns a copy of a JSON-encoded string, replacing the specified field value.
SIN() — Returns the sine of an angle specified in radians.
SINCE_EPOCH() — Converts a VoltDB timestamp to an integer number of time units since the POSIX epoch.
SPACE() — Returns a string of spaces of the specified length.
SQRT() — Returns the square root of a numeric expression.
STR() — Returns the string representation of a numeric value.
SUBSTRING() — Returns the specified portion of a string expression.
SUM() — Returns the sum of a range of numeric column values.
TAN() — Returns the tangent of an angle specified in radians.
TO_TIMESTAMP() — Converts an integer value to a VoltDB timestamp based on the time unit specified.
TRIM() — Returns a string with leading and/or training spaces removed.
TRUNCATE() — Truncates a VoltDB timestamp to the specified time unit.
UPPER() — Returns a string converted to all uppercase characters.
VALIDPOLYGONFROMTEXT() — Returns a validated GEOGRAPHY value from the corresponding WKT
WEEK(), WEEKOFYEAR() — Returns the week of the year as an integer value.
WEEKDAY() — Returns the day of the week as an integer between 0 and 6.
YEAR() — Returns the year as an integer value.
D. VoltDB CLI Commands
csvloader — Imports the contents of a CSV file and inserts it into a VoltDB table.
jdbcloader — Extracts a table from another database via JDBC and inserts it into a VoltDB table.
kafkaloader — Imports data from a Kafka message queue into the specified database table.
sqlcmd — Starts an interactive command prompt for issuing SQL queries to a running VoltDB database
voltadmin — Performs administrative functions on a VoltDB database.
voltdb — Performs management tasks on the current server, such as starting and recovering the database.
voltsql — An interactive command interface for issuing SQL queries to a running VoltDB database
E. Configuration File (deployment.xml)
E.1. Understanding XML Syntax
E.2. The Structure of the Configuration File
F. VoltDB Datatype Compatibility
F.1. Java and VoltDB Datatype Compatibility
G. System Procedures
@AdHoc — Executes an SQL statement specified at runtime.
@Explain — Returns the execution plan for the specified SQL query.
@ExplainProc — Returns the execution plans for all SQL queries in the specified stored procedure.
@ExplainView — Returns the execution plans for the components of the specified view.
@GetPartitionKeys — Returns a list of partition values, one for every partition in the database.
@Pause — Initiates read-only mode on the cluster.
@Ping — Indicates whether the database is currently running.
@Promote — Promotes a replica database to normal operation.
@QueryStats — Queries statistics like a SQL table.
@Quiesce — Waits for all queued export and DR data to be processed or saved to disk
@Resume — Returns a paused database to normal operating mode.
@Shutdown — Shuts down the database.
@SnapshotDelete — Deletes one or more native snapshots.
@SnapshotRestore — Restores a database from disk using a native format snapshot.
@SnapshotSave — Saves the current database contents to disk.
@SnapshotScan — Lists information about existing native snapshots in a given directory path.
@Statistics — Returns statistics about the usage of the VoltDB database.
@StopNode — Stops a VoltDB server process, removing the node from the cluster.
@SwapTables — Swaps the contents of one table for another
@SystemCatalog — Returns metadata about the database schema.
@SystemInformation — Returns configuration information about VoltDB and the individual nodes of the database cluster.
@UpdateApplicationCatalog — Reconfigures the database by replacing the configuration file.
@UpdateClasses — Adds and removes Java classes from the database.
@UpdateLogging — Changes the logging configuration for a running database.