Using VoltDB

Documentation

VoltDB Home » Documentation » Using VoltDB

< previous

table of contents

Using VoltDB

V13

Copyright © 2008-2023 Volt Active Data, Inc.

This document is published under copyright by Volt Active Data, Inc. All Rights Reserved.

The software described in this document is furnished under a license by Volt Active Data, Inc. Your rights to access and use VoltDB features are defined by the license you received when you acquired the software.

The VoltDB client libraries, for accessing VoltDB databases programmatically, are licensed separately under the MIT license.

Volt Active Data, VoltDB, and Active(N) are registered trademarks of Volt Active Data, Inc.

VoltDB software is protected by U.S. Patent Nos. 9,600,514, 9,639,571, 10,067,999, 10,176,240, and 10,268,707. Other patents pending.

Abstract

This book explains how to use VoltDB to design, build, and run high performance applications.

Table of Contents

About This Book

1. Overview

1.1. What is VoltDB?

1.2. Who Should Use VoltDB

1.3. How VoltDB Works

1.3.1. Partitioning
1.3.2. Serialized (Single-Threaded) Processing
1.3.3. Partitioned vs. Replicated Tables
1.3.4. Ease of Scaling to Meet Application Needs

1.4. Working with VoltDB Effectively

2. Installing VoltDB

2.1. Operating System and Software Requirements

2.2. Installing VoltDB

2.2.1. Upgrading From Older Versions

2.3. Setting Up Your Environment

2.4. What is Included in the VoltDB Distribution

3. Starting the Database

3.1. Initializing and Starting a VoltDB Database

3.2. Initializing and Starting a VoltDB Database on a Cluster

3.3. Stopping a VoltDB Database

3.4. Saving the Data

3.5. Restarting a VoltDB Database

3.6. Updating Nodes on the Cluster

3.7. Defining the Cluster Configuration

3.7.1. Determining How Many Sites per Host
3.7.2. Configuring Paths for Runtime Features
3.7.3. Verifying your Hardware Configuration

4. Designing the Database Schema

4.1. How to Enter DDL Statements

4.2. Creating Tables and Primary Keys

4.3. Analyzing Data Volume and Workload

4.4. Partitioning Database Tables

4.4.1. Choosing a Column on which to Partition Table Rows
4.4.2. Specifying Partitioned Tables
4.4.3. Design Rules for Partitioning Tables

4.5. Replicating Database Tables

4.5.1. Choosing Replicated Tables
4.5.2. Specifying Replicated Tables

4.6. Modifying the Schema

4.6.1. Effects of Schema Changes on Data and Clients
4.6.2. Viewing the Schema
4.6.3. Modifying Tables
4.6.4. Adding and Dropping Indexes
4.6.5. Modifying Partitioning for Tables and Stored Procedures

5. Designing Stored Procedures to Access the Database

5.1. How Stored Procedures Work

5.1.1. VoltDB Stored Procedures are Transactional
5.1.2. VoltDB Stored Procedures are Deterministic

5.2. The Anatomy of a VoltDB Stored Procedure

5.2.1. The Structure of the Stored Procedure
5.2.2. Passing Arguments to a Stored Procedure
5.2.3. Creating and Executing SQL Queries in Stored Procedures
5.2.4. Interpreting the Results of SQL Queries
5.2.5. Returning Results from a Stored Procedure
5.2.6. Rolling Back a Transaction

5.3. Installing Stored Procedures into the Database

5.3.1. Compiling, Packaging, and Loading Stored Procedures
5.3.2. Declaring Stored Procedures in the Schema
5.3.3. Partitioning Stored Procedures in the Schema

6. Designing VoltDB Client Applications

6.1. Connecting to the VoltDB Database

6.1.1. Connecting to Multiple Servers
6.1.2. Using the Auto-Connecting Client

6.2. Invoking Stored Procedures

6.3. Invoking Stored Procedures Asynchronously

6.4. Closing the Connection

6.5. Handling Errors

6.5.1. Interpreting Execution Errors
6.5.2. Handling Timeouts
6.5.3. Writing a Status Listener to Interpret Other Errors

6.6. Compiling and Running Client Applications

6.6.1. Starting the Client Application
6.6.2. Running Clients from Outside the Cluster

7. Simplifying Application Development

7.1. Using Default Procedures
7.2. Shortcut for Defining Simple Stored Procedures
7.3. Verifying Expected Query Results
7.4. Scheduling Stored Procedures as Tasks
7.5. Directed Procedures: Distributing Transactions to Every Partition

8. Using VoltDB with Other Programming Languages

8.1. C++ Client Interface

8.1.1. Writing VoltDB Client Applications in C++
8.1.2. Creating a Connection to the Database Cluster
8.1.3. Invoking Stored Procedures
8.1.4. Invoking Stored Procedures Asynchronously
8.1.5. Interpreting the Results

8.2. JSON HTTP Interface

8.2.1. How the JSON Interface Works
8.2.2. Using the JSON Interface from Client Applications
8.2.3. How Parameters Are Interpreted
8.2.4. Interpreting the JSON Results
8.2.5. Error Handling using the JSON Interface

8.3. JDBC Interface

8.3.1. Using JDBC to Connect to a VoltDB Database
8.3.2. Using JDBC to Query a VoltDB Database

9. Using VoltDB in a Cluster

9.1. Starting a Database Cluster

9.2. Updating the Cluster Configuration

9.3. Elastic Scaling to Resize the Cluster

9.3.1. Adding Nodes with Elastic Scaling
9.3.2. Removing Nodes with Elastic Scaling
9.3.3. Configuring How VoltDB Rebalances Nodes During Elastic Scaling

10. Availability

10.1. How K-Safety Works

10.2. Enabling K-Safety

10.2.1. What Happens When You Enable K-Safety
10.2.2. Calculating the Appropriate Number of Nodes for K-Safety

10.3. Recovering from System Failures

10.3.1. What Happens When a Node Rejoins the Cluster
10.3.2. Where and When Recovery May Fail

10.4. Avoiding Network Partitions

10.4.1. K-Safety and Network Partitions
10.4.2. Using Network Fault Protection

11. Database Replication and Active(N)

11.1. How Database Replication Works

11.1.1. Starting Database Replication
11.1.2. Database Replication, Availability, and Disaster Recovery
11.1.3. Database Replication and Completeness

11.2. Using Passive Database Replication

11.2.1. Specifying the DR Tables in the Schema
11.2.2. Configuring the Clusters
11.2.3. Starting the Clusters
11.2.4. Loading the Schema and Starting Replication
11.2.5. Stopping Replication
11.2.6. Database Replication and Read-only Clients

11.3. Using Active(N) Cross Datacenter Replication

11.3.1. Designing Your Schema for Active Replication
11.3.2. Configuring the Database Clusters
11.3.3. Starting the Database Clusters
11.3.4. Loading a Matching Schema and Starting Replication
11.3.5. Stopping Replication
11.3.6. Example XDCR Configurations
11.3.7. Understanding Conflict Resolution

11.4. Updating the Schema During Replication

11.4.1. Safely Updating the Schema While the Clusters are Paused
11.4.2. Adding and Removing Tables Without Pausing
11.4.3. Using Dynamic Schema Change to Add, Remove, and Modify Table Columns Without Pausing

11.5. Monitoring Database Replication

12. Security

12.1. How Security Works in VoltDB

12.2. Enabling Authentication and Authorization

12.3. Defining Users and Roles

12.4. Assigning Access to Stored Procedures

12.5. Assigning Access by Function (System Procedures, SQL Queries, and Default Procedures)

12.6. Using Built-in Roles

12.7. Encrypting VoltDB Communication Using TLS/SSL

12.7.1. Configuring TLS/SSL on the VoltDB Server
12.7.2. Choosing What Ports to Encrypt with TLS/SSL
12.7.3. Using the VoltDB Command Line Utilities with TLS/SSL
12.7.4. Implementing TLS/SSL in the Java Client Applications
12.7.5. Configuring Database Replication (DR) With TLS/SSL

12.8. Integrating LDAP Security with VoltDB

12.8.1. Configuring LDAP Security in VoltDB
12.8.2. Configuring VoltDB Security in LDAP

12.9. Integrating Kerberos Security with VoltDB

12.9.1. Installing and Configuring Kerberos
12.9.2. Installing and Configuring the Java Security Extensions
12.9.3. Configuring the VoltDB Servers and Clients
12.9.4. Accessing the Database from the Command Line and the Web

13. Saving & Restoring a VoltDB Database

13.1. Performing a Manual Save and Restore of a VoltDB Cluster

13.1.1. How to Save the Contents of a VoltDB Database
13.1.2. How to Restore the Contents of a VoltDB Database Manually
13.1.3. Changing the Cluster Configuration Using Save and Restore

13.2. Scheduling Automated Snapshots

13.3. Managing Snapshots

13.4. Special Notes Concerning Save and Restore

14. Command Logging and Recovery

14.1. How Command Logging Works

14.2. Controlling Command Logging

14.3. Configuring Command Logging for Optimal Performance

14.3.1. Log Size
14.3.2. Log Frequency
14.3.3. Synchronous vs. Asynchronous Logging
14.3.4. Hardware Considerations

15. Streaming Data: Import, Export, and Migration

15.1. How Data Streaming Works in VoltDB

15.1.1. Understanding Import
15.1.2. Understanding Export
15.1.3. Understanding Migration
15.1.4. Understanding Topics

15.2. The Business Case for Streaming Data

15.2.1. Extract, Transform, Load (ETL)
15.2.2. Change Data Capture
15.2.3. Streaming Data Validation
15.2.4. Caching
15.2.5. Archiving

15.3. VoltDB Export Connectors

15.3.1. How Export Works
15.3.2. The File Export Connector
15.3.3. The HTTP Export Connector
15.3.4. The JDBC Export Connector
15.3.5. The Kafka Export Connector
15.3.6. The Elasticsearch Export Connector

15.4. VoltDB Import Connectors

15.4.1. Bulk Loading Data Using VoltDB Standalone Utilities
15.4.2. Streaming Import Using Built-in Import Features
15.4.3. The Kafka Importer
15.4.4. The Kinesis Importer

15.5. VoltDB Import Formatters

15.6. VoltDB Topics

15.6.1. Types of VoltDB Topics
15.6.2. Declaring VoltDB Topics
15.6.3. Configuring and Managing Topics
15.6.4. Configuring the Topic Server
15.6.5. Calling Topics from Consumers and Producers
15.6.6. Using Opaque Topics

A. Supported SQL DDL Statements

ALTER STREAM — Modifies an existing stream definition.
ALTER TABLE — Modifies an existing table definition.
ALTER TASK — Modifies an existing task schedule.
ALTER VIEW — modifies the TTL settings of an existing stream view
CREATE AGGREGATE FUNCTION — Defines an aggregate SQL function and associates it with a Java class.
CREATE FUNCTION — Defines a SQL scalar function and associates it with a Java method.
CREATE INDEX — Creates an index for faster access to a table.
CREATE PROCEDURE AS — Defines a stored procedure composed of one or more SQL statements.
CREATE PROCEDURE FROM CLASS — Defines a stored procedure associated with a Java class.
CREATE ROLE — Defines a role and the permissions associated with that role.
CREATE STREAM — Creates an output stream in the database.
CREATE TABLE — Creates a table in the database.
CREATE TASK — Schedules a procedure to run periodically.
CREATE VIEW — Creates a view into one or more tables, optimizing access to a summary of their contents.
DR TABLE — Identifies a table as a participant in database replication (DR)
DROP FUNCTION — Removes the definition of a SQL function.
DROP INDEX — Removes an index.
DROP PROCEDURE — Removes the definition of a stored procedure.
DROP ROLE — Removes a role.
DROP STREAM — Removes a stream and, optionally, any views associated with it.
DROP TABLE — Removes a table and any data associated with it.
DROP TASK — Removes a task and cancels any future execution.
DROP VIEW — Removes a view and any data associated with it.
PARTITION TABLE — Specifies that a table is partitioned and which is the partitioning column.

B. Supported SQL Statements

DELETE — Deletes one or more records from the database.
INSERT — Creates new rows in the database, using the specified values for the columns.
MIGRATE — queues table rows for migration to an export target.
SELECT — Fetches the specified rows and columns from the database.
TRUNCATE TABLE — Deletes all records from the specified table.
UPDATE — Updates the values within the specified columns and rows of the database.
UPSERT — Either inserts new rows or updates existing rows depending on the primary key value.

C. SQL Functions

ABS() — Returns the absolute value of a numeric expression.
APPROX_COUNT_DISTINCT() — Returns an approximate count of the number of distinct values for the specified column expression.
AREA() — Returns the area of a polygon in square meters.
ARRAY_ELEMENT() — Returns the element at the specified location in a JSON array.
ARRAY_LENGTH() — Returns the number of elements in a JSON array.
ASTEXT() — Returns the Well Known Text (WKT) representation of a GEOGRAPHY or GEOGRAPHY_POINT value.
AVG() — Returns the average of a range of numeric column values.
BIN() — Returns the binary representation of a BIGINT value as a string.
BIT_SHIFT_LEFT() — Shifts the bits of a BIGINT value to the left a specified number of places.
BIT_SHIFT_RIGHT() — Shifts the bits of a BIGINT value to the right a specified number of places.
BITAND() — Returns the mask of bits set in both of two BIGINT values
BITNOT() — Returns the mask reversing every bit of a BIGINT value.
BITOR() — Returns the mask of bits set in either of two BIGINT values
BITXOR() — Returns the mask of bits set in one but not both of two BIGINT values
CAST() — Explicitly converts an expression to the specified datatype.
CEILING() — Returns the smallest integer value greater than or equal to a numeric expression.
CENTROID() — Returns the central point of a polygon.
CHAR() — Returns a string with a single UTF-8 character associated with the specified character code.
CHAR_LENGTH() — Returns the number of characters in a string.
COALESCE() — Returns the first non-null argument, or null.
CONCAT() — Concatenates two or more strings and returns the result.
CONTAINS() — Returns true or false depending if a point falls within the specified polygon.
COS() — Returns the cosine of an angle specified in radians.
COT() — Returns the cotangent of an angle specified in radians.
COUNT() — Returns the number of rows selected containing the specified column.
CSC() — Returns the cosecant of an angle specified in radians.
CURRENT_TIMESTAMP() — Returns the current time as a timestamp value.
DATEADD() — Returns a new timestamp value by adding a specified time interval to an existing timestamp value.
DATEDIFF() — Returns the difference between two timestamps as an integer.
DAY(), DAYOFMONTH() — Returns the day of the month as an integer value.
DAYOFWEEK() — Returns the day of the week as an integer between 1 and 7.
DAYOFYEAR() — Returns the day of the year as an integer between 1 and 366.
DECODE() — Evaluates an expression against one or more alternatives and returns the matching response.
DEGREES() — Converts an angle in radians to degrees
DISTANCE() — Returns the distance between two points or a point and a polygon.
DWITHIN() — Returns true or false depending whether two geospatial entities are within a specified distance of each other.
EXP() — Returns the exponential of the specified numeric expression.
EXTRACT() — Returns the value of a selected portion of a timestamp.
FIELD() — Extracts a field value from a JSON-encoded string column.
FLOOR() — Returns the largest integer value less than or equal to a numeric expression.
FORMAT_CURRENCY() — Converts a DECIMAL to a text string as a monetary value.
FORMAT_TIMESTAMP() — Takes a timestamp as input and returns a formatted string in the specified timezone.
FROM_UNIXTIME() — Converts a UNIX time value to a VoltDB timestamp.
HEX() — Returns the hexadecimal representation of a BIGINT value as a string.
HOUR() — Returns the hour of the day as an integer value.
INET6_ATON() — Converts an IPv6 internet address from a string to a VARBINARY(16) value
INET6_NTOA() — Converts an IPv6 internet address from a VARBINARY(16) value to a string
INET_ATON() — Converts an IPv4 internet address from a string to a numeric value
INET_NTOA() — Converts an IPv4 internet address from a numeric value to a string
ISINVALIDREASON() — Explains why a GEOGRAPHY polygon is invalid
ISVALID() — Determines if the specified GEOGRAPHY value is a valid polygon.
IS_VALID_TIMESTAMP() — Identifies whether a given value is a valid timestamp.
LATITUDE() — Returns the latitude of a GEOGRAPHY_POINT value.
LEFT() — Returns a substring from the beginning of a string.
LN(), LOG() — Returns the natural logarithm of a numeric value.
LOG10() — Returns the base-10 logarithm of a numeric value.
LONGITUDE() — Returns the longitude of a GEOGRAPHY_POINT value.
LOWER() — Returns a string converted to all lowercase characters.
MAKEVALIDPOLYGON() — Attempts to return a valid GEOGRAPHY value from a GEOGRAPHY polygon
MAX() — Returns the maximum value from a range of column values.
MAX_VALID_TIMESTAMP() — Returns the maximum valid timestamp.
MIGRATING() — Identifies table rows currently migrating to an export target.
MIN() — Returns the minimum value from a range of column values.
MIN_VALID_TIMESTAMP() — Returns the minimum valid timestamp.
MINUTE() — Returns the minute of the hour as an integer value.
MOD() — Returns the result of a modulo operation.
MONTH() — Returns the month of the year as an integer value.
NOW() — Returns the current time as a timestamp value.
NUMINTERIORRINGS() — Returns the number of interior rings within a polygon GEOGRAPHY value.
NUMPOINTS() — Returns the number of points within a polygon GEOGRAPHY value.
OCTET_LENGTH() — Returns the number of bytes in a string.
OVERLAY() — Returns a string overwriting a portion of the original string with the specified replacement.
PI() — Returns the value of the mathematical constant pi (π) as a FLOAT value.
POINTFROMTEXT() — Returns a GEOGRAPHY_POINT value from the corresponding WKT
POLYGONFROMTEXT() — Returns a GEOGRAPHY value from the corresponding WKT
POSITION() — Returns the starting position of a substring in another string.
POWER() — Returns the value of the first argument raised to the power of the second argument.
QUARTER() — Returns the quarter of the year as an integer value
RADIANS() — Converts an angle in degrees to radians
REGEXP_POSITION() — Returns the starting position of a regular expression within a text string.
REPEAT() — Returns a string composed of a substring repeated the specified number of times.
REPLACE() — Returns a string replacing the specified substring of the original string with new text.
RIGHT() — Returns a substring from the end of a string.
ROUND() — Returns a numeric value rounded to the specified decimal place
SEC() — Returns the secant of an angle specified in radians.
SECOND() — Returns the seconds of the minute as a floating point value.
SET_FIELD() — Returns a copy of a JSON-encoded string, replacing the specified field value.
SIGN() — Indicates whether a numeric value is positive, negative, or zero.
SIN() — Returns the sine of an angle specified in radians.
SINCE_EPOCH() — Converts a VoltDB timestamp to an integer number of time units since the POSIX epoch.
SPACE() — Returns a string of spaces of the specified length.
SQRT() — Returns the square root of a numeric expression.
STR() — Returns the string representation of a numeric value.
SUBSTRING() — Returns the specified portion of a string expression.
SUM() — Returns the sum of a range of numeric column values.
TAN() — Returns the tangent of an angle specified in radians.
TIME_WINDOW() — Returns the time "window" into which the specified timestamp value falls.
TO_TIMESTAMP() — Converts an integer value to a VoltDB timestamp based on the time unit specified.
TRIM() — Returns a string with leading and/or training spaces removed.
TRUNCATE() — Truncates a VoltDB timestamp to the specified time unit.
UPPER() — Returns a string converted to all uppercase characters.
VALIDPOLYGONFROMTEXT() — Returns a validated GEOGRAPHY value from the corresponding WKT
WEEK(), WEEKOFYEAR() — Returns the week of the year as an integer value.
WEEKDAY() — Returns the day of the week as an integer between 0 and 6.
YEAR() — Returns the year as an integer value.

D. VoltDB CLI Commands

csvloader — Imports the contents of a CSV file and inserts it into a VoltDB table.
jdbcloader — Extracts a table from another database via JDBC and inserts it into a VoltDB table.
kafkaloader — Imports data from a Kafka message queue into the specified database table.
sqlcmd — Starts an interactive command prompt for issuing SQL queries to a running VoltDB database
voltadmin — Performs administrative functions on a VoltDB database.
voltdb — Performs management tasks on the current server, such as starting and recovering the database.

E. Configuration File (deployment.xml)

E.1. Understanding XML Syntax
E.2. The Structure of the Configuration File

F. VoltDB Datatype Compatibility

F.1. Java and VoltDB Datatype Compatibility

G. System Procedures

@AdHoc — Executes an SQL statement specified at runtime.
@Explain — Returns the execution plan for the specified SQL query.
@ExplainProc — Returns the execution plans for all SQL queries in the specified stored procedure.
@ExplainView — Returns the execution plans for the components of the specified view.
@GetPartitionKeys — Returns a list of partition values, one for every partition in the database.
@Metrics — Returns information about the performance and current status of the database server
@Note — Writes a message into the VoltDB log file.
@Pause — Initiates read-only mode on the cluster.
@Ping — Indicates whether the database is currently running.
@Promote — Promotes a replica database to normal operation.
@QueryStats — Queries statistics like a SQL table.
@Quiesce — Waits for all queued export and DR data to be processed or saved to disk
@Resume — Returns a paused database to normal operating mode.
@Shutdown — Shuts down the database.
@SnapshotDelete — Deletes one or more native snapshots.
@SnapshotRestore — Restores a database from disk using a native format snapshot.
@SnapshotSave — Saves the current database contents to disk.
@SnapshotScan — Lists information about existing native snapshots in a given directory path.
@Statistics — Returns statistics about the usage of the VoltDB database.
@StopNode — Stops a VoltDB server process, removing the node from the cluster.
@SwapTables — Swaps the contents of one table for another
@SystemCatalog — Returns metadata about the database schema.
@SystemInformation — Returns configuration information about VoltDB and the individual nodes of the database cluster.
@UpdateApplicationCatalog — Reconfigures the database by replacing the configuration file.
@UpdateClasses — Adds and removes Java classes from the database.
@UpdateLogging — Changes the logging configuration for a running database.

List of Figures

1.1. Partitioning Tables
1.2. Serialized Processing
1.3. Replicating Tables
4.1. Components of a Database Schema
4.2. Partitions Distribute Table Data and Stored Procedure Processing
4.3. Diagram Representing the Flight Reservation System
5.1. Array of VoltTable Structures
5.2. One VoltTable Structure is returned for each Queued SQL Statement
5.3. Stored Procedures Execute in the Appropriate Partition Based on the Partitioned Parameter Value
8.1. The Structure of the VoltDB JSON Response
10.1. K-Safety in Action
10.2. Network Partition
10.3. Network Fault Protection in Action
11.1. Passive Database Replication
11.2. Cross Datacenter Replication
11.3. Replicating an Existing Database
11.4. Promoting the Replica
11.5. Read-Only Access to the Replica
11.6. Standard XDCR Configuration
11.7. XDCR Configuration with Read-Only Replicas
11.8. Transaction Order and Conflict Resolution
14.1. Command Logging in Action
14.2. Recovery in Action
15.1. Overview of Data Streaming
15.2. Overview of Topics
E.1. Configuration XML Structure

List of Tables

2.1. Operating System and Software Requirements
2.2. Components Installed by VoltDB
4.1. Example Application Workload
5.1. Methods of the VoltTable Classes
8.1. Datatypes in the JSON Interface
11.1. Structure of the XDCR Conflict Logs
12.1. Named Security Permissions
15.1. File Export Properties
15.2. Export Metadata
15.3. HTTP Export Properties
15.4. JDBC Export Properties
15.5. Kafka Export Properties
15.6. Elasticsearch Export Properties
15.7. Kafka Import Properties
15.8. Kinesis Import Properties
15.9. CSV and TSV Formatter Properties
15.10. Topic Formatting Properties
A.1. Supported SQL Datatypes
C.1. Selectable Values for the EXTRACT Function
E.1. Configuration File Elements and Attributes
F.1. Java and VoltDB Datatype Compatibility
G.1. @SnapshotRestoreOptions
G.2. @SnapshotSave Options

List of Examples

4.1. DDL Example of a Reservation Schema
5.1. Components of a VoltDB Java Stored Procedure
5.2. Cycles of Queue and Execute in a Stored Procedure
5.3. Displaying the Contents of VoltTable Arrays

< previous

table of contents