voltdb

voltdb — Performs management tasks on the current server, such as starting and recovering the database.

Synopsis

voltdb collect [args]

voltdb get classes [args]

voltdb get deployment [args]

voltdb get schema [args]

voltdb mask [args] source-configuration-file [new-configuration-file]

voltdb init [args]

voltdb start [args]

Description

The voltdb command performs local management functions on the current system, including:

  • Initializing the database root directory and setting configuration options

  • Starting the database process

  • Collecting log files into a single compressed file

  • Retrieving the classes, deployment, or schema from a database root directory

  • Hiding passwords in the configuration file

The action that is performed depends on which action you specify to the voltdb command:

  • collect — the collect option collects system and process logs related to the VoltDB database process on the current system and compresses them into a single file. This command is helpful when reporting problems to VoltDB support.

  • get — the get option retrieves the current configuration, procedure classes, or schema from the database root directory. The requested item is then written to a file. This command can be used whether the database is running or not. You can use options to specify either or both the parent of the root directory (--dir) or the name and location of the output file (--output). Note that the get option can only be used on databases created using init and start.

  • mask — the mask option disguises the passwords associated with user accounts in the security section of the configuration file. The voltdb mask command either writes a new configuration file with hashed passwords or, if you do not specify an output file, modifies the original input file in place.

  • init — the init option initializes the root directory VoltDB uses for storing the configuration, logs, and other disk-based information (such as snapshots and command logs) for the database process. You only need to initialize the root directory once. After that, VoltDB manages the content and selects the appropriate start actions to maintain the database state. If you choose to re-initialize an existing root directory, you can use the --force argument to delete any previous data.[]

  • start — the start option starts the database process after the root directory has been initialized. The actual action that VoltDB takes depends on the current state of the database cluster:

    • If this is the first time the database has started, it creates a new database.

    • If the database has run before and is configured to use command logs or there is at least one snapshot in the snapshots directory, the database is restarted and previous data recovered.

    • If the cluster is already running and a server is missing (assuming the use of K-safety) the current node will rejoin the running cluster.

    • If the cluster is already running with all servers present, the current node will be added to expand the size of the cluster — as long as you use the --add argument on the start command.
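The four cases above can be read as a small decision procedure. The following is a sketch only: the yes/no inputs are hypothetical descriptions of cluster state chosen for this illustration, not values that VoltDB exposes, and the function does not call VoltDB.

```shell
# Summarize which action "voltdb start" takes, following the cases listed above.
#   $1 = is the cluster already running?                      (yes/no)
#   $2 = are command logs or at least one snapshot present?   (yes/no)
#   $3 = is a server missing from the running cluster?        (yes/no)
#   $4 = was --add given on the start command?                (yes/no)
start_action() {
    if [ "$1" = no ]; then
        if [ "$2" = yes ]; then echo "recover"; else echo "create"; fi
    elif [ "$3" = yes ]; then
        echo "rejoin"
    elif [ "$4" = yes ]; then
        echo "add"
    else
        echo "not added (--add required)"
    fi
}

start_action no yes no no     # prints: recover
start_action yes yes no yes   # prints: add
```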

The voltdb start command uses Java to instantiate the process. It is possible to customize the Java environment, if necessary, by passing command line arguments to Java through the following environment variables:

  • LOG4J_CONFIG_PATH — Specifies an alternate Log4J configuration file.

  • VOLTDB_GC_OPTS — Lets you specify which Java garbage collector to use and other GC-related options. Specify the options using standard Java -XX format. For example:

    export VOLTDB_GC_OPTS="-XX:+UseG1GC -XX:+UseStringDeduplication"
  • VOLTDB_HEAPMAX — Specifies the maximum heap size for the Java process. Specify the value as an integer number of megabytes. By default, the maximum heap size is set to 2048.

  • VOLTDB_OPTS — Specifies all other Java command line arguments. You must include both the command line flag and argument. For example, this environment variable can be used to specify system properties using the -D flag:

    export VOLTDB_OPTS="-DmyApp.DebugFlag=true"
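As a minimal sketch, the variables can be set together in the shell that later launches the server. The heap size, the Log4J file location, and the root directory parent are illustrative assumptions. The start command is printed rather than run, so the snippet is safe to paste.

```shell
# Illustrative values only; adjust them for your own system.
export VOLTDB_HEAPMAX=4096                       # maximum heap, in megabytes
export VOLTDB_GC_OPTS="-XX:+UseG1GC"             # standard Java -XX syntax
export VOLTDB_OPTS="-DmyApp.DebugFlag=true"      # any other Java arguments
export LOG4J_CONFIG_PATH="$HOME/mydb/log4j.xml"  # hypothetical file location

# Export the variables in the same shell session that launches the server.
start_cmd="voltdb start --dir=$HOME/mydb"
echo "$start_cmd"
```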

Log Collection (voltdb collect) Arguments

The following arguments apply specifically to the collect action.

-D, --dir={directory}

Specifies the parent location for the database root directory from which to collect information. The default, if you do not specify a directory, is the current working directory.

--days={integer}

Specifies the number of days of log files to collect. For example, using --days=1 will collect data from the last 24 hours. By default, VoltDB collects 14 days (2 weeks) worth of logs.

--dry-run

Lists the actions that will be taken, including the files that will be collected, but does not actually perform the collection or upload.

--no-prompt

Specifies that the process will not prompt for input, such as whether to delete the output file after uploading is complete. This argument is useful when starting the collect action from within a script.

--output={file}

Specifies the name and location of the resulting output file. The default output file name starts with "voltdb_collect_" and includes the current server IP or hostname, with a file extension of ".zip" saved to the current working directory.

--skip-heap-dump

Specifies that the heap dump not be included in the collection. The heap dump is usually significantly larger than the other log files and can be excluded to save space.
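One way to combine these arguments is to preview the collection with --dry-run and then repeat the command without it. The sketch below only prints the two commands. The root directory parent and the output path are placeholders, not defaults.

```shell
# Collect the last 3 days of logs, without the heap dump and without prompts.
# The directory and output file names are placeholders for this sketch.
collect_args="--dir=$HOME/mydb --days=3 --skip-heap-dump --no-prompt --output=/tmp/voltdb_logs.zip"

# First pass: --dry-run lists what would be collected without collecting it.
echo "voltdb collect $collect_args --dry-run"

# Second pass: the same arguments without --dry-run perform the collection.
echo "voltdb collect $collect_args"
```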

Get Resource (voltdb get) Arguments

The following arguments apply specifically to the get classes, get deployment, and get schema actions.

-D, --dir={directory}

Specifies the parent location for the database root directory. The default, if you do not specify a directory, is the current working directory.

-f, --force

Allows the command to overwrite an existing file. By default, the get actions will not overwrite existing files.

-o, --output={file-path}

Specifies the name and, optionally, location for the resulting output file. The default location is the current working directory. The default file depends on the resource being requested:

  • procedures.jar for get classes

  • deployment.yaml for get deployment

  • schema.sql for get schema

In addition, the following arguments are specific to get deployment and are mutually exclusive:

--xml

Specifies that the output be in XML format.

--yaml

Specifies that the output be in YAML format.

If neither the --xml nor the --yaml argument is specified, the default output format is YAML. However, if the format is not specified but the output file is, and the specified file has ".xml" as its file type, the output is generated as XML.
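The format rules for get deployment can be sketched as a small shell function. This illustrates the rules as described here and is not VoltDB code. It assumes that an explicit --xml or --yaml flag takes precedence over the file extension, and it recognizes only the long --output= form.

```shell
# Predict the output format of "voltdb get deployment" from its arguments.
deployment_format() {
    fmt=""
    out=""
    for arg in "$@"; do
        case "$arg" in
            --xml)      fmt="xml" ;;
            --yaml)     fmt="yaml" ;;
            --output=*) out="${arg#--output=}" ;;
        esac
    done
    if [ -n "$fmt" ]; then
        echo "$fmt"          # an explicit flag decides the format
    elif [ "${out%.xml}" != "$out" ]; then
        echo "xml"           # no flag, but the output file ends in .xml
    else
        echo "yaml"          # default
    fi
}

deployment_format --output=backup/config.xml   # prints: xml
deployment_format --output=config.txt          # prints: yaml
deployment_format                              # prints: yaml
```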

Initialization (voltdb init) Arguments

The following arguments apply to the voltdb init command.

-C, --config={configuration-file}[,...]

Specifies the location of one or more database configuration files. The configuration files are YAML files that define the logical structure of the database, including which options to enable when the database starts. See Appendix E, YAML Configuration Properties for a complete description of the syntax of the configuration file. Use of YAML properties is recommended. However, you can specify a single XML file as an alternative method for specifying the configuration.

If you do not specify a configuration file, the default is a configuration that includes command logging (where available), no K-safety, and eight sites per host.

-D, --dir={directory}

Specifies the parent location for the database root directory. The root directory is named voltdbroot and is created if it does not already exist in the specified location. If a voltdbroot directory does already exist, you must use the --force argument to override any existing data. The default, if you do not specify a directory, is the current working directory.

-f, --force

Initializes the database root directory, even if files (such as command logs or snapshots) already exist in the specified directory. Initializing the root directory after previously running a database could overwrite and therefore erase old command logs. Therefore, VoltDB will not, by default, initialize the database if such files exist. If you do not need the files from the previous session, you can use the --force argument to overwrite these files.

-j, --classes={JAR-file} [, ...]

Specifies the location of one or more JAR files containing classes used to declare user-defined stored procedures. The JAR files (and any schema definitions included with the --schema argument) are loaded automatically when the database starts. Separate multiple file names with commas. You can also use asterisk (*) as a wildcard character in the file specification. If durability is enabled (through command logs or a shutdown snapshot) the classes specified on the init command are loaded only the first time the database starts and the command logs are used for subsequent starts. If no durability is provided, the initialized classes are loaded on every start.

-l, --license={license-file}

Specifies the location of the license file, which is required. If no license is specified, Volt looks for a file named license.xml in the current working directory, the /voltdb subfolder where the VoltDB software is installed, or the current user's home directory.

-r, --retain={integer}

Specifies the maximum number of snapshot directories to save when performing a voltdb init --force. When initializing a root directory with --force, VoltDB deletes all previous files in the directory except the snapshots subfolder, which is renamed snapshots.1, snapshots.2, and so on. By default, VoltDB saves only two older snapshot folders. The --retain argument lets you specify a different maximum number of folders to save.

-s, --schema={schema-file} [, ...]

Specifies the location of one or more files containing database definition language (DDL) statements. The DDL statements (and any classes included with the --classes argument) are loaded automatically when the database starts. Separate multiple file names with commas. You can also use asterisk (*) as a wildcard character in the file specification. If durability is enabled (through command logs or a shutdown snapshot) the schema specified on the init command is loaded only the first time the database starts and the command logs are used for subsequent starts. If no durability is provided, the initialized schema is loaded on every start.
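As a sketch of how the --schema and --classes arguments fit together on one init command, the snippet below builds and prints the command line. The file names and the root directory parent are hypothetical. Multiple schema files are joined with commas, as described above.

```shell
# Hypothetical file names; separate multiple files with commas.
schema_files="ddl/tables.sql,ddl/procedures.sql"
class_files="lib/procs.jar"

# Both the schema and the classes are loaded when the database first starts.
init_cmd="voltdb init --dir=$HOME/mydb --schema=$schema_files --classes=$class_files"
echo "$init_cmd"
```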

Database Startup (voltdb start) Arguments

The following arguments apply to the voltdb start command.

-D, --dir={directory}

Specifies the parent location for the database root directory. This is the same directory specified on the voltdb init command. (You must initialize the root directory before you can start the database.) The default, if you do not specify a directory, is the current working directory.

-H, --host={host-id} [,...]

Specifies the network address of one or more nodes in the database cluster. VoltDB selects one of these nodes to coordinate the start of the database or the adding or rejoining of servers. When starting a database, all nodes must specify the same list of host addresses. Note that once the database starts and the cluster is complete, the role of the host node is complete and all nodes become peers.

When rejoining or adding a server to a running cluster, you can specify any node(s) still in the cluster. The host for an add or rejoin operation does not have to be the same node specified when the database started.

The default, if you do not specify a host when creating or recovering the database, is localhost; in other words, a single-node cluster running on the current system. You must specify a host on the command line when adding or rejoining a node or when starting a cluster.

If the host node is using an internal port other than the default (3021), you must specify the port as part of the host string, in the format host:port.

When used in conjunction with the --missing flag, the first host in the list must be one of the current hosts, not one of the missing nodes.

-c, --count={number-of-nodes}

Specifies the number of nodes in the database cluster.
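The following sketch shows how --host and --count might be combined for a three-node cluster. The host names (zeus, athena, hera), the root directory parent, and the alternate port 3025 are assumptions made for illustration. The commands are printed rather than run.

```shell
# Hypothetical three-node cluster. Every node runs the same start command,
# because all nodes must specify the same list of host addresses.
hosts="zeus,athena,hera"
count=3
start_cmd="voltdb start --dir=$HOME/mydb --count=$count --host=$hosts"
echo "$start_cmd"

# If a host node uses an internal port other than the default (3021),
# give the port as part of the host string, in the format host:port.
alt_cmd="voltdb start --dir=$HOME/mydb --count=$count --host=zeus:3025,athena,hera"
echo "$alt_cmd"
```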

--add

When joining a running cluster, specifies that the new node can be "added", elastically expanding the size of the cluster. The --add flag only takes effect when a node is joining a complete, running cluster. If the cluster is starting or if a node is missing from a K-safe cluster, the current node will join the cluster as normal. But if the cluster is already running and has its full complement of members, you must specify --add if you want to increase the size of the cluster.

-B, --background

Starts the server process in the background (as a daemon process).

-g, --placement-group={group-name}

Specifies the location of the server. When the K-safety value is greater than zero, VoltDB uses this argument to assist in rack-aware partitioning. The cluster will attempt to place multiple copies of each partition on different nodes to keep them physically as far apart as possible. The physical location is specified by the group-name, which is an alphanumeric name. The names might represent physical servers, racks, switches, or anything meaningful to the user to avoid multiple copies failing at the same time.

To be effective, placement groups must adhere to the following rules:

  • There must be more than one placement group specified for the cluster.

  • The number of nodes must be a multiple of the number of placement groups.

  • The number of placement groups must be a multiple of K+1.

Otherwise, there are no guarantees the partitions will be evenly distributed.
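The three rules can be checked before starting a cluster. The function below is a sketch written for this page, not part of VoltDB. It takes the planned node count, the number of placement groups, and the K-safety value, and reports the first rule that is violated.

```shell
# Check a planned layout against the placement group rules above.
# Arguments: node count, number of placement groups, K-safety value.
check_placement() {
    nodes=$1
    ngroups=$2
    k=$3
    [ "$ngroups" -gt 1 ] || { echo "need more than one placement group"; return 1; }
    [ $((nodes % ngroups)) -eq 0 ] || { echo "node count must be a multiple of the group count"; return 1; }
    [ $((ngroups % (k + 1))) -eq 0 ] || { echo "group count must be a multiple of K+1"; return 1; }
    echo "ok"
}

check_placement 6 2 1           # 6 nodes, 2 groups, K=1: prints ok
check_placement 6 3 1 || true   # 3 groups is not a multiple of K+1 = 2
```

A layout that fails the check can still start. As noted above, the consequence is only that even distribution of partitions is not guaranteed.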

--ignore=thp

For Linux systems, allows the database to start even if the server is configured to use Transparent Huge Pages (THP). THP is a known problem for memory-intensive applications like VoltDB, so under normal conditions VoltDB will not start if the use of THP is enabled. This flag allows you to ignore that restriction for test purposes. Do not use this flag on production systems.

--missing={number-of-nodes}

Allows a K-safe cluster to start without the full complement of nodes. This argument specifies how many nodes are missing from the cluster at startup. For example, if the arguments are --count=5 and --missing=2, then the database will start once three nodes join the cluster, assuming those nodes can support at least one copy of each partition. Note that use of the --missing option means that the cluster is not fully K-safe until the specified number of missing nodes rejoin the cluster after the database starts. Also, the --host flag should list currently available hosts, not the missing nodes.
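The arithmetic in the example above can be written out as follows. The host names are hypothetical and stand for the nodes that are currently available. The start command is printed rather than run.

```shell
count=5      # total nodes in the cluster
missing=2    # nodes allowed to be absent at startup
needed=$((count - missing))
echo "database starts once $needed nodes have joined"

# List only available hosts, never the missing nodes.
echo "voltdb start --count=$count --missing=$missing --host=zeus,athena,hera"
```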

--pause

For the create and recover operations only, starts the database in admin mode. Admin mode stops applications from performing write operations to the database through the client interface. This is useful when performing administrative functions such as restoring a snapshot before allowing client access. Once all administrative operations are complete, you can use the voltadmin resume command to resume normal operation for the database. If any nodes in the cluster start with the --pause switch, the entire cluster starts paused.

--safemode

When using command logs to recover an existing database that cannot recover under normal circumstances, the --safemode argument recovers the database to the last valid transaction. This argument should only be used when troubleshooting a failed recovery. See the description of safe mode recovery in the VoltDB Administrator's Guide for details.

Network Configuration Arguments

In addition to the arguments listed above for the voltdb start command, there are additional arguments that specify the network configuration for server ports and interfaces when starting a VoltDB database. In most cases, the default values can and should be accepted for these settings. The exceptions are the external and internal interfaces that should be specified whenever there are multiple network interfaces on a single machine.

You can choose to set network ports and interfaces either using individual arguments for each port or as a YAML file identified by the --network qualifier. When specifying network settings in YAML, you identify the type of port followed by individual properties for the port number and the network interface, either of which can be defaulted. For example, the following YAML network configuration file shows all of the possible settings. Of course, normally you only need to specify those settings that are different from the default.

network:
    externalinterface: 192.168.0.100
    publicinterface: 192.168.0.200
    admin:
        address: 192.168.0.100
        port: 21211
    client:
        address: 192.168.0.100
        port: 21212
    drpublic:
        address: 192.168.0.100
        port: 5555
    internal:
        address: 192.168.0.100
        port: 3021
    metrics:
        address: 192.168.0.100
        port: 11781
    replication:
        address: 192.168.0.100
        port: 5555
    topics:
        address: 192.168.0.100
        port: 9092
    topicspublic:
        address: 192.168.0.100
        port: 9092
    zookeeper:
        address: 127.0.0.1
        port: 7181

When specifying network ports using individual command line arguments, you can optionally specify a unique network interface by preceding the port number with the interface's IP address (or hostname) followed by a colon. Specifying the port and/or network interface for an individual port setting overrides the default interface for that port, the interface set by --externalinterface or --internalinterface, and any properties defined in a YAML file specified using the --network qualifier.
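To illustrate the [ip-address:]port syntax, the sketch below splits a port argument value into its interface and port parts, then prints a start command that mixes a default external interface with a per-port override. The function name and all addresses are assumptions for this example. The split is on the last colon, so it suits IPv4 addresses and hostnames only.

```shell
# Split an "[ip-address:]port" value into its two parts.
describe_port() {
    case "$1" in
        *:*) echo "interface=${1%:*} port=${1##*:}" ;;
        *)   echo "interface=(default) port=$1" ;;
    esac
}

describe_port 10.0.0.5:21212   # prints: interface=10.0.0.5 port=21212
describe_port 21211            # prints: interface=(default) port=21211

# --externalinterface sets the default for external ports; the address
# given with --client overrides it for the client port only.
net_args="--externalinterface=192.168.0.100 --client=10.0.0.5:21212 --admin=21211"
echo "voltdb start --dir=$HOME/mydb $net_args"
```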

The network configuration arguments to the voltdb start command are listed below. See the appendix on server configuration options in the VoltDB Administrator's Guide for more information about network configuration options.

--network={YAML-file}

Specifies a YAML file that defines the interfaces and port numbers of one or more network ports and interfaces.

--externalinterface={ip-address}

Specifies the default network interface to use for external ports, such as the admin and client ports.

--internalinterface={ip-address}

Specifies the default network interface to use for internal communication, such as the internal port.

--publicinterface={ip-address}

Specifies the public network interface. This argument is useful for hosted systems where the internal and external interfaces may not be generally reachable from the Internet. In that case, specifying the public interface helps the Volt Management Center provide publicly accessible links for the cluster nodes.

--drpublic={ip-address}[:port-number]

Specifies the publicly advertised network interface and, optionally, port number for database replication (DR) communication. This is the address that is sent from the producer cluster to consumers. This argument is useful for hosted systems where the internal interfaces are not reachable from outside the hosted environment and the producer cluster must return an externally mapped port as the public DR interface to remote consumers.

--admin=[ip-address:]{port-number}

Specifies the admin port.

--client=[ip-address:]{port-number}

Specifies the client port.

--http=[ip-address:]{port-number}

Specifies the http port. The --http flag both sets the port number (and optionally the interface) and enables the http port, overriding the http setting, if any, in the configuration file.

--internal=[ip-address:]{port-number}

Specifies the internal port used to communicate between cluster nodes.

--metrics=[ip-address:]{port-number}

Specifies the metrics port used for distributing Prometheus-compliant metrics data.

--replication=[ip-address:]{port-number}

Specifies the replication port used for database replication. The --replication flag overrides the replication port setting in the configuration.

--topicsport=[ip-address:]{port-number}

Specifies the port used for receiving and sending topics data.

--topicspublic={ip-address}[:port-number]

Specifies the network address advertised as the public topics port. For cases where the server's interfaces are not accessible to external systems, the --topicspublic flag identifies a publicly accessible interface and, optionally, an alternative port number.

--zookeeper=[ip-address:]{port-number}

Specifies the zookeeper port. By default, the zookeeper port is bound to the server's internal interface (127.0.0.1).

Examples

The first example shows the commands for initializing and starting a three-node database cluster using three custom configuration files (common.yaml, boston.yaml, and users.yaml) and the node zeus as the host. This example demonstrates how multiple XDCR clusters could share a common set of configuration options, then have unique settings for, say, the XDCR settings (including the cluster-specific ID) and the user accounts for the cluster.

$ voltdb init --dir=~/mydb --config=common.yaml,boston.yaml,users.yaml
$ voltdb start --dir=~/mydb --count=3 --host=zeus

The second example takes advantage of the defaults for the host and configuration arguments to initialize and start a single-node database in the current directory.

$ voltdb init
$ voltdb start

The next example shows the use of the --force argument to re-initialize the directory used in the first example, to delete old data and set new configuration options from a different configuration file.

$ voltdb init --dir=~/mydb --config=newconfig.yaml --force


[] The init --force command deletes command logs and overflow subfolders within the database root directory. However, to avoid accidentally deleting backups, the snapshots subfolder is renamed rather than deleted. This way, it is possible to restore a snapshot in case of an unintended re-initialization. On the other hand, this means you should periodically check your database root directories and purge any archived snapshots folders (named snapshots.nn) that are no longer needed.