3.7. Defining the Cluster Configuration

Two important aspects of a VoltDB database are the physical layout of the cluster that runs the database and the database features you choose to use. You define the physical cluster layout on the voltdb start command using the --count and --host arguments. You enable and disable specific database features in configuration files when you initialize the database root directory with the voltdb init command.

In the simplest case — when running on a single node with no configuration specified — VoltDB defaults to eight execution sites per host, and a K-safety value of zero. You can customize the database by specifying options in one or more YAML configuration files when you initialize the database with the voltdb init command and the --config (or -C) qualifier. You can put all of the configuration properties in a single file, or you can modularize the configuration into separate files for individual topics. For example, the following command customizes the database using separate YAML files for common properties, directory paths, and security:

$ voltdb init --config=common.yaml,paths.yaml,security.yaml

Configuration files are in YAML format, where options are specified as a hierarchy of properties with each element of the hierarchy indented on a separate line and terminated by a colon. The actual property values follow the colon. For example:

deployment:
  cluster:
    sitesperhost: 12
    kfactor: 1

In the preceding example, the child properties of the deployment.cluster element define the layout of the database partitions, including:

sitesperhost — specifies the number of partitions created on each server in the cluster. The sitesperhost value times the number of servers gives you the total number of partitions in the cluster. See Section 3.7.1, “Determining How Many Sites per Host” for more information about partition count.
kfactor — specifies the K-safety value to use for durability when creating the database. The K-safety value controls the duplication of database partitions. See Chapter 11, Availability for more information about K-safety.
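The arithmetic behind these two properties can be sketched in a few lines of Python. The function names and the three-server cluster are illustrative assumptions, not part of VoltDB. The unique-partition calculation reflects the usual reading of K-safety, in which each partition is kept in kfactor + 1 copies.

```python
def total_partitions(sitesperhost: int, servers: int) -> int:
    """Total partition sites in the cluster: sitesperhost times the number of servers."""
    return sitesperhost * servers


def unique_partitions(sitesperhost: int, servers: int, kfactor: int) -> int:
    """Distinct partitions, assuming each partition is stored kfactor + 1 times."""
    total = total_partitions(sitesperhost, servers)
    copies = kfactor + 1
    # This sketch refuses uneven splits rather than guessing how they would be handled.
    if total % copies != 0:
        raise ValueError("total sites must divide evenly by kfactor + 1")
    return total // copies


if __name__ == "__main__":
    # Hypothetical three-server cluster using sitesperhost: 12 and kfactor: 1.
    print(total_partitions(12, 3))       # 36
    print(unique_partitions(12, 3, 1))   # 18
```

With a K-safety value of zero, every site holds a distinct partition, so the two functions return the same number.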

Configuration files also enable and configure many runtime options related to the database, which are described later in this book. For example, the configuration file can specify:

Whether security is enabled and what users and passwords are needed to authenticate clients at runtime. See Chapter 13, Security for more information.
A schedule for saving automatic snapshots of the database. See Section 14.2, “Scheduling Automated Snapshots”.
Properties for exporting and importing data to other data sources. See Chapter 16, Streaming Data: Import, Export, and Migration.

For the complete list of properties and YAML file syntax, see Appendix E, YAML Configuration Properties.

3.7.1. Determining How Many Sites per Host

There is very little penalty for allocating more sites than needed for the partitions the database will use (except for incremental memory usage). Consequently, VoltDB defaults to eight sites per node to provide reasonable performance on most modern system configurations. This default does not normally need to be changed. However, for systems with a large number of available processors (16 or more) or older machines with fewer than 8 processors and limited memory, you may wish to tune the sitesperhost property.

The number of sites needed per node is related to the number of processor cores on each system; the optimal number is approximately 3/4 of the number of CPUs reported by the operating system. For example, if each node in the cluster has dual quad-core processors (in other words, 8 cores per node), the optimal number of sites is likely to be 6 or 7 per node.

deployment:
  cluster:
    sitesperhost: 6

For systems that support hyperthreading (where each physical core supports two threads), the operating system reports twice the number of physical cores. In other words, a dual quad-core system would report 16 virtual CPUs. However, each partition is not quite as efficient as on non-hyperthreading systems, so the optimal number of sites is more likely to be between 10 and 12 per node in this situation.
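As a rough illustration of this guidance, the following Python sketch derives a starting value for sitesperhost from the CPU count the operating system reports. The function name and the choice to round down are assumptions made for this example. The three-quarters ratio is only the approximate starting point described in the text, and the result is no substitute for benchmarking.

```python
import os


def suggested_sitesperhost(reported_cpus: int) -> int:
    """Return roughly 3/4 of the reported CPUs, rounded down, with a minimum of 1."""
    if reported_cpus < 1:
        raise ValueError("reported_cpus must be at least 1")
    return max(1, (reported_cpus * 3) // 4)


if __name__ == "__main__":
    print(suggested_sitesperhost(8))    # 6, matching the dual quad-core example
    print(suggested_sitesperhost(16))   # 12, the upper end of the hyperthreading range
    # os.cpu_count() can return None, so fall back to 1 in that case.
    print(suggested_sitesperhost(os.cpu_count() or 1))
```

On a hyperthreaded host, treat the result as the top of the range to try, since the text suggests 10 to 12 sites for 16 reported CPUs.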

Because there are no hard and fast rules, the optimal number of sites per node is best determined by benchmarking the application to see what combination of cores and sites produces the best results. However, it is important to remember that all nodes in the cluster use the same number of sites. So the best performance is achieved by using a cluster in which all nodes have the same physical architecture (that is, the same number of cores).

3.7.2. Configuring Paths for Runtime Features

An important aspect of some runtime features is that they make use of disk resources for persistent storage across sessions. For example, automatic snapshots need a directory for storing snapshots of the database contents. Similarly, export uses disk storage for writing overflow data if the export connector cannot keep up with the export queue.

You can specify individual paths for each feature in the configuration. If you do not, VoltDB creates subfolders for each feature in the database root directory as needed, which can be useful for testing. However, in production, it is useful to direct certain high-volume features, such as command logging, to separate devices so that their disk I/O does not affect database performance.

You can identify specific path locations for the following features using the paths property:

Command logging (deployment.paths.commandlog)
Command log snapshots (deployment.paths.commandlogsnapshot)
Export overflow (deployment.paths.exportoverflow)
Snapshots (deployment.paths.snapshots)

If you specify a relative rather than an absolute path, it is relative to the database root directory. If you name a specific feature path and it does not exist, VoltDB attempts to create it for you. For example, the export overflow path contains temporary data that can be deleted periodically. The following configuration file specifies /opt/overflow as the directory for export overflow:

deployment:
  paths:
    exportoverflow:
      path: "/opt/overflow"
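The relative-path rule can be sketched with Python's pathlib. The function name and the root directory /home/db/voltdbroot are hypothetical values chosen for illustration. The sketch only models how a configured path is interpreted, not how VoltDB creates the directory.

```python
from pathlib import PurePosixPath


def resolve_feature_path(root: str, configured: str) -> PurePosixPath:
    """Interpret a configured feature path the way the text describes.

    Absolute paths are used as is; relative paths are taken relative
    to the database root directory.
    """
    # Joining with an absolute right-hand side discards the left-hand side,
    # which is exactly the behavior needed here.
    return PurePosixPath(root) / configured


if __name__ == "__main__":
    root = "/home/db/voltdbroot"  # hypothetical database root directory
    print(resolve_feature_path(root, "/opt/overflow"))  # /opt/overflow
    print(resolve_feature_path(root, "overflow"))       # /home/db/voltdbroot/overflow
```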

3.7.3. Verifying your Hardware Configuration

The configuration files and start command options define the desired configuration of your database cluster. However, there are several important aspects of the physical hardware and operating system configuration that you should be aware of before running VoltDB:

VoltDB can operate on heterogeneous clusters. However, best performance is achieved by running the cluster on similar hardware with the same type of processors, number of processors, and amount of memory on each node.
All nodes must be able to resolve the IP addresses and host names of the other nodes in the cluster. That means they must all have valid DNS entries or have the appropriate entries in their local hosts file.
You must run a time synchronization service such as Network Time Protocol (NTP) or chrony on all of the cluster nodes, preferably synchronizing against the same local time server. If the time skew between nodes in the cluster is greater than 200 milliseconds, VoltDB cannot start the database.
It is strongly recommended that you configure your time service to avoid adjusting time backwards. For example, in NTP this is done using the -x argument. If the server time moves backward, VoltDB must pause and wait for time to catch up.
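The 200-millisecond limit can be illustrated with a small Python sketch. It assumes you have already measured each node's clock offset against a common reference, for example from your time service's own reporting. The function names, host names, and offsets are hypothetical, and the check does not replace VoltDB's own validation at startup.

```python
from typing import Mapping

MAX_SKEW_MS = 200.0  # limit stated in the text above


def max_clock_skew_ms(offsets_ms: Mapping[str, float]) -> float:
    """Largest difference between any two nodes' clock offsets, in milliseconds."""
    if not offsets_ms:
        raise ValueError("at least one node offset is required")
    values = list(offsets_ms.values())
    return max(values) - min(values)


def skew_is_acceptable(offsets_ms: Mapping[str, float]) -> bool:
    """True when the worst-case skew does not exceed the 200 ms limit."""
    return max_clock_skew_ms(offsets_ms) <= MAX_SKEW_MS


if __name__ == "__main__":
    # Hypothetical offsets relative to a shared time server.
    offsets = {"node1": -12.5, "node2": 3.0, "node3": 40.0}
    print(max_clock_skew_ms(offsets))    # 52.5
    print(skew_is_acceptable(offsets))   # True
```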

Using VoltDB