3.4. Sizing for Durability

Durability refers to the ability of a database to withstand, or recover from, unexpected events. VoltDB has several features that increase the durability of the database, including K-Safety, snapshots, command logging, and database replication.

K-Safety replicates partitions to provide redundancy as a protection against server failure. Note that when you enable K-Safety, you are replicating the unique partitions across the available hardware, so the hardware resources (particularly servers and memory) available to any one copy are reduced. The easiest way to size hardware for a K-Safe cluster is to size the initial instance of the database, based on projected throughput and capacity, then multiply the number of servers by the number of copies you desire (that is, the K-Safety value plus one).

Rule of Thumb

When using K-Safety, configure the number of cluster nodes as a whole multiple of the number of copies of the database (that is, K+1).
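The arithmetic behind this rule can be sketched in a few lines of Python. This is only an illustration: the function names `ksafe_server_count` and `is_whole_multiple` are hypothetical helpers, not part of VoltDB, and the base server count is assumed to come from your own throughput and capacity estimates.

```python
def ksafe_server_count(base_servers: int, k_factor: int) -> int:
    """Servers for a K-safe cluster: the base sizing times (K + 1) copies."""
    if base_servers < 1:
        raise ValueError("base_servers must be at least 1")
    if k_factor < 0:
        raise ValueError("k_factor must be zero or greater")
    return base_servers * (k_factor + 1)


def is_whole_multiple(total_servers: int, k_factor: int) -> bool:
    """Check the rule of thumb: node count is a whole multiple of K + 1."""
    return total_servers % (k_factor + 1) == 0


if __name__ == "__main__":
    # Hypothetical example: a database sized at 4 servers, run with K=1.
    total = ksafe_server_count(base_servers=4, k_factor=1)
    print(total, is_whole_multiple(total, k_factor=1))
```

For instance, a database sized at four servers needs eight servers at K=1, while a seven-node cluster at K=1 would not satisfy the rule of thumb.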

K-Safety has no real performance impact under normal conditions. However, the cluster configuration can affect performance when recovering from a failure. In a K-Safe cluster, when a failed server rejoins, it gets copies of all of its partitions from the other members of the cluster. The larger the partitions are in memory, the longer they can take to restore. Since the restore action can block database transactions, it is important to weigh the trade-off between a few large servers that are easier to manage and many small servers that can recover in less time.
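To make the trade-off concrete, the following Python sketch compares how much data a rejoining server must receive under two cluster layouts. It is a rough estimate under stated assumptions, not a VoltDB formula: it assumes data is spread evenly across servers, and the transfer rate is a hypothetical figure you would replace with a measurement from your own environment. Actual rejoin times depend on many other factors.

```python
def data_per_server_gb(unique_data_gb: float, k_factor: int, servers: int) -> float:
    """Approximate memory held per server, assuming an even spread of all copies."""
    return unique_data_gb * (k_factor + 1) / servers


def rejoin_seconds(unique_data_gb: float, k_factor: int, servers: int,
                   transfer_gb_per_sec: float) -> float:
    """Rough time to copy one server's share at an assumed transfer rate."""
    return data_per_server_gb(unique_data_gb, k_factor, servers) / transfer_gb_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 240 GB of unique data, K=1, 0.5 GB/s assumed rate.
    for servers in (4, 12):
        share = data_per_server_gb(240, 1, servers)
        seconds = rejoin_seconds(240, 1, servers, 0.5)
        print(f"{servers} servers: {share:.0f} GB each, about {seconds:.0f} s to rejoin")
```

Under these assumptions, tripling the number of servers cuts the data a rejoining node must receive to a third, which is the reason smaller servers can recover faster.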

Two of the other durability features — snapshots and command logs — have only a minimal impact on memory and processing power. However, these features do require persistent storage on disk.

Most VoltDB disk-based features, such as snapshots, export overflow, network partitions, and so on, can be supported on standard disk technology, such as SATA drives. They can also share space on a single disk, assuming the disk has sufficient capacity, since disk I/O is interleaved with other work.

Command logging, on the other hand, is time-dependent and must keep up with the transactions on the server. The chapter on command logging in Using VoltDB discusses in detail the trade-offs between asynchronous and synchronous logging and the appropriate hardware to use for each. But to summarize:

  • Use fast disks (such as battery-backed cache drives) for synchronous logging.

  • Use SATA or other commodity drives for asynchronous logging. However, it is still a good idea to use a dedicated drive for the command logs to avoid concurrency issues between the logs and other disk activity.

Rule of Thumb

When using command logging, whether synchronous or asynchronous, use a dedicated drive for the command logs. Other disk activity (including command log snapshots) can share a separate drive.

Finally, database replication (DR) does not affect the memory or processing power needed per server. But it does require a duplicate of the initial hardware for each additional cluster. For example, when using passive DR, you should double the estimated number of servers: one copy for the master cluster and one for the replica. When using cross datacenter replication (XDCR), you will need one complete copy for each of the clusters participating in the XDCR relationship.

Rule of Thumb

When using database replication, multiply the number of servers needed by the number of clusters involved: two for passive DR (master and replica), or two or more to match the number of clusters in an XDCR environment.
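The sizing rules in this section can be combined into a single estimate, sketched below in Python. The function `total_servers` is a hypothetical helper, not a VoltDB tool, and it assumes every cluster in the replication relationship uses the same base sizing and the same K-Safety value.

```python
def total_servers(base_servers: int, k_factor: int = 0, clusters: int = 1) -> int:
    """Estimate total servers: base sizing x (K + 1) copies x number of clusters.

    Assumes all clusters are sized identically (an assumption of this sketch).
    """
    if base_servers < 1 or clusters < 1:
        raise ValueError("base_servers and clusters must be at least 1")
    if k_factor < 0:
        raise ValueError("k_factor must be zero or greater")
    return base_servers * (k_factor + 1) * clusters


if __name__ == "__main__":
    # Hypothetical base sizing of 3 servers with K=1.
    print("single cluster:", total_servers(3, k_factor=1))
    print("passive DR:    ", total_servers(3, k_factor=1, clusters=2))
    print("3-way XDCR:    ", total_servers(3, k_factor=1, clusters=3))
```

With a base sizing of three servers and K=1, this gives six servers for a single cluster, twelve for passive DR, and eighteen for a three-cluster XDCR environment.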