4.2. Distributing Data in a Cluster

In the simplest case, a single server, the sum of the sizing calculations in the previous section gives you an accurate estimate of the memory required for the database content. However, VoltDB scales best in a cluster environment. In that case, you need to determine how much of the data will be handled by each server, which is affected by the number of servers, the number and size of partitioned and replicated tables, and the level of availability, or K-safety, required.

The following sections explain how to determine the distribution (and therefore sizing) of partitioned and replicated tables in a cluster and the impact of K-safety.

4.2.1. Data Memory Usage in Clusters

Although it is tempting to simply divide the total memory required by the database content by the number of servers, this is not an accurate formula for two reasons:

Not all data is partitioned. Replicated tables (and their indexes) appear on all servers.
Few if any partitioning schemes provide perfectly even distribution. It is important to account for some variation in the distribution.

To accurately calculate the memory usage per server in a cluster, you must account for all replicated tables and indexes plus each server's portion of the partitioned tables and indexes.

 Data per server = replicated tables + (partitioned tables/number of servers)

Using the sample sizing for the Flight database described in Section 4.1.3, “An Example of Database Sizing”, the total memory required for the replicated tables and indexes (for the Flight and Airport tables) is only approximately 12 megabytes. The memory required for the remaining partitioned tables and indexes is approximately 490 megabytes. Assuming the database is run on two servers, the total memory required for data on each server is approximately 256 megabytes:

          12   Replicated data
    2          Number of servers
  490    245   Paritioned data total / per server
         256   Total

Of course, no partitioning scheme is perfect. So it is a good idea to provide additional space (say 20% to 30%) to allow for any imbalance in the partitioning.

4.2.2. Memory Requirements for High Availability (K-Safety)

The features you plan to use with a VoltDB database also impact capacity planning, most notably K-Safety. K-Safety improves availability by replicating data across the cluster, allowing the database to survive individual node failures.

Because K-Safety involves replication, it also increases the memory requirements for storing the replicated data. Perhaps the easiest way to size a K-Safe cluster is to size a non-replicated cluster, then multiply by the K-Safety value plus one.

For example, let's assume you plan to run a database with a K-Safety value of 2 (in other words, three copies) on a 6-node cluster. The easiest way to determine the required memory capacity per server is to calculate the memory requirements for a 2-node (non K-Safe) cluster, then create three copies of that hardware and memory configuration.

Planning Guide