4.3. Upgrading the Cluster

Sometimes you need to update or reconfigure the server infrastructure on which the VoltDB database is running. One example is a server upgrade, where you fix or replace hardware, update the operating system, or otherwise modify the underlying system.

Server upgrades usually require stopping the VoltDB database process on the specific server being serviced. However, if your database cluster uses K-safety for enhanced availability, it is possible to complete server upgrades without any database downtime by performing a rolling hardware upgrade, where each server is upgraded in turn using the voltadmin stop and voltdb start commands.

Another type of upgrade is reconfiguring the cluster as a whole, for example to add or remove servers or to change the number of partitions per server that VoltDB uses.

Adding and removing servers from the cluster can happen without stopping the database. This is called elastic scaling. Changing the K-safety factor or the number of sites per host requires restarting the cluster during a maintenance window.

The following sections describe five methods of cluster upgrade:

  • Performing server upgrades

  • Performing rolling upgrades on K-safe clusters

  • Adding servers to a running cluster through elastic scaling

  • Removing servers from a running cluster through elastic scaling

  • Reconfiguring the cluster with a maintenance window

4.3.1. Performing Server Upgrades

If you need to upgrade or replace the hardware or software (such as the operating system) of the individual servers, this can be done without taking down the database as a whole. As long as the cluster is running with a K-safety value of one or more, it is possible to take a server out of the cluster without stopping the database. You can then fix the server hardware, upgrade software (other than VoltDB), or even replace the server entirely with a new server, then bring the server back into the cluster.

To perform a server upgrade:

  1. Stop the VoltDB server process on the server using the voltadmin stop command. As long as the cluster is K-safe, the rest of the cluster will continue running.

  2. Perform the necessary upgrades.

  3. Have the server rejoin the cluster using the voltdb start command.

The start command starts the database process on the server, contacts the database cluster, then copies the necessary partition content from other cluster nodes so the server can participate as a full member of the cluster. While the server is rejoining, the other database servers remain accessible and actively process queries from client applications.
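
As a rough sketch, the three steps above can be written out as a command checklist for review before anything is run. The host names (svr1, svr3), the root directory, and the use of voltadmin's --host option to name a cluster member that stays up are assumptions for illustration; check them against your own installation. The function only prints commands and does not execute them.

```shell
#!/bin/sh
# Print the commands for upgrading one server in a K-safe cluster.
# Nothing is executed; the output is a checklist to review and run by hand.
print_upgrade_plan() {
    target=$1    # server being serviced (hypothetical name)
    peer=$2      # any cluster member that stays up (hypothetical name)
    root_dir=$3  # database root directory on the target server
    hosts=$4     # host list already used to start the cluster

    # Step 1: stop the database process on the target server.
    echo "voltadmin stop --host=$peer $target"
    # Step 2: site-specific work, shown only as a reminder.
    echo "# perform the hardware or OS upgrade on $target"
    # Step 3: run on the target server itself so it rejoins the cluster.
    echo "voltdb start --dir=$root_dir --host=$hosts"
}

print_upgrade_plan svr3 svr1 "$HOME/database" svr1,svr2
```

Printing the plan rather than running it keeps the sketch safe to try, since the stop command is issued from an administrative session while the start command must be issued on the server being serviced.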

When rejoining a cluster you can use the same start command used when starting the cluster as a whole. If, however, you need to replace the server (say, for example, in the case of a disk failure), you will also need to initialize a root directory for the database process on the new machine. You do this using the current configuration file for the cluster. For example:

$ voltdb init --dir=~/database --config=deployment.xml
$ voltdb start --dir=~/database --host=svr1,svr2

If no changes have been made, you can use the same configuration file used to initialize the other servers. If you have used voltadmin update to change the configuration or changed settings using the VoltDB Management Center (VMC), you can download a copy of the latest configuration from VMC.

If the cluster is not K-safe — that is, the K-safety value is 0 — then you must follow the instructions in Section 4.3.5, “Reconfiguring the Cluster During a Maintenance Window” to upgrade the servers.

4.3.2. Performing Rolling Hardware Upgrades on K-Safe Clusters

If you need to upgrade all of the servers in a K-safe cluster (for example, if you are upgrading the operating system), you can perform a rolling hardware upgrade by stopping, upgrading, then rejoining each server one at a time. Using this process the entire cluster can be upgraded without suffering any downtime of the database. Just be sure to wait until the rejoining server has become a full member of the cluster before removing and upgrading the next server in the rotation. Specifically, wait until the following message appears in the log or on the console for the rejoining server:

Node rejoin completed. 

Alternatively, you can attempt to connect to the server remotely, for example using the sqlcmd command line utility. If your connection is rejected, the rejoin has not finished. If you successfully connect to the client port of the rejoining node, the rejoin is complete:

$ sqlcmd --servers=myserver
SQL Command :: myserver:21212
1>
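
For a rolling upgrade it can help to automate this check. The following is a minimal sketch of a polling loop that waits until a probe command succeeds before moving on to the next server. The function names, the PROBE hook, and the retry defaults are hypothetical, and the default probe assumes that sqlcmd exits with a non-zero status when the connection is rejected; verify that behavior on your version before relying on it.

```shell
#!/bin/sh
# Poll a rejoining node until a probe command reports success.
# PROBE names the function or command used to test the connection.
PROBE=${PROBE:-probe_with_sqlcmd}

probe_with_sqlcmd() {
    # Assumption: sqlcmd returns non-zero if it cannot connect.
    echo "exit" | sqlcmd --servers="$1" >/dev/null 2>&1
}

wait_for_rejoin() {
    host=$1
    max_tries=${2:-60}   # give up after this many attempts
    delay=${3:-5}        # seconds to wait between attempts
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$PROBE" "$host"; then
            echo "rejoin complete on $host after $try attempt(s)"
            return 0
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    echo "timed out waiting for $host" >&2
    return 1
}

# Example (not run here): wait_for_rejoin myserver
```

Keeping the probe behind a variable makes it easy to substitute a different check, such as searching the server log for the "Node rejoin completed" message.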

Note

You cannot update the VoltDB software itself using the rolling hardware upgrade process, only the operating system, hardware, or other software. See Section 4.4, “Upgrading VoltDB Software” for information about minimizing downtime during a VoltDB software upgrade.

4.3.3. Adding Servers to a Running Cluster with Elastic Scaling

If you want to add servers to a VoltDB cluster — usually to increase performance and/or capacity — you can do this without having to restart the database. You add servers to the cluster using the voltdb start command with the --add flag. Note, as always, you must initialize a root directory before issuing the start command. For example:

$ voltdb init  --dir=~/database --config=deployment.xml
$ voltdb start --dir=~/database --host=svr1,svr2 --add

The --add flag specifies that if the cluster is full (that is, all of the specified number of servers are currently active in the cluster), the joining node can be added to elastically expand the cluster. You must elastically add a full complement of servers to match the K-safety value (K+1) before the servers can participate as active members of the cluster. For example, if the K-safety value is 2, you must add 3 servers before they actually become part of the cluster and the cluster rebalances its partitions.
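
The K+1 rule is simple arithmetic, shown here as a small shell helper for use in provisioning scripts. The function name servers_to_add is hypothetical and not part of VoltDB.

```shell
#!/bin/sh
# Number of servers that must be added together for an elastic expansion:
# one full complement of K+1 servers, where K is the K-safety value.
servers_to_add() {
    k=$1
    echo $((k + 1))
}

servers_to_add 2   # prints 3
```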

When you add servers to a VoltDB database, the cluster performs the following actions:

  1. The new servers are added to the cluster configuration and sent copies of the schema, stored procedures, and deployment file.

  2. Once sufficient servers are added, copies of all replicated tables and their share of the partitioned tables are sent to the new servers.

  3. As the data is rebalanced, the new servers begin processing transactions for the partition content they have received.

  4. Once rebalancing is complete, the new servers are full members of the cluster.

If the cluster is not at its full complement of servers when you issue a voltdb start --add command, the added server will join the cluster as a replacement for a missing node rather than extending the cluster. Once the cluster is back to its full complement of nodes, the next voltdb start --add command will extend the cluster.

4.3.4. Removing Servers from a Running Cluster with Elastic Scaling

Just as you can add nodes to a running cluster to add capacity, you can remove nodes from a running cluster to reduce capacity. Obviously, you want to make sure that the smaller cluster has sufficient resources, such as memory, for your data and workload. If you are using K-safety, you also need to be sure the current cluster is large enough to remove nodes and still meet the requirements for your specific K-safety setting.

To remove nodes from a running cluster, you use the voltadmin resize command. The first step is to verify that the cluster has enough nodes to reduce in size. You do this with the voltadmin resize --test command:

$ voltadmin resize --test

The voltadmin resize --test command checks the cluster to make sure there are enough nodes to still be operational after the reduction, and it reports which nodes will be removed as a result of the operation. The number of nodes that will be removed is calculated as the smallest number that allows the cluster to maintain K-safety. Without K-safety, that is one node. With K-safety, that is at least K+1 nodes, but possibly more depending on the cluster configuration. The remaining node count and configuration must satisfy the requirement that the number of nodes and the total number of partitions are both divisible by K+1.
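
The divisibility requirement can be checked by hand when planning a reduction. The sketch below only illustrates the arithmetic stated above; the function name is hypothetical, the node and partition counts are values you supply for the cluster as it would look after the reduction, and voltadmin resize --test remains the authoritative check.

```shell
#!/bin/sh
# Succeeds (exit status 0) if both the node count and the total partition
# count of the reduced cluster are divisible by K+1.
meets_divisibility_rule() {
    nodes=$1        # nodes remaining after the reduction
    partitions=$2   # total partitions in the reduced cluster
    k=$3            # K-safety value
    group=$((k + 1))
    [ $((nodes % group)) -eq 0 ] && [ $((partitions % group)) -eq 0 ]
}

if meets_divisibility_rule 4 32 1; then
    echo "4 nodes, 32 partitions, K=1: rule satisfied"
fi
```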

Once you are ready to start reducing the cluster size, issue the voltadmin resize command without any arguments:

$ voltadmin resize

This command verifies that the cluster can be resized, reports which nodes will be removed, asks you to confirm that you want to begin, and then starts the resize operation. Because resizing the cluster involves reorganizing and rebalancing the partitions, it can take a significant amount of time, depending on the size of the database and the ongoing workload. You can track the progress of the resize operation using the voltadmin status command. You can also adjust the priority between rebalancing the partitions and ongoing client transactions by setting the duration and throughput of the rebalance operation. See the section on "Configuring How VoltDB Rebalances Nodes During Elastic Scaling" in the Using VoltDB manual for details.

Note that once resizing starts, you cannot cancel the operation. So be certain you want to reduce the size of the cluster before beginning. If for any reason the resize operation fails unexpectedly, you can use the voltadmin resize --retry command to restart the cluster reduction.

4.3.5. Reconfiguring the Cluster During a Maintenance Window

If you want to modify the cluster configuration, such as the number of sites per host or the K-safety factor, you need to restart the database cluster as a whole. You can also choose to add or remove nodes from the cluster during this operation. Stopping the database temporarily to reconfigure the cluster is known as a maintenance window.

The steps for reconfiguring the cluster with a maintenance window are:

  1. Place the database in admin mode (voltadmin pause).

  2. Perform a manual snapshot of the database (voltadmin save --blocking).

  3. Shut down the database (voltadmin shutdown).

  4. Make the necessary changes to the configuration file.

  5. Reinitialize the database root directory on all nodes, specifying the edited configuration file (voltdb init --force).

  6. Start the new database in admin mode (voltdb start --pause).

  7. Restore the snapshot created in Step #2 (voltadmin restore).

  8. Return the database to normal operations (voltadmin resume).
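
The steps above can be sketched as a printed command plan with example arguments filled in. The snapshot directory, snapshot ID, root directory, configuration file name, and host list are hypothetical values chosen for illustration. The function prints the commands rather than executing them, because the init and start commands must be issued on every node of the cluster.

```shell
#!/bin/sh
# Print the maintenance-window command sequence for review.
# Nothing is executed; all argument values are supplied by the caller.
print_maintenance_plan() {
    snap_dir=$1   # directory for the manual snapshot
    snap_id=$2    # identifier for the manual snapshot
    root_dir=$3   # database root directory on each node
    config=$4     # edited configuration file
    hosts=$5      # host list for voltdb start

    cat <<EOF
voltadmin pause
voltadmin save --blocking $snap_dir $snap_id
voltadmin shutdown
# edit $config, then on every node:
voltdb init --force --dir=$root_dir --config=$config
voltdb start --pause --dir=$root_dir --host=$hosts
# once the cluster is running, from any one node:
voltadmin restore $snap_dir $snap_id
voltadmin resume
EOF
}

print_maintenance_plan /tmp/voltdb/backup pre_reconfig "$HOME/database" deployment.xml svr1,svr2
```

Passing the same directory and ID to both save and restore ensures that the snapshot taken in Step 2 is the one restored in Step 7.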