4.3. Upgrading the Cluster

Sometimes you need to update or reconfigure the server infrastructure on which the VoltDB database is running. A server upgrade is one example: you need to fix or replace hardware, update the operating system, or otherwise modify the underlying system.

Server upgrades usually require stopping the VoltDB database process on the specific server being serviced. However, if your database cluster uses K-safety for enhanced availability, it is possible to complete server upgrades without any database downtime by performing a rolling hardware upgrade, where each server is upgraded in turn using the voltadmin stop and voltdb start commands.

Another type of upgrade is reconfiguring the cluster as a whole. You might do this to add or remove servers from the cluster or to modify the number of partitions per server that VoltDB uses.

Adding servers to the cluster can happen without stopping the database. This is called elastic scaling. Removing servers or changing the number of sites per host requires restarting the cluster during a maintenance window.

The following sections describe four methods of cluster upgrade:

  • Performing server upgrades

  • Performing rolling upgrades on K-safe clusters

  • Adding servers to a running cluster through elastic scaling

  • Reconfiguring the cluster with a maintenance window

4.3.1. Performing Server Upgrades

If you need to upgrade or replace the hardware or software (such as the operating system) of the individual servers, this can be done without taking down the database as a whole. As long as the cluster is running with a K-safety value of one or more, it is possible to take a server out of the cluster without stopping the database. You can then fix the server hardware, upgrade software (other than VoltDB), or even replace the server entirely with a new server, then bring the server back into the cluster.

To perform a server upgrade:

  1. Stop the VoltDB server process on the server using the voltadmin stop command. As long as the cluster is K-safe, the rest of the cluster will continue running.

  2. Perform the necessary upgrades.

  3. Have the server rejoin the cluster using the voltdb start command.

The start command starts the database process on the server, contacts the database cluster, then copies the necessary partition content from other cluster nodes so the server can then participate as a full member of the cluster. While the server is rejoining, the other database servers remain accessible and actively process queries from client applications.
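The stop and rejoin steps above can be sketched as two small shell functions. This is a minimal sketch, not a definitive procedure: the hostnames svr1, svr2, and svr3 and the RUN variable are hypothetical, and the exact argument form of voltadmin stop (connecting to one running node with --host and naming the node to stop) is an assumption that should be checked against the voltadmin reference for your version. By default the functions only print the commands.

```shell
#!/bin/sh
# Dry-run sketch of a single-server upgrade on a K-safe cluster.
# RUN defaults to "echo", so commands are printed rather than executed.
# Set RUN= (empty) in the environment to execute them for real.
RUN=${RUN-echo}

stop_server() {
    # $1 = a node that stays running (hypothetical, e.g. svr1)
    # $2 = the node being taken out for service (hypothetical, e.g. svr3)
    $RUN voltadmin stop --host="$1" "$2"
}

rejoin_server() {
    # Run on the serviced machine once the upgrade is finished.
    # $1 = parent of the database root directory
    # $2 = comma-separated list of cluster hosts to contact
    $RUN voltdb start --dir="$1" --host="$2"
}
```

For example, stop_server svr1 svr3 prints the stop command to issue before servicing svr3, and rejoin_server ~/database svr1,svr2 prints the command to run on svr3 afterward.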

When rejoining a cluster you can use the same start command used when starting the cluster as a whole. If, however, you need to replace the server (say, for example, in the case of a disk failure), you will also need to initialize a root directory for the database process on the new machine. You do this using the current configuration file for the cluster. For example:

$ voltdb init --dir=~/database --config=deployment.xml
$ voltdb start --dir=~/database --host=svr1,svr2

If no changes have been made, you can use the same configuration file used to initialize the other servers. If you have used voltadmin update to change the configuration or changed settings using the VoltDB Management Center (VMC), you can download a copy of the latest configuration from VMC.

If the cluster is not K-safe — that is, the K-safety value is 0 — then you must follow the instructions in Section 4.3.4, “Reconfiguring the Cluster During a Maintenance Window” to upgrade the servers.

4.3.2. Performing Rolling Hardware Upgrades on K-Safe Clusters

If you need to upgrade all of the servers in a K-safe cluster (for example, if you are upgrading the operating system), you can perform a rolling hardware upgrade by stopping, upgrading, then rejoining each server one at a time. Using this process the entire cluster can be upgraded without suffering any downtime of the database. Just be sure to wait until the rejoining server has become a full member of the cluster before removing and upgrading the next server in the rotation. Specifically, wait until the following message appears in the log or on the console for the rejoining server:

Node rejoin completed. 

Alternatively, you can attempt to connect to the server remotely, for example using the sqlcmd command line utility. If your connection is rejected, the rejoin has not finished. If you successfully connect to the client port of the rejoining node, you know the rejoin is complete:

$ sqlcmd --servers=myserver
SQL Command :: myserver:21212
1>
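In a rolling upgrade it can help to script the wait between servers. The following is a minimal sketch that polls a log file for the completion message quoted above. The function names are hypothetical, and the location of the server log is an assumption that depends on how your servers are configured.

```shell
#!/bin/sh
# Poll a server log until the rejoin completion message appears.

rejoin_done() {
    # Succeeds if the log file ($1) contains the completion message.
    grep -q "Node rejoin completed" "$1" 2>/dev/null
}

wait_for_rejoin() {
    # $1 = log file, $2 = maximum number of checks, $3 = seconds between checks
    attempts=0
    while [ "$attempts" -lt "$2" ]; do
        if rejoin_done "$1"; then
            return 0
        fi
        attempts=$((attempts + 1))
        sleep "$3"
    done
    return 1   # message never appeared within the allowed checks
}
```

A call such as wait_for_rejoin /path/to/server.log 120 5 would check every 5 seconds for up to 10 minutes. Note that a log that still holds the message from an earlier rejoin would match immediately, so this sketch assumes the log was rotated or started fresh before the server was restarted.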

Note

You cannot update the VoltDB software itself using the rolling hardware upgrade process, only the operating system, hardware, or other software. See Section 4.4, “Upgrading VoltDB Software” for information about minimizing downtime during a VoltDB software upgrade.

4.3.3. Adding Servers to a Running Cluster with Elastic Scaling

If you want to add servers to a VoltDB cluster — usually to increase performance and/or capacity — you can do this without having to restart the database. You add servers to the cluster using the voltdb start command with the --add flag. Note, as always, you must initialize a root directory before issuing the start command. For example:

$ voltdb init  --dir=~/database --config=deployment.xml
$ voltdb start --dir=~/database --host=svr1,svr2 --add

The --add flag specifies that if the cluster is full (that is, all of the specified number of servers are currently active in the cluster), the joining node can be added to elastically expand the cluster. You must elastically add a full complement of servers to match the K-safety value (K+1) before the servers can participate as active members of the cluster. For example, if the K-safety value is 2, you must add 3 servers before they actually become part of the cluster and the cluster rebalances its partitions.
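The K+1 rule is simple arithmetic, which the following sketch spells out. The function names are hypothetical helpers for planning purposes and are not part of VoltDB.

```shell
#!/bin/sh
# Servers that must join before an elastic expansion takes effect: K + 1.
servers_per_expansion() {
    # $1 = K-safety value
    echo $(( $1 + 1 ))
}

# How many more servers are needed, given how many have been added so far.
servers_still_needed() {
    # $1 = K-safety value, $2 = servers already started with --add
    needed=$(( $1 + 1 - $2 ))
    if [ "$needed" -lt 0 ]; then
        needed=0
    fi
    echo "$needed"
}

servers_per_expansion 2      # prints 3, matching the K=2 example in the text
servers_still_needed 2 1     # prints 2
```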

When you add servers to a VoltDB database, the cluster performs the following actions:

  1. The new servers are added to the cluster configuration and sent copies of the schema, stored procedures, and deployment file.

  2. Once sufficient servers are added, copies of all replicated tables and their share of the partitioned tables are sent to the new servers.

  3. As the data is rebalanced, the new servers begin processing transactions for the partition content they have received.

  4. Once rebalancing is complete, the new servers are full members of the cluster.

If the cluster is not at its full complement of servers when you issue a voltdb start --add command, the added server will join the cluster as a replacement for a missing node rather than extending the cluster. Once the cluster is back to its full complement of nodes, the next voltdb start --add command will extend the cluster.

4.3.4. Reconfiguring the Cluster During a Maintenance Window

If you want to remove servers from the cluster permanently (as opposed to temporarily removing them for maintenance as described in Section 4.3.1, “Performing Server Upgrades”) or you want to change other cluster-wide attributes, such as the number of partitions per server, you need to restart the database cluster as a whole. Stopping the database temporarily to perform this sort of reconfiguration is known as a maintenance window.

The steps for reconfiguring the cluster with a maintenance window are:

  1. Place the database in admin mode (voltadmin pause).

  2. Perform a manual snapshot of the database (voltadmin save --blocking).

  3. Shut down the database (voltadmin shutdown).

  4. Make the necessary changes to the configuration file.

  5. Reinitialize the database root directory on all nodes specifying the edited configuration file (voltdb init --force).

  6. Start the new database in admin mode (voltdb start --pause).

  7. Restore the snapshot created in Step #2 (voltadmin restore).

  8. Return the database to normal operations (voltadmin resume).
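The eight steps above can be laid out as a printed checklist. This is a sketch under stated assumptions: the snapshot directory, snapshot ID, root directory, configuration file name, and host list are all hypothetical placeholders, and passing a directory and ID to voltadmin save and voltadmin restore is an assumed argument form that should be confirmed against the voltadmin reference for your version. The function only prints the commands. Steps 5 and 6 must be run on every node, so the sequence is not meant to be executed unattended from one machine.

```shell
#!/bin/sh
# Print the maintenance-window command sequence for review.
# All values below are hypothetical placeholders; adjust them for your cluster.
SNAPDIR=/tmp/voltdb_backup      # where the manual snapshot is written
SNAPID=maint1                   # unique ID for the snapshot
ROOT=/opt/database              # parent of the database root directory
CONFIG=newdeployment.xml        # the edited configuration file (step 4)
HOSTS=svr1,svr2                 # hosts for the restarted cluster

maintenance_window() {
    echo "voltadmin pause"                                    # step 1
    echo "voltadmin save --blocking $SNAPDIR $SNAPID"         # step 2
    echo "voltadmin shutdown"                                 # step 3
    # step 4: edit the configuration file by hand before continuing
    echo "voltdb init --force --dir=$ROOT --config=$CONFIG"   # step 5, every node
    echo "voltdb start --pause --dir=$ROOT --host=$HOSTS"     # step 6, every node
    echo "voltadmin restore $SNAPDIR $SNAPID"                 # step 7
    echo "voltadmin resume"                                   # step 8
}
```

Calling maintenance_window prints seven commands in order; step 4 is a manual edit and appears only as a comment.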