There are times when it is necessary to save the contents of a VoltDB database to disk and then restore it. For example, if the cluster needs to be shut down for maintenance, you may want to save the current state of the database before shutting down the cluster and then restore the database once the cluster comes back online. Performing periodic backups of the data can also provide a fallback in case of unexpected failures — either physical failures, such as power outages, or logic errors where a client application mistakenly corrupts the database contents.
VoltDB provides shell commands, system procedures, and an automated snapshot feature that help you perform these operations. The following sections explain how to save and restore a running VoltDB cluster, either manually or automatically.
Manually saving and restoring a VoltDB database is useful when you need to do maintenance on the database itself or the cluster it runs on. The normal use of save and restore, when performing such a maintenance operation, is as follows:
Stop database activities (using pause).
Use save to write a snapshot of the current data to disk.
Shut down the cluster.
Make changes to the VoltDB schema, cluster configuration, and/or deployment file as desired.
Restart the cluster in admin mode.
Reload the schema and stored procedures.
Restore the previous snapshot.
Restart client activity (using resume).
The key is to make sure that all database activity is stopped before the save and shutdown are performed. This ensures that no further changes to the database are made (and therefore lost) after the save and before the shutdown. Similarly, it is important that no client activity starts until the database has started and the restore operation completes.
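The voltadmin portion of this sequence can be sketched as a small script. This is only an illustration under stated assumptions: the function names, the SNAP_DIR and SNAP_ID variables, and the DRY_RUN switch are hypothetical helpers, not part of VoltDB. By default the script only prints the commands it would run. Restarting the cluster in admin mode and reloading the schema are left as manual steps, since the exact commands depend on your version and configuration.

```shell
#!/usr/bin/env bash
# Sketch of the manual save/restore maintenance sequence.
# SNAP_DIR and SNAP_ID are illustrative values, not VoltDB defaults.
SNAP_DIR="${SNAP_DIR:-/tmp/voltdb/backup}"
SNAP_ID="${SNAP_ID:-TestSnapshot}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps performed before the maintenance work.
before_maintenance() {
  run voltadmin pause                                   # stop database activity
  run voltadmin save --blocking "$SNAP_DIR" "$SNAP_ID"  # write the snapshot
  run voltadmin shutdown                                # shut down the cluster
}

# Steps performed after the cluster has been restarted in admin mode and
# the schema and stored procedures have been reloaded.
after_maintenance() {
  run voltadmin restore "$SNAP_DIR" "$SNAP_ID"          # reload the snapshot
  run voltadmin resume                                  # let clients back in
}

# Optional dispatch: pass "before" or "after" as the first argument.
case "${1:-}" in
  before) before_maintenance ;;
  after)  after_maintenance ;;
esac
```

Splitting the sequence into two functions keeps the restore from running until the operator has confirmed that the cluster is back up and the schema is loaded.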
Save and restore operations are performed either by calling VoltDB system procedures or by using the corresponding voltadmin shell commands. In most cases, the shell commands are simpler since they do not require program code. Therefore, this chapter uses voltadmin commands in the examples. If you are interested in programming the save and restore procedures, see Appendix G, System Procedures, for more information about the corresponding system procedures.
When you issue a save command, you specify a path where the data will be saved and a unique identifier for tagging the files. VoltDB then saves the current data on each node of the cluster to a set of files at the specified location (using the unique identifier as a prefix to the file names). This set of files is referred to as a snapshot, since it contains a complete record of the database for a given point in time (when the save operation was performed).
The --blocking option lets you specify whether the save operation should block other transactions until it completes. In the case of manual saves, it is a good idea to use this option since you do not want additional changes made to the database during the save operation.
Note that every node in the cluster uses the same absolute path, so the path specified must be valid, must exist on every node, and must not already contain data from any previous saves using the same unique identifier, or the save will fail.
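Because a save fails when the target path is missing or already holds files for the same identifier, it can help to check the path before saving. The following is a minimal sketch, assuming only what is stated above: that snapshot files use the unique identifier as a file-name prefix. The function name check_snapshot_target is hypothetical. It inspects the local node only, so it would need to be run on every node of the cluster.

```shell
#!/usr/bin/env bash
# check_snapshot_target DIR ID
# Returns 0 if DIR looks usable for a new snapshot tagged ID on this node.
check_snapshot_target() {
  local dir="$1" id="$2" existing
  # The path must be absolute, since every node uses the same absolute path.
  case "$dir" in
    /*) ;;
    *) echo "path must be absolute: $dir" >&2; return 1 ;;
  esac
  if [ ! -d "$dir" ]; then
    echo "directory does not exist: $dir" >&2
    return 1
  fi
  # Look for any file whose name starts with the identifier.
  existing="$(find "$dir" -maxdepth 1 -type f -name "${id}*" | head -n 1)"
  if [ -n "$existing" ]; then
    echo "files with this identifier prefix already exist: $existing" >&2
    return 1
  fi
  return 0
}
```

The prefix match is deliberately conservative: an identifier such as TestSnapshot2 also triggers the warning for TestSnapshot, which errs on the side of not overwriting anything. A typical use would be to run the check and only then issue the save, for example: check_snapshot_target /tmp/voltdb/backup TestSnapshot && voltadmin save --blocking /tmp/voltdb/backup "TestSnapshot"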
When you issue a restore command, you specify the same absolute path and unique identifier used when creating the snapshot. VoltDB checks to make sure the appropriate save set exists on each node, then restores the data into memory.
To save the contents of a VoltDB database, use the voltadmin save command. The following example creates a snapshot at the path /tmp/voltdb/backup using the unique identifier TestSnapshot.
$ voltadmin save --blocking /tmp/voltdb/backup "TestSnapshot"
In this example, the command tells the save operation to block all other transactions until it completes. It is possible to save the contents without blocking other transactions (which is what automated snapshots do). However, when performing a manual save prior to shutting down, it is normal to block other transactions to ensure you save a known state of the database.
Note that it is possible for the save operation to succeed on some nodes of the cluster and not others. When you issue the voltadmin save command, VoltDB displays messages from each partition indicating the status of the save operation. If there are any issues that would stop the process from starting, such as a bad file path, they are displayed on the console. It is a good practice to examine these messages to make sure all partitions are saved as expected.
The easiest way to restore a snapshot is to let VoltDB do it for you as part of the recover operation. If you are not changing the cluster configuration, you can use an automated snapshot or other snapshot saved into the directory by simply restarting the cluster nodes using the voltdb recover command. With the recover action, VoltDB automatically starts and restores the most recent snapshot. This approach has the added benefit that VoltDB automatically loads the previous schema as well, as part of the snapshot.
However, you cannot use voltdb recover to restore a snapshot or command log if the cluster configuration has changed, if you updated the VoltDB software itself, or if you want to restore an earlier snapshot or a snapshot stored in an alternate location. In these cases you must do a manual restore.
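For operations scripts, the conditions above can be captured in a small helper. This is a hypothetical sketch (the function name and the yes/no argument convention are assumptions made for illustration), and it encodes only the four cases listed in the preceding paragraph.

```shell
#!/usr/bin/env bash
# choose_restore_method CONFIG_CHANGED SOFTWARE_UPDATED EARLIER_SNAPSHOT ALTERNATE_LOCATION
# Each argument is "yes" or "no". Prints "manual" if any condition applies,
# otherwise prints "recover".
choose_restore_method() {
  local answer
  for answer in "$@"; do
    if [ "$answer" = "yes" ]; then
      echo "manual"
      return 0
    fi
  done
  echo "recover"
}
```

For example, choose_restore_method no yes no no reports that a manual restore is needed because the VoltDB software was updated.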
To manually restore a VoltDB database from a snapshot previously created by a save operation, you use the voltadmin restore command. You must specify the same pathname and unique identifier used during the save.
The following example restores the snapshot created by the example in Section 13.1.1.
$ voltadmin restore /tmp/voltdb/backup "TestSnapshot"
As with save operations, it is always a good idea to check the status information displayed by the command to ensure the operation completed as expected.
Between a save and a restore, it is possible to make changes to the database and cluster configuration. You can:
Modify the schema and/or stored procedures
Add or remove nodes from the cluster
Change the number of sites per host
Change the K-safety value
To make these changes, you must make appropriate modifications to the schema, restart the cluster as an empty database, reload the schema and stored procedures, and then perform the restore. The following sections discuss these steps in more detail.
To add nodes to the cluster, use the following procedure:
Save the database.
Edit the deployment file, specifying the new number of nodes in the hostcount attribute of the <cluster> tag.
Restart the cluster (including the new nodes).
Reload the schema.
Issue a restore command.
When the snapshot is restored, the database (and partitions) are redistributed over the new cluster configuration.
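The edit to the deployment file can be scripted. The sketch below is one possible approach, not a VoltDB tool: set_hostcount is a hypothetical helper that assumes the attribute is written as hostcount="N" with double quotes, and it keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# set_hostcount FILE N
# Rewrites the hostcount attribute in FILE to N, keeping FILE.bak as a backup.
set_hostcount() {
  local file="$1" count="$2"
  # Accept digits only, so nothing unexpected is written into the file.
  case "$count" in
    ''|*[!0-9]*) echo "host count must contain digits only: $count" >&2; return 1 ;;
  esac
  cp "$file" "$file.bak" || return 1
  # Replace the numeric value and leave every other attribute untouched.
  sed -E "s/hostcount=\"[0-9]+\"/hostcount=\"${count}\"/" "$file.bak" > "$file"
}
```

For example, set_hostcount deployment.xml 5 changes the host count to five before the cluster is restarted. The same edited file must be used on every node, including the new ones.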
It is also possible to remove nodes from the cluster using this procedure. However, to make sure that no data is lost in the process, you must copy the snapshot files from the nodes that are being removed to one of the nodes that is remaining in the cluster. This way, the restore operation can find and restore the data from partitions on the missing nodes.
To modify the database schema or stored procedures between a save and restore, make the appropriate changes to the source files (that is, the database DDL and the stored procedure Java source files). If you modify the stored procedures, be sure to repackage any Java stored procedures into a JAR file. Then you can:
Restart the cluster as an empty database.
Reload the schema.
Reload the stored procedures using the sqlcmd load classes directive.
Issue the restore command.
Two points to note when modifying the database structure before restoring a snapshot are:
When existing rows are restored to tables where new columns have been added, the new columns are filled with either the default value (if defined by the schema) or nulls.
When changing the datatypes of columns, it is possible to decrease the datatype size (for example, going from an INT to a TINYINT). However, if any existing values exceed the capacity of the new datatype (such as an integer value of 5,000 where the datatype has been changed to TINYINT), the entire restore will fail.
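Since a single oversized value causes the entire restore to fail, it may be worth checking existing values before narrowing a column. The following sketch assumes a TINYINT range of -127 to 127; confirm the range against the datatype reference for your VoltDB version. The helper names are hypothetical, and obtaining the column values (for example, from a query or an export) is outside the scope of the sketch.

```shell
#!/usr/bin/env bash
# Assumed TINYINT range; verify against the datatype reference for your version.
TINYINT_MIN=-127
TINYINT_MAX=127

# fits_tinyint VALUE
# Exit status 0 if VALUE is an integer within the assumed range, non-zero otherwise.
fits_tinyint() {
  local v="$1"
  case "$v" in
    ''|-|*[!0-9-]*|?*-*) return 1 ;;   # not a plain integer
  esac
  [ "$v" -ge "$TINYINT_MIN" ] && [ "$v" -le "$TINYINT_MAX" ]
}

# check_column
# Reads one value per line on stdin and reports the first value that does not fit.
check_column() {
  local v
  while IFS= read -r v; do
    if ! fits_tinyint "$v"; then
      echo "value does not fit in TINYINT: $v"
      return 1
    fi
  done
  return 0
}
```

With the value from the example above, fits_tinyint 5000 returns a non-zero status, which signals that the column should not be narrowed until that row is corrected.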
If you remove or modify stored procedures (particularly if you change the number and/or datatype of the parameters), you must make sure the corresponding changes are made to all client applications as well.