When using K-safety, it is possible for one or more nodes in a cluster to stop without stopping the database itself. (See the chapter on availability in the Using VoltDB manual for a complete description of K-safety.) If a server stops — either intentionally or accidentally — you can start the server and have it rejoin the cluster using the same voltdb start command used to start the cluster. For example:
$ voltdb start --dir=~/database \
               --count=5 \
               --host=svr1,svr2
The start command checks whether the cluster is still running, based on the list of servers in the --host argument. If so, the server rejoins the cluster.
Note that if multiple servers are listed in the --host argument, the server can rejoin even if it is one of the listed hosts. If you list only one host and that is the server that stopped, you will need to list a different server in the --host argument: any server that is still an active member of the running cluster. (This is why listing multiple nodes in the --host argument is beneficial: you can use exactly the same start command in multiple situations.)
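The rule above can be sketched as a small pre-flight check. This is only an illustration: can_rejoin_with is a hypothetical helper, not part of VoltDB. It assumes the --host value is a plain comma-separated list of names that can be compared as strings, with no ports and no mixing of short and fully qualified names.

```shell
#!/bin/sh
# Hypothetical helper (not a VoltDB command): succeeds if the --host list
# names at least one server other than the one being restarted.
can_rejoin_with() {
    local_host=$1      # name of the server that stopped
    host_list=$2       # comma-separated value intended for --host
    old_ifs=$IFS
    IFS=,              # split the list on commas
    for h in $host_list; do
        if [ "$h" != "$local_host" ]; then
            IFS=$old_ifs
            return 0   # another listed server can be contacted
        fi
    done
    IFS=$old_ifs
    return 1           # the list is empty or names only the stopped server
}

if can_rejoin_with svr1 "svr1,svr2"; then
    echo "svr1 can rejoin using --host=svr1,svr2"
fi
if ! can_rejoin_with svr1 "svr1"; then
    echo "svr1 cannot rejoin using --host=svr1 alone; list another active server"
fi
```

The check only inspects the list itself. It does not verify that the other listed servers are actually running, which the start command determines when it contacts them.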
If you want to stop a single node in a K-safe cluster, for example to perform maintenance on the hardware, you can do this using the voltadmin stop command. The voltadmin stop command stops a single node, as long as the cluster has enough K-safety to remain viable after the node stops. (If not, the stop command is rejected.) For example, to stop svr2, you can issue the following command:
$ voltadmin stop --host=svr1 svr2
Note that the stop command does not have to be issued on the server that is being stopped. You can issue the command on any active server in the cluster. See Chapter 4, Maintenance and Upgrades for more information about performing maintenance tasks.
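Taken together, stopping a node for maintenance and bringing it back might look like the following sketch. It reuses only the commands shown in this section. The run wrapper, the DRY_RUN variable, and the two function names are illustrative assumptions, as is the use of $HOME/database for the database directory. By default the script only prints the commands, because the two steps normally run on different servers.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Issue from any active cluster member (svr1 here) to stop svr2.
# The command is rejected if the cluster would not remain viable.
stop_for_maintenance() {
    run voltadmin stop --host=svr1 svr2
}

# Issue on svr2 once maintenance is complete, using the same start
# command that was used to start the cluster.
rejoin_after_maintenance() {
    run voltdb start --dir="$HOME/database" --count=5 --host=svr1,svr2
}

stop_for_maintenance
rejoin_after_maintenance
```

In practice you would run the first command, perform the maintenance, and only then issue the start command on the stopped server.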