6.5. Managing XDCR Clusters

Once XDCR clusters are up and running, there are several management procedures that help keep the clusters in sync, especially when shutting down or removing clusters from the XDCR environment. In other environments, these procedures use voltadmin commands, such as shutdown, dr drop and dr reset. In Kubernetes, you execute these procedures through the VoltDB Operator using Helm properties. Activities include:

  • Removing a cluster temporarily

  • Removing a cluster permanently

  • Resetting XDCR when a cluster is lost

  • Rejoining a cluster that was removed

6.5.1. Removing a Cluster Temporarily

If you want to remove a cluster from the XDCR environment temporarily, you simply shut down the cluster normally by setting the number of replicas to zero. When the cluster restarts, the command logs take care of recovering all of the data and re-establishing the XDCR "conversations" with the other clusters:

--set cluster.clusterSpec.replicas=0
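
In practice, the property is passed to the Operator as part of a helm upgrade command. The following is a minimal sketch, assuming a Helm release named mydb and a chart referenced as voltdb/voltdb (both names are placeholders, not taken from this section). The helm_upgrade function is a hypothetical helper that only prints the command unless RUN=1 is set, so the script can be reviewed before it touches a cluster.

```shell
#!/bin/sh
# Assumed names, not from this section: release "mydb", chart "voltdb/voltdb".
RELEASE="${RELEASE:-mydb}"
CHART="${CHART:-voltdb/voltdb}"

# Hypothetical helper: prints the helm command by default and runs it only
# when RUN=1. --reuse-values keeps the values from the previous release.
helm_upgrade() {
  if [ "${RUN:-0}" = "1" ]; then
    helm upgrade "$RELEASE" "$CHART" --reuse-values "$@"
  else
    echo "helm upgrade $RELEASE $CHART --reuse-values $*"
  fi
}

# Temporary removal: scale the cluster to zero nodes.
stop_cluster() {
  helm_upgrade --set cluster.clusterSpec.replicas=0
}

# Restart later with the original node count (3 is only an example value).
start_cluster() {
  helm_upgrade --set cluster.clusterSpec.replicas="${1:-3}"
}

stop_cluster
```

Run the script with RUN=1 to apply the change. Calling start_cluster afterwards restores the node count, at which point the command logs recover the data as described above.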

6.5.2. Removing a Cluster Permanently

If you want to remove a cluster from the XDCR environment permanently, make sure it sends all of its completed transactions to the other clusters before it shuts down. You do this by setting the DR role to "none" along with the replica count of zero, which performs an orderly shutdown:

--set cluster.config.deployment.dr.role="none"
--set cluster.clusterSpec.replicas=0

Of course, you do not have to shut the cluster down. You can simply remove it from the XDCR environment. Note that if you do so, the data in the current cluster will diverge from the data in the clusters still participating in XDCR. So only do this if you are sure you want to maintain a detached copy of the data:

--set cluster.config.deployment.dr.role="none"

Finally, if you cannot perform an orderly removal from XDCR (for example, if one of the other clusters is offline, or if sending the outstanding transactions will take too long and you are willing to lose that data), you can set the property cluster.clusterSpec.dr.forceDrop to "TRUE" to force the cluster to drop out of the XDCR mesh without finalizing its XDCR transfers. Once the cluster has been removed, it is advisable to reset this property to "FALSE" so future procedures revert to the orderly approach of flushing the queues:

--set cluster.clusterSpec.dr.forceDrop=TRUE
--set cluster.config.deployment.dr.role="none"
--set cluster.clusterSpec.replicas=0
 . . .
--set cluster.clusterSpec.dr.forceDrop=FALSE
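
The forced removal can be sketched as two separate helm upgrade calls: one that drops the cluster and one, issued later, that restores the orderly default. This is an illustration only. It assumes a release named mydb and a chart referenced as voltdb/voltdb (placeholders), and it assumes the three properties are applied together in a single upgrade. The helm_upgrade helper is hypothetical and prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed names, not from this section: release "mydb", chart "voltdb/voltdb".
RELEASE="${RELEASE:-mydb}"
CHART="${CHART:-voltdb/voltdb}"

# Hypothetical helper: prints the helm command by default, runs it when RUN=1.
helm_upgrade() {
  if [ "${RUN:-0}" = "1" ]; then
    helm upgrade "$RELEASE" "$CHART" --reuse-values "$@"
  else
    echo "helm upgrade $RELEASE $CHART --reuse-values $*"
  fi
}

# Step 1: leave the XDCR mesh without flushing outstanding transactions.
# Any data still queued for the other clusters is lost.
force_remove() {
  helm_upgrade \
    --set cluster.clusterSpec.dr.forceDrop=TRUE \
    --set cluster.config.deployment.dr.role=none \
    --set cluster.clusterSpec.replicas=0
}

# Step 2: call this only after the cluster has been removed, so that
# later procedures go back to flushing the queues.
restore_orderly_default() {
  helm_upgrade --set cluster.clusterSpec.dr.forceDrop=FALSE
}

force_remove
```

Keeping the reset in its own function makes the ordering explicit: restore_orderly_default is a separate, later step rather than part of the same upgrade.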

6.5.3. Resetting XDCR When a Cluster Leaves Unexpectedly

Normally, when a cluster is removed from XDCR in an orderly fashion, the other clusters are notified that the cluster has left the mesh. However, if a cluster leaves unexpectedly (for example, if it crashes, or is shut down and deleted without setting its role to "none" to notify the other clusters), the XDCR network still considers the cluster a member that may return. As a result, the remaining clusters continue to save DR logs for the missing member, using up unnecessary processing cycles and disk space. You need to reset the XDCR network mesh to correct this situation.

To reset the mesh, you notify the remaining clusters that the missing cluster is no longer a member. You do this by adding the DR ID of the missing cluster to the cluster.clusterSpec.dr.excludeClusters property. The property value is an array of DR IDs. For example, if the DR ID (cluster.config.deployment.dr.id) of the lost cluster is "3", you set the property to "{3}":

--set cluster.clusterSpec.dr.excludeClusters='{3}'

You must set this property on all of the clusters remaining in the XDCR environment. If you later want to add the missing cluster (or another cluster with the same DR ID) back into the XDCR mesh, you must first clear this property. For example:

--set cluster.clusterSpec.dr.excludeClusters=null
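
Because the exclusion has to be applied to every remaining cluster, it lends itself to a loop. The following is a minimal sketch, assuming the remaining clusters are Helm releases named xdcr1 and xdcr2 that are reachable from the current Kubernetes context, and a chart referenced as voltdb/voltdb. All of these names are hypothetical, as is the helm_upgrade helper, which prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed names, not from this section: releases "xdcr1" and "xdcr2",
# chart "voltdb/voltdb".
REMAINING="${REMAINING:-xdcr1 xdcr2}"
CHART="${CHART:-voltdb/voltdb}"
LOST_DR_ID="${LOST_DR_ID:-3}"   # DR ID of the cluster that left unexpectedly

# Hypothetical helper: first argument is the release name, the rest are
# passed to helm. Prints the command by default, runs it when RUN=1.
helm_upgrade() {
  rel="$1"
  shift
  if [ "${RUN:-0}" = "1" ]; then
    helm upgrade "$rel" "$CHART" --reuse-values "$@"
  else
    echo "helm upgrade $rel $CHART --reuse-values $*"
  fi
}

# Tell every remaining cluster that the lost cluster is no longer a member.
# The braces are quoted so the shell passes the array syntax through to helm.
exclude_lost_cluster() {
  for name in $REMAINING; do
    helm_upgrade "$name" --set "cluster.clusterSpec.dr.excludeClusters={$LOST_DR_ID}"
  done
}

# Clear the exclusion again before the cluster (or its DR ID) returns.
clear_exclusion() {
  for name in $REMAINING; do
    helm_upgrade "$name" --set cluster.clusterSpec.dr.excludeClusters=null
  done
}

exclude_lost_cluster
```

If the clusters live in different Kubernetes contexts or namespaces, each helm call would also need to target the right one. That detail is left out of the sketch.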

6.5.4. Rejoining an XDCR Cluster That Was Previously Removed

If a cluster is removed from the XDCR environment permanently, either by resetting the DR role or through exclusion by the other clusters, it is still possible to rejoin that cluster to the XDCR network. To do that, you must reinitialize the cluster and, if it was forcibly excluded, remove the exclusion from the current members of the network. (Note that the following procedure is not necessary if the cluster was removed temporarily by setting the number of replicas to zero.)

First, if the cluster was forcibly removed by exclusion, you must remove the exclusion from the current members of the XDCR network by clearing the cluster.clusterSpec.dr.excludeClusters property (removing the missing cluster's ID from the array):

--set cluster.clusterSpec.dr.excludeClusters=null

Then you must restart the cluster you want to rejoin, reinitializing the cluster's contents with the cluster.clusterSpec.initForce property and setting the appropriate properties (such as the DR role and connection properties):

--set cluster.clusterSpec.initForce=TRUE
--set cluster.config.deployment.dr.role="xdcr"
--set cluster.clusterSpec.replicas=3

Once the cluster rejoins the XDCR network and synchronizes with the current members, be sure to reset the cluster.clusterSpec.initForce property to false.
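
The rejoin procedure above can be sketched as three steps in order. This is an illustration under stated assumptions: the current members are Helm releases named xdcr1 and xdcr2, the returning cluster is a release named xdcr3, and the chart is referenced as voltdb/voltdb. All of these names are hypothetical, as is the helm_upgrade helper, which prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed names, not from this section: member releases "xdcr1" and "xdcr2",
# returning release "xdcr3", chart "voltdb/voltdb".
MEMBERS="${MEMBERS:-xdcr1 xdcr2}"
REJOINING="${REJOINING:-xdcr3}"
CHART="${CHART:-voltdb/voltdb}"

# Hypothetical helper: first argument is the release name, the rest are
# passed to helm. Prints the command by default, runs it when RUN=1.
helm_upgrade() {
  rel="$1"
  shift
  if [ "${RUN:-0}" = "1" ]; then
    helm upgrade "$rel" "$CHART" --reuse-values "$@"
  else
    echo "helm upgrade $rel $CHART --reuse-values $*"
  fi
}

# Step 1 (only if the cluster was excluded): clear the exclusion on every
# current member of the XDCR network.
clear_exclusions() {
  for name in $MEMBERS; do
    helm_upgrade "$name" --set cluster.clusterSpec.dr.excludeClusters=null
  done
}

# Step 2: restart the returning cluster with reinitialized contents.
# The replica count of 3 matches the example in the text.
rejoin_cluster() {
  helm_upgrade "$REJOINING" \
    --set cluster.clusterSpec.initForce=TRUE \
    --set cluster.config.deployment.dr.role=xdcr \
    --set cluster.clusterSpec.replicas=3
}

# Step 3: call this only after the cluster has synchronized with the
# current members, so a later restart does not wipe its contents again.
finish_rejoin() {
  helm_upgrade "$REJOINING" --set cluster.clusterSpec.initForce=FALSE
}

clear_exclusions
rejoin_cluster
```

The final step is deliberately not called by the script, since it depends on the cluster having finished synchronizing.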