There are times when it is useful to create multiple copies of a database. Not just a snapshot of a moment in time, but live, constantly updated copies.
K-safety maintains redundant copies of partitions within a single VoltDB database, which helps protect the database cluster against individual node failure. Database replication also creates a copy. However, database replication creates and maintains copies in separate, often remote, databases.
VoltDB supports two forms of database replication:
One-way (Passive)
Two-way (Cross Datacenter)
Passive replication copies the contents from one database, known as the master database, to the other, known as the replica. In passive replication, replication occurs in one direction: from the master to the replica. Clients can connect to the master database and perform all normal database operations, including INSERT, UPDATE, and DELETE statements. As shown in Figure 11.1, “Passive Database Replication” changes are copied from the master to the replica. To ensure consistency between the two databases, the replica is started as a read-only database, where only transactions replicated from the master can modify the database contents.
Cross Datacenter Replication (XDCR), or active replication, copies changes in both directions. XDCR can be set up on multiple clusters (not just two). Client applications can then perform read/write operations on any of the participating clusters and changes in one database are then copied and applied to all the other databases. Figure 11.2, “Cross Datacenter Replication” shows how XDCR can support client applications attached to each database instance.
Database replication (DR) provides two key business advantages. The first is protecting your business data against catastrophic events, such as power outages or natural disasters, which could take down an entire cluster. This is often referred to as disaster recovery. Because the clusters can be in different geographic locations, both passive DR and XDCR allow other clusters to continue unaffected when one becomes inoperable. Because the replica is available for read-only transactions, passive DR also allows you to offload read-only workloads, such as reporting, from the main database instance.
The second business issue that DR addresses is the need to maintain separate, active copies of the database in separate locations. For example, XDCR allows you to maintain copies of a product inventory database at two or more separate warehouses, close to the applications that need the data. This feature makes it possible to support massive numbers of clients that could not be supported by a single database instance or might result in unacceptable latency when the database and the users are geographically separated. The databases can even reside on separate continents.
It is important to note, however, that database replication is not instantaneous. The transactions are committed locally, then copied to the other database or databases. So when using XDCR to maintain multiple active clusters you must be careful to design your applications to avoid possible conflicts when transactions change the same record in two databases at approximately the same time. See Section 11.3.7, “Understanding Conflict Resolution” for more information about conflict resolution.
The remainder of this chapter discusses the following topics:
Database replication (DR) involves duplicating the contents of selected tables between two database clusters. In passive DR, the contents are copied in one direction: from master to replica. In active or cross datacenter DR, changes are copied in both directions.
You identify which tables to replicate in the schema, by specifying the table name in a DR TABLE statement. For example, to replicate all tables in the voter sample application, you would execute three DR TABLE statements when defining the database schema:
DR TABLE contestants; DR TABLE votes; DR TABLE area_code_state;
You enable DR by including the <dr>
tag in the configuration files when
initializing the database. The <dr>
element identifies three pieces of
information:
A unique cluster ID for each database. The ID is required and can be any number between 0 and 127, as long as each cluster has a different ID.
The role the cluster plays, whether master, replica, or xdcr. The default is master.
For the replica and xdcr roles, a connection source listing the host name or IP address of one or more nodes from the other databases.
For example:
<dr id="2" role="replica"> <connection source="serverA1,serverA2" /> </dr>
Each cluster must have a unique ID. For passive DR, only the replica needs a <connection>
element, since replication occurs in only one direction.
For cross datacenter replication (XDCR), all clusters must include the <connection>
element pointing to at each one other cluster. If you are establishing an
XDCR network with multiple clusters, the <connection>
tag can specify hosts from
one or more of the other clusters. The participating clusters will coordinate establishing the correct connections, even
if the <connection>
element does not list them all.
Note that for XDCR, you must specify the attribute role="xdcr"
before starting each cluster. You
cannot mix active and passive DR in the same database group.
For passive DR, you must start the replica database with the role="replica"
attribute to ensure
the replica is in read-only mode. Once the clusters are configured properly and the schema of the DR tables match in the
databases, replication starts.
The actual replication process is performed in multiple parallel streams; each unique partition on one cluster sends a binary log of completed transactions to the other clusters. Replicating by partition has two key advantages:
The process is faster — Because the replication process uses a binary log of the results of the transaction (rather than the transaction itself), the receiving cluster (or consumer) does not need to reprocess the transaction; it simply applies the results. Also, since each partition replicates autonomously, multiple streams of data are processed in parallel, significantly increasing throughout.
The process is more durable — In a K-safe environment, if a server fails on a DR cluster, individual partition streams can be redirected to other nodes or a stream can wait for the server to rejoin — without interfering with the replication of the other partitions.
If data already exists in one of the clusters before database replication starts for the first time, that database sends a snapshot of the existing data to the other, as shown in Figure 11.3, “Replicating an Existing Database”. Once the snapshot is received and applied (and the two clusters are in sync), the partitions start sending binary logs of transaction results to keep the clusters synchronized.
For passive DR, only the master database can have existing data before starting replication for the first time. The replica's DR tables must be empty. For XDCR, the first database that is started can have data in the DR tables. If other clusters contain data, replication cannot start. Once DR has started, the databases can stop and recover using command logging without having to restart DR from the beginning.
Once replication begins, the DR process is designed to withstand normal failures and operational downtime. When using K-safety, if a node fails on any cluster, you can rejoin the node (or a replacement) using the voltdb start command without breaking replication. Similarly, if a cluster shuts down, you can use voltdb start to restart the database and restart replication where it left off. The ability to restart DR assumes you are using command logging. Specifically, synchronous command logging is recommended to ensure complete durability.
If unforeseen events occur that make a database unreachable, database replication lets you replace the missing database with its copy. This process is known as disaster recovery. For cross datacenter replication (XDCR), you simply need to redirect your client applications to the remaining cluster(s). For passive DR, there is an extra step. To replace the master database with the replica, you must issue the voltadmin promote command on the replica to switch it from read-only mode to a fully operational database.
See Section 11.2.5.3, “Promoting the Replica When the Master Becomes Unavailable” for more information on promoting the replica database.
It is important to note that, unlike K-safety where multiple copies of each partition are updated simultaneously, database replication involves shipping the results of completed transactions from one database to another. Because replication happens after the fact, there is no guarantee that the contents of the clusters are identical at any given point in time. Instead, the receiving database (or consumer) "catches up" with the sending database (or producer) after the binary logs are received and applied by each partition.
Also, because DR occurs on a per partition basis, changes to partitions may not occur in the same order on the consumer, since one partition may replicate faster than another. Normally this is not a problem because the results of all transactions are atomic in the binary log. However, if the producer cluster crashes, there is no guarantee that the consumer has managed to retrieve all the logs that were queued. Therefore, it is possible that some transactions that completed on the producer are not reflected on the consumer.
Fortunately, using command logging, when you restart the failed cluster, any unacknowledged transactions will be replayed from the failed cluster's disk-based DR cache, allowing the clusters to recover and resume DR where they left off. However, if the failed cluster does not recover, you will need to decide how to proceed. You can choose to restart DR from scratch or, if you are using passive DR, you can promote the replica to replace the master.
To ensure effective recovery, the use of synchronous command logging is recommended for DR. Synchronous command logging guarantees that all transactions are recorded in the command log and no transactions are lost. If you use asynchronous command logging, there is a possibility that a binary log is applied but not captured by the command log before the cluster crashes. Then when the database recovers, the clusters will not agree on the last acknowledged DR transaction, and DR will not be able to resume.
The decision whether to promote the replica or wait for the master to return (and hopefully recover all transactions from the command log) is not an easy one. Promoting the replica and using it to replace the original master may involve losing one or more transactions per partition. However, if the master cannot be recovered or cannot not be recovered quickly, waiting for the master to return can result in significant business loss or interruption.
Your own business requirements and the specific situation that caused the outage will determine which choice to make — whether to wait for the failed cluster to recover or to continue operations on the remaining cluster only. The important point is that database replication makes the choice possible and significantly eases the dangers of unforeseen events.