There are times when it is useful to create multiple copies of a database. Not just a snapshot of a moment in time, but live, constantly updated copies.
K-safety maintains redundant copies of partitions within a single VoltDB database, which helps protect the database cluster against individual node failure. Database replication also creates a copy. However, database replication creates and maintains copies in separate, often remote, databases.
VoltDB supports two forms of database replication:
Two-way (Cross Datacenter)
Passive replication copies the contents from one database, known as the master database, to the other, known as the replica. In passive replication, replication occurs in one direction: from the master to the replica. Clients can connect to the master database and perform all normal database operations, including INSERT, UPDATE, and DELETE statements. As shown in Figure 11.1, “Passive Database Replication” changes are copied from the master to the replica. To ensure consistency between the two databases, the replica is started as a read-only database, where only transactions replicated from the master can modify the database contents.
Cross Datacenter Replication (XDCR), or active replication, copies changes in both directions. It is possible for client applications to perform read/write operations on either cluster and changes in one database are then copied and applied to the other database. Figure 11.2, “Cross Datacenter Replication” shows how XDCR can support client applications attached to each database instance.
Database replication (DR) provides two key business advantages. The first is protecting your business data against catastrophic events, such as power outages or natural disasters, which could take down an entire cluster. This is often referred to as disaster recovery. Because the two clusters can be in different geographic locations, both passive DR and XDCR allow one of the clusters to continue unaffected when the other becomes inoperable. Because the replica is available for read-only transactions, passive DR also allows you to offload read-only workloads, such as reporting, from the main database instance.
The second business issue that DR addresses is the need to maintain separate, active copies of the database in two separate locations. For example, XDCR allows you to maintain copies of a product inventory database at two separate warehouses, close to the applications that need the data. This feature makes it possible to support massive numbers of clients that could not be supported by a single database instance or might result in unacceptable latency when the database and the users are geographically separated. The databases can even reside on separate continents.
It is important to note, however, that database replication is not instantaneous. The transactions are committed locally, then copied to the other database. So when using XDCR to maintain two active clusters you must be careful to design your applications to avoid possible conflicts when transactions change the same record in the two databases at approximately the same time. See Section 11.3.6, “Understanding Conflict Resolution” for more information about conflict resolution.
The remainder of this chapter discusses the following topics:
Database replication (DR) involves duplicating the contents of selected tables between two database clusters. In passive DR, the contents are copied in one direction: from master to replica. In active or cross datacenter DR, changes are copied in both directions.
You identify which tables to replicate in the schema, by specifying the table name in a DR TABLE statement. For example, to replicate all tables in the voter sample application, you would execute three DR TABLE statements when defining the database schema:
DR TABLE contestants; DR TABLE votes; DR TABLE area_code_state;
You enable DR by including the
<dr> tag in the deployment files of the two
<dr> element identifies the unique cluster ID for each database (a
number between 0 and 127) and the connection source of replication as the host name or IP address of one or more nodes
from the other producer database. For example:
<dr id="2"> <connection source="serverA1,serverA2" /> </dr>
Each cluster must have a unique ID. For passive DR, only the replica needs a
<connection> element, since replication occurs in only one direction. For active or cross
datacenter replication (XDCR), both clusters must include the
element pointing at each other.
Finally, for XDCR, you must include the DDL statement
SET DR=ACTIVE; as part of the schema on
both clusters before DR begins. For passive DR, you must start the replica database with the
flag on the command line to ensure the replica is in read-only mode. Once the clusters are configured properly and the
schema of the DR tables match in both databases, replication starts.
The actual replication process is performed in multiple parallel streams; each unique partition on one cluster sends a binary log of completed transactions to the other cluster. Replicating by partition has two key advantages:
The process is faster — Because the replication process uses a binary log of the results of the transaction (rather than the transaction itself), the receiving cluster (or consumer) does not need to reprocess the transaction; it simply applies the results. Also, since each partition replicates autonomously, multiple streams of data are processed in parallel, significantly increasing throughout.
The process is more durable — In a K-safe environment, if a server fails on either cluster, individual partition streams can be redirected to other nodes or a stream can wait for the server to rejoin — without interfering with the replication of the other partitions.
If data already exists in one of the clusters before database replication starts for the first time, that database sends a snapshot of the existing data to the other, as shown in Figure 11.3, “Replicating an Existing Database”. Once the snapshot is received and applied (and the two clusters are in sync), the partitions start sending binary logs of transaction results to keep the clusters synchronized.
For passive DR, only the master database can have existing data before starting replication for the first time. The replica's DR tables must be empty. For XDCR, only one of the two databases can have data in the DR tables. If both clusters contain data, replication cannot start. Once DR has started, the databases can stop and recover using command logging without having to restart DR from the beginning.
Once replication begins, the DR process is designed to withstand normal failures and operational downtime. When using K-safety, if a node fails on either cluster, you can rejoin the node (or a replacement) using the voltdb rejoin command without breaking replication. Similarly, if either cluster shuts down, you can use voltdb recover to restart the database and restart replication where it left off. The ability to restart DR using recover assumes you are using command logging. Specifically, synchronous command logging is recommended to ensure complete durability.
If unforeseen events occur that make either database unreachable, database replication lets you replace the missing database with its copy. This process is known as disaster recovery. For cross datacenter replication (XDCR), you simply need to redirect your client applications to the remaining cluster. For passive DR, there is an extra step. To replace the master database with the replica, you must issue the voltadmin promote command on the replica to switch it from read-only mode to a fully operational database.
See Section 126.96.36.199, “Promoting the Replica When the Master Becomes Unavailable” for more information on promoting the replica database.
It is important to note that, unlike K-safety where multiple copies of each partition are updated simultaneously, database replication involves shipping the results of completed transactions from one database to another. Because replication happens after the fact, there is no guarantee that the contents of the two clusters are identical at any given point in time. Instead, the receiving database (or consumer) "catches up" with the sending database (or producer) after the binary logs are received and applied by each partition.
Also, because DR occurs on a per partition basis, changes to partitions may not occur in the same order on the consumer, since one partition may replicate faster than another. Normally this is not a problem because the results of all single-partitioned transactions are atomic in the binary log. Also, any changes to replicated tables are handled as atomic in the binary logs, to ensure all copies of the table on the consumer remain consistent. However, changes to partitioned tables from within a multi-partitioned transaction will result in separate logs that can arrive at the consumer's partitions at different times.
If the producer cluster crashes, there is no guarantee that the consumer has managed to retrieve all the logs that were queued. Therefore, it is possible that some transactions that completed on the producer are not reflected on the consumer. More importantly, if any multi-partitioned transactions update partitioned tables, you should be aware of the possibility that all of the results of that transaction did not arrive simultaneously. You may need to check the contents of the consumer to see if any such transactions were interrupted in flight.
Fortunately, using command logging and the voltdb recover command to restart the failed cluster, any unacknowledged transactions will be replayed from the failed cluster's disk-based DR cache, allowing the two clusters to recover and resume DR where they left off. However, if the failed cluster does not recover, you will need to decide how to proceed. You can choose to restart DR from scratch or, if you are using passive DR, you can promote the replica to replace the master.
To ensure effective recovery, the use of synchronous command logging is recommended for DR. Synchronous command logging guarantees that all transactions are recorded in the command log and no transactions are lost. If you use asynchronous command logging, there is a possibility that a binary log is applied but not captured by the command log before the cluster crashes. Then when the database recovers, the two clusters will not agree on the last acknowledged DR transaction, and DR will not be able to resume.
The decision whether to promote the replica or wait for the master to return (and hopefully recover all transactions from the command log) is not an easy one. Promoting the replica and using it to replace the original master may involve losing one or more transactions per partition. However, if the master cannot be recovered or cannot not be recovered quickly, waiting for the master to return can result in significant business loss or interruption.
Your own business requirements and the specific situation that caused the outage will determine which choice to make — whether to wait for the failed cluster to recover or to continue operations on the remaining cluster only. The important point is that database replication makes the choice possible and significantly eases the dangers of unforeseen events.