There are times when it is useful to create a copy of a database. Not just a snapshot of a moment in time, but a live, constantly updated copy.
K-safety maintains redundant copies of partitions within a single VoltDB database, which helps protect the database cluster against individual node failure. Database replication also creates a copy. However, database replication creates and maintains a copy of specific tables in a separate, often remote, database.
Database replication can be used for:
Offloading read-only workloads, such as reporting
Maintaining a "hot standby" in case of failure
Protecting against catastrophic events, often called disaster recovery
The next section, Section 11.1, “How Database Replication Works”, explains the principles behind database replication in VoltDB. The following sections provide step-by-step instructions for establishing and managing database replication using the functions and features of VoltDB, including:
Database replication involves duplicating the contents of tables in one database cluster (known as the master) to matching tables in another database cluster (known as the replica). The contents of the tables in the replica cluster are controlled by the master, which is why this arrangement is sometimes referred to as a master/slave relationship.
The replica database can be in the rack next to the master, in the next room, the next building, or another city entirely. The location depends upon your goals for replication. For example, if you are using replication for disaster recovery, geographic separation of the master and replica is required. If you are using replication for hot standby or offloading read-only queries, the physical location may not be important.
The process of retrieving completed transactions from the master and applying them to partitions on the replica is initiated by the replica. When the replica initiates replication, it performs the following tasks:
Connects to the master cluster, telling the master database to queue and PUSH a binary log of completed transactions to the replica.
Receives and ACKs the binary log once the transaction results are applied on the replica.
The actual replication process is performed in multiple parallel streams; each unique partition on the master cluster sends a binary log of completed transactions to the matching partition on the replica cluster. Replicating by partition has two key advantages:
The process is faster — Because the replication process uses a binary log of the results of the transaction (rather than the transaction itself), the replica does not need to reprocess the transaction; it can simply apply the results. Also, since each partition replicates autonomously, multiple streams of data are processed in parallel, significantly increasing throughout.
The process is more durable — In a K-safe environment, if a server fails on either the master or the replica, individual partition streams can be redirected to other nodes or a stream can wait for the server to rejoin — without interfering with the replication of the other partitions.
If data already exists in the master cluster when database replication is initiated, the master database first sends a snapshot of the existing data to the replica, as shown in Figure 11.2, “Replicating an Existing Database”. Once the snapshot is received and applied (and the two clusters are in sync), the partitions on the master database start sending binary logs of transaction results to keep the clusters synchronized.
If unforeseen events occur that make the master database unreachable, database replication lets you replace the master with the replica and restore normal business operations with as little downtime as possible. You switch the replica from read-only to a fully functional database by promoting it. To do this, perform the following steps:
Make sure the master is actually unreachable, because you do not want two live copies of the same database. If it is reachable but not functioning properly, be sure to pause or shut down the master database.
Promote the replica to a read/write mode using the voltadmin promote command.
Redirect the client applications to the newly promoted database.
Figure 11.3, “Promoting the Replica” illustrates how database replication reduces the risk of major disasters by allowing the replica to replace the master if the master becomes unavailable.
Once the master is offline and the replica is promoted, the data is no longer being replicated. As soon as normal business operations have been re-established, it is a good idea to also re-establish replication. This can be done using any of the following options:
If the original master database hardware can be restarted, take a snapshot of the current database (that is, the original replica), restore the snapshot on the original master and redirect client traffic back to the original. Replication can then be restarted using the original configuration.
An alternative, if the original database hardware can be restarted but you do not want to (or need to) redirect the clients away from the current database, is to use the original master hardware to create a replica of the newly promoted cluster — essentially switching the roles of the master and replica databases — as described in Section 11.3.3, “Reversing the Master/Replica Roles”.
If the original master hardware cannot be recovered effectively, create a new database cluster in a third location to use as a replica of the current database.
It is important to note that, unlike K-safety where multiple copies of each partition are updated simultaneously, database replication involves shipping the results of completed transactions from the master database to the replica. Because replication happens after the fact, there is no guarantee that the contents of the master and replica cluster are identical at any given point in time. Instead, the replica database "catches up" with the master after the binary logs are received and applied by each partition.
Also, because DR occurs on a per partition basis, changes to partitions may not occur in the same order on the replica, since one partition may replicate faster than another. Normally this is not a problem because the results of all single-partitioned transactions are atomic in the binary log. Also, any changes to replicated tables are handled as atomic in the binary logs, to ensure all copies of the table on the replica remain consistent. However, changes to partitioned tables from within a multi-partitioned transaction will result in separate logs that can arrive at the replica's partitions at different times.
If the master cluster crashes, there is no guarantee that the replica has managed to retrieve all the logs that were queued on the master. Therefore, it is possible that some transactions that completed on the master are not reflected on the replica. More importantly, if any multi-partitioned transactions update partitioned tables, you should be aware of the possibility that all of the results of that transaction did not arrive simultaneously. You may need to check the contents of the replica to see if any such transactions were interrupted in flight.
The decision whether to promote the replica or wait for the master to return (and hopefully recover all transactions from the command log) is not an easy one. Promoting the replica and using it to replace the original master may involve losing one or more transactions per partition. However, if the master cannot be recovered or cannot not be recovered quickly, waiting for the master to return can result in significant business loss or interruption.
Your own business requirements and the specific situation that caused the outage will determine which choice to make. However, database replication makes the choice possible and significantly eases the dangers of unforeseen events.
While database replication is occurring, the only changes to the replica database come from the binary logs. Client applications can connect to the replica and use it for read-only transactions, including read-only ad hoc queries and system procedures. However, any attempt to perform a write transaction from a client application returns an error.
There will always be some delay between a transaction completing on the master and its results being applied on the replica. However, for read operations that do not require real-time accuracy (such as reporting), the replica can provide a useful source for offloading certain less-frequent, read-only transactions from the master.