Two important aspects of export to keep in mind are:
Export is automatic. When you enable an export configuration in the deployment file, the database servers take care of starting and stopping the connector on each server when the database starts and stops, including if nodes fail and rejoin the cluster. You can also start and stop export on a running database by updating the deployment file using the voltadmin update command.
Export is asynchronous. The actual delivery of the data to the export target is asynchronous to the transactions that initiate data transfer.
The advantage of an asynchronous approach is that any delays in delivering the exported data to the target system do not interfere with the VoltDB database performance. The disadvantage is that VoltDB must handle queueing export data pending its actual transmission to the target, including ensuring durability in case of system failures. Again, this task is handled automatically by the VoltDB server process. But it is useful to understand how the export queuing works and its consequences.
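The decoupling described above can be illustrated with a small toy model. This is a minimal sketch, not VoltDB's implementation: the class name `AsyncExporter` and the `deliver` callable (standing in for the export target) are hypothetical. A "transaction" only enqueues a row and returns at once, while a background thread delivers rows to a possibly slow target.

```python
import queue
import threading
import time


class AsyncExporter:
    """Toy model (hypothetical): transactions enqueue export rows and return
    immediately; a background thread delivers them to a possibly slow target."""

    def __init__(self, deliver):
        self._queue = queue.Queue()
        self._deliver = deliver  # hypothetical callable standing in for the export target
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def export(self, row):
        # Called from the "transaction": enqueue and return without waiting for delivery.
        self._queue.put(row)

    def _run(self):
        while True:
            row = self._queue.get()
            if row is None:  # sentinel posted by close()
                return
            self._deliver(row)

    def close(self):
        # FIFO order means every row queued before the sentinel is delivered first.
        self._queue.put(None)
        self._worker.join()


if __name__ == "__main__":
    delivered = []

    def slow_target(row):
        time.sleep(0.05)  # simulate a slow downstream system
        delivered.append(row)

    exporter = AsyncExporter(slow_target)
    start = time.perf_counter()
    for i in range(5):
        exporter.export({"id": i})
    enqueue_time = time.perf_counter() - start
    exporter.close()
    print(f"enqueued 5 rows in {enqueue_time:.4f}s; delivered {len(delivered)}")
```

Enqueueing takes a tiny fraction of the time the target needs, which is the performance advantage; the cost is that the queue itself must be managed, as the following paragraphs explain.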
One consequence of this durability guarantee is that VoltDB sends at least one copy of every export record to the target. However, when command logs are recovered or nodes rejoin the cluster, some export records may be sent more than once. The downstream target is responsible for handling these duplicates, for example by using unique indexes or by including a unique record ID in the export stream.
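A downstream target can make at-least-once delivery safe by treating the unique record ID as a key and ignoring resends. The sketch below assumes the export stream carries such an ID; `ExportRecord`, `record_id`, and `IdempotentSink` are hypothetical names for illustration, not part of any VoltDB API.

```python
from dataclasses import dataclass


@dataclass
class ExportRecord:
    record_id: str  # hypothetical unique ID column included in the export stream
    payload: dict


class IdempotentSink:
    """Hypothetical downstream target that tolerates duplicate deliveries by
    ignoring any record whose unique ID it has already stored."""

    def __init__(self):
        self._rows = {}

    def receive(self, record: ExportRecord) -> bool:
        # Returns True if the record was new, False if it was a resent duplicate.
        if record.record_id in self._rows:
            return False
        self._rows[record.record_id] = record.payload
        return True

    def count(self) -> int:
        return len(self._rows)
```

In a relational target, a unique index on the ID column plays the same role as the dictionary key here.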
For the export process to work, it is important that the connector keep up with the queue of exported information. If the export function queues data to the connector faster than the target system accepts it, the queue grows and the VoltDB server process consumes increasingly large amounts of memory.
If the export target does not keep up with the connector and the data queue fills up, VoltDB starts writing overflow data in the export buffer to disk. This protects your database in several ways:
If the destination is intermittently unreachable or cannot keep up with the data flow, writing to disk helps VoltDB avoid consuming too much memory while waiting for the destination to catch up.
If the database is stopped, the export data is retained across sessions. When the database restarts, the connector will retrieve the overflow data and reinsert it in the export queue.
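The two protections above can be modeled with a bounded in-memory queue that spills to a file. This is a simplified sketch, not VoltDB's on-disk format: `OverflowQueue`, `max_in_memory`, and the JSON-lines file are assumptions, and a single file stands in for the overflow directory. Note that in this model only the spilled rows survive a stop, which mirrors the point made later that VoltDB uses disk storage only for overflow data.

```python
import json
import os
from collections import deque


class OverflowQueue:
    """Hypothetical export queue: holds up to max_in_memory rows in memory and
    appends anything beyond that to an overflow file on disk."""

    def __init__(self, overflow_path, max_in_memory):
        self.overflow_path = overflow_path  # a single file standing in for the overflow directory
        self.max_in_memory = max_in_memory
        self.memory = deque()

    def push(self, row):
        if len(self.memory) < self.max_in_memory:
            self.memory.append(row)
        else:
            # Memory budget exhausted: spill to disk instead of growing memory.
            with open(self.overflow_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(row) + "\n")

    def recover(self):
        """On restart, read back the overflow rows so they can be reinserted
        into the export queue, then remove the overflow file."""
        if not os.path.exists(self.overflow_path):
            return []
        with open(self.overflow_path, encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
        os.remove(self.overflow_path)
        return rows
```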
You can specify where VoltDB writes the overflow export data using the <exportoverflow> element in the deployment file. For example:
<paths>
  <voltdbroot path="/opt/voltdb/" />
  <exportoverflow path="/tmp/export/"/>
</paths>
If you do not specify a path for export overflow, VoltDB creates a subfolder in the root directory (in the preceding example, /opt/voltdb). See Section 3.6.2, “Configuring Paths for Runtime Features” for more information about configuring paths in the deployment file.
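When checking where overflow data will land, it can help to read the `<paths>` element programmatically. The sketch below uses Python's standard XML parser and assumes the deployment file's root element is `<deployment>` with `<paths>` as a direct child; the function name is hypothetical. It returns `None` for the overflow path when none is configured rather than guessing the default subfolder name.

```python
import xml.etree.ElementTree as ET


def export_overflow_path(deployment_xml: str):
    """Hypothetical helper: return (voltdbroot, exportoverflow) paths from the
    <paths> element. exportoverflow is None when it is not configured, in which
    case VoltDB uses a subfolder of the root directory."""
    root = ET.fromstring(deployment_xml)
    paths = root.find("paths")
    if paths is None:
        return None, None
    voltdbroot = paths.find("voltdbroot")
    overflow = paths.find("exportoverflow")
    return (
        voltdbroot.get("path") if voltdbroot is not None else None,
        overflow.get("path") if overflow is not None else None,
    )
```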
It is important to note that VoltDB only uses the disk storage for overflow data. However, you can force VoltDB to write all queued export data to disk by either calling the @Quiesce system procedure or by requesting a blocking snapshot. (That is, calling @SnapshotSave with the blocking flag set.) This means it is possible to perform an orderly shutdown of a VoltDB database and ensure all data (including export data) is saved with the following procedure:
Put the database into admin mode with the voltadmin pause command.
Perform a blocking snapshot with voltadmin save, saving both the database and any existing queued export data.
Shut down the database with voltadmin shutdown.
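The three steps above can be scripted so that a failure in one step stops the sequence before shutdown. This is a sketch under assumptions: it assumes `voltadmin save` accepts a `--blocking` flag followed by a snapshot directory and unique ID, which you should check against the voltadmin reference for your version. The function name and the injectable `run` parameter (used so the sequence can be tested without a live cluster) are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Raise CalledProcessError if a voltadmin step fails, so later steps are skipped.
    subprocess.run(cmd, check=True)


def orderly_shutdown(snapshot_dir: str, snapshot_id: str,
                     run: Optional[Callable[[Sequence[str]], None]] = None) -> List[List[str]]:
    """Hypothetical helper: pause, take a blocking snapshot (database plus queued
    export data), then shut down. Returns the commands run, in order."""
    run = run or _run_checked
    steps = [
        ["voltadmin", "pause"],  # enter admin mode
        ["voltadmin", "save", "--blocking", snapshot_dir, snapshot_id],  # assumed flag and argument order
        ["voltadmin", "shutdown"],
    ]
    for cmd in steps:
        run(cmd)
    return steps
```

Because each step raises on failure, the database is never shut down if the blocking snapshot did not complete.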
You can then restore the database — and any pending export queue data — by starting the database in admin mode, restoring the snapshot, and then exiting admin mode.
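The restore sequence can be sketched the same way. How to start the database in admin mode depends on your VoltDB version and deployment file, so this sketch leaves that step to a caller-supplied function (a hypothetical parameter, expected to return once the database is accepting connections). It assumes `voltadmin restore` takes the snapshot directory and unique ID used when saving, and that `voltadmin resume` exits admin mode.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def restore_after_shutdown(start_in_admin_mode: Callable[[], None],
                           snapshot_dir: str, snapshot_id: str,
                           run: Optional[Callable[[Sequence[str]], None]] = None) -> List[List[str]]:
    """Hypothetical helper: start in admin mode (caller-supplied), restore the
    snapshot including pending export data, then exit admin mode."""
    run = run or (lambda cmd: subprocess.run(cmd, check=True))
    start_in_admin_mode()  # version-specific; must return once the database is up
    steps = [
        ["voltadmin", "restore", snapshot_dir, snapshot_id],
        ["voltadmin", "resume"],  # exit admin mode so clients can submit work
    ]
    for cmd in steps:
        run(cmd)
    return steps
```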