15.5. How Export Works

Three important aspects of export to keep in mind are:

  • Export is automatic. When you enable an export target in the configuration file, the database servers take care of starting and stopping the connector on each server when the database starts and stops, including if nodes fail and rejoin the cluster. You can also start and stop export on a running database by updating the configuration file using the voltadmin update command.

  • Export is asynchronous. The actual delivery of the data to the export target is asynchronous to the transactions that initiate data transfer.

  • Stream data is queued for export as soon as you declare a stream with the EXPORT TO TARGET clause and write to that stream, even if the export target has not been configured yet (see the example following this list). Similarly, when you drop the stream, its export queue is deleted, even if there is data waiting to be delivered to the configured export target.
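
For example, the following DDL declares a stream that writes to an export target, and the configuration fragment enables a file connector for that target. The stream name, target name, column definitions, and connector properties shown here are illustrative only; reloading the modified configuration with voltadmin update starts the connector on a running database:

CREATE STREAM alerts
   PARTITION ON COLUMN event_id
   EXPORT TO TARGET alertlog (
      event_id  INTEGER      NOT NULL,
      message   VARCHAR(128)
   );

<export>
   <configuration target="alertlog" enabled="true" type="file">
      <property name="type">csv</property>
      <property name="nonce">alerts</property>
   </configuration>
</export>

$ voltadmin update deployment.xml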

The advantage of an asynchronous approach is that any delays in delivering the exported data to the target system do not interfere with the VoltDB database performance. The disadvantage is that VoltDB must handle queueing export data pending its actual transmission to the target, including ensuring durability in case of system failures. Again, this task is handled automatically by the VoltDB server process. But it is useful to understand how the export queuing works and its consequences.

One consequence of this durability guarantee is that VoltDB will send at least one copy of every export record to the target. However, it is possible, when recovering command logs or rejoining nodes, that certain export records are resent. It is up to the downstream target to handle these duplicate records, for example by using unique indexes or by including a unique record ID in the export stream.

Another consequence of the durability guarantee is that VoltDB will not continue exporting to the target if it finds a gap in the data for a specific stream. Normally, this is not a problem because in a K-safe cluster if a node fails, another node can take over responsibility for writing (and queuing) export data. However, in unusual cases where export falls behind and nodes fail and rejoin consecutively, especially if the failed nodes are replaced by new nodes and their overflow data is lost, it is possible for gaps to occur in the queues of streamed data. When this happens, VoltDB issues a warning to the console (and via SNMP) and waits for the missing data to be resolved. You can also use the @Statistics system procedure with the EXPORT selector to determine exactly what records are and are not present in the queues. If the gap cannot be resolved, you must use the voltadmin export release command to free the queue and resume export at the next available record.
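
For example, assuming the stream and target declared earlier, you could inspect the export queues from sqlcmd and, if a gap cannot be resolved, release the queues with voltadmin. This is only a sketch; the statistics output depends on your streams and targets:

$ sqlcmd
1> exec @Statistics EXPORT 0;
2> exit
$ voltadmin export release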

15.5.1. Export Overflow

VoltDB uses persistent files on disk to queue export data waiting to be written to its specified target. If for any reason the export target cannot keep up with the connector, VoltDB writes the excess data in the export buffer from memory to disk. This protects your database in several ways:

  • If the destination target is not configured, is unreachable, or cannot keep up with the data flow, writing to disk helps VoltDB avoid consuming too much memory while waiting for the destination to accept the data.

  • If the database stops, the export data is retained across sessions. When the database restarts, the connector will retrieve the overflow data and reinsert it in the export queue.

Even when the target does keep up with the flow, some amount of data is written to the overflow directory to ensure durability across database sessions. You can specify where VoltDB writes the overflow export data using the <exportoverflow> element in the configuration file. For example:

<paths>
   <exportoverflow path="/tmp/export/"/>
</paths>

If you do not specify a path for export overflow, VoltDB creates a subfolder in the database root directory. See Section 3.7.2, “Configuring Paths for Runtime Features” for more information about configuring paths in the configuration file.

15.5.2. Persistence Across Database Sessions

It is important to note that VoltDB normally uses disk storage only for overflow data. However, you can force VoltDB to write all queued export data to disk using any of the following methods (see the example after this list):

  • Calling the @Quiesce system procedure

  • Requesting a blocking snapshot (using voltadmin save --blocking)

  • Performing an orderly shutdown (using voltadmin shutdown)
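
For example, the first two methods can be invoked from the command line as follows. The snapshot directory and unique identifier shown are illustrative:

$ sqlcmd --query="exec @Quiesce"
$ voltadmin save --blocking /tmp/voltdbbackup nightly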

This means that if you perform an orderly shutdown with the voltadmin shutdown command, you can recover the database — and any pending export queue data — by simply restarting the database cluster in the same root directories.
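
A minimal sketch of that sequence for a single-node database, assuming an illustrative root directory of /home/db/mydb:

$ voltadmin shutdown
$ voltdb start --dir /home/db/mydb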

Note that when you initialize or re-initialize a root directory, any subdirectories of the root are purged.[5] So if your configuration did not specify a different location for the export overflow, and you re-initialize the root directories and then restore the database from a snapshot, the database is restored but the export overflow will be lost. If both your original and new configuration use the same, explicit directory outside the root directory for export overflow, you can start a new database and restore a snapshot without losing the overflow data.
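
A sketch of this last scenario, assuming an export overflow path outside the database root (the paths, configuration file name, and snapshot identifier are illustrative):

<paths>
   <exportoverflow path="/opt/voltdb/exportoverflow"/>
</paths>

$ voltdb init --force --dir /home/db/mydb --config deployment.xml
$ voltdb start --dir /home/db/mydb
$ voltadmin restore /tmp/voltdbbackup nightly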



[5] Initializing a root directory deletes any files in the command log and overflow directories. The snapshots directory is archived to a named subdirectory.