15.6. The File Export Connector

The file connector receives the serialized data from the export streams and writes it out as text files (either comma or tab separated) to disk. The file connector writes the data out one file per stream, "rolling" over to new files periodically. The filenames of the exported data are constructed from:

  • A unique prefix (specified with the nonce property)

  • A unique value identifying the current version of the database schema

  • The stream name

  • A timestamp identifying when the file was started

  • Optionally, the ID of the host server writing the file
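To make the list above concrete, here is a minimal sketch of how those components could be combined into a file name. It is not VoltDB's code: the component order, the "-" separator, the file extension, and the `schemaVersion` value are assumptions for illustration, and the class and method names are hypothetical. The timestamp uses the default `dateformat` ("yyyyMMddHHmmss") and `timezone` (GMT) values from the File Export Properties table.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: the ordering, "-" separator, and extension are assumptions,
// not VoltDB's actual naming scheme.
class ExportFileName {

    static String build(String nonce, long schemaVersion, String streamName,
                        Date started, Integer hostId, String type, boolean active) {
        // Default dateformat and timezone from the File Export Properties table.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));

        StringBuilder name = new StringBuilder();
        if (active) {
            name.append("active-");            // present only while the file is being written
        }
        name.append(nonce).append('-')         // unique prefix (nonce property)
            .append(schemaVersion).append('-') // hypothetical schema version value
            .append(streamName).append('-')    // stream name
            .append(fmt.format(started));      // when the file was started
        if (hostId != null) {
            name.append('-').append(hostId);   // host ID, only when uniquenames is true
        }
        return name.append('.').append(type).toString();
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01 00:00:00 GMT
        System.out.println(build("MyExport", 3, "ORDERS", epoch, null, "csv", true));
        // active-MyExport-3-ORDERS-19700101000000.csv
        System.out.println(build("MyExport", 3, "ORDERS", epoch, 2, "csv", false));
        // MyExport-3-ORDERS-19700101000000-2.csv
    }
}
```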

While the file is being written, the file name also contains the prefix "active-". Once the file is complete and a new file started, the "active-" prefix is removed. Therefore, any export files without the prefix are complete and can be copied, moved, deleted, or post-processed as desired.
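Because only finished files lack the "active-" prefix, a post-processing job can select them with a simple name test. The following is a minimal sketch, assuming the job scans a single output directory; the class and method names are hypothetical. It only considers regular files, so if you set `batched` to true you would apply the same test to the subfolder names instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns export files that are complete, i.e. whose
// names do not start with "active-".
class CompletedExportFiles {

    static List<Path> completedFiles(Path outdir) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outdir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                // Skip files the connector is still writing.
                if (Files.isRegularFile(p) && !name.startsWith("active-")) {
                    done.add(p);
                }
            }
        }
        done.sort(null); // natural Path ordering, for a deterministic result
        return done;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("export");
        Files.createFile(dir.resolve("active-MyExport-3-ORDERS-20240101120000.csv"));
        Files.createFile(dir.resolve("MyExport-3-ORDERS-20240101110000.csv"));
        for (Path p : completedFiles(dir)) {
            // Safe to copy, move, delete, or post-process.
            System.out.println(p.getFileName());
        }
    }
}
```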

There are two properties that must be set when using the file connector:

  • The type property lets you choose between comma-separated files (csv) or tab-delimited files (tsv).

  • The nonce property specifies a unique prefix to identify all files that the connector writes out for this database instance.
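If you generate connector settings from your own tooling, you might check the two required properties before deploying. This is a hypothetical pre-deployment check, not part of VoltDB; it assumes the settings are held in a simple name/value map and that the `type` values are written in lowercase, as shown above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical pre-deployment check for the file connector's required properties.
class FileConnectorConfigCheck {

    private static final Set<String> TYPES = Set.of("csv", "tsv");

    /** Throws IllegalArgumentException if type or nonce is missing or invalid. */
    static void checkRequired(Map<String, String> props) {
        String type = props.get("type");
        if (type == null || !TYPES.contains(type)) {
            throw new IllegalArgumentException("type must be csv or tsv, got: " + type);
        }
        String nonce = props.get("nonce");
        if (nonce == null || nonce.isBlank()) {
            throw new IllegalArgumentException("nonce must be a non-empty prefix");
        }
    }

    public static void main(String[] args) {
        checkRequired(Map.of("type", "csv", "nonce", "MyExport")); // passes
        try {
            checkRequired(Map.of("type", "csv"));                  // nonce missing
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```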

Table 15.1, “File Export Properties” describes the supported properties for the file connector.

Table 15.1. File Export Properties

Property (allowable values): description

  • type* (csv, tsv): Specifies whether to create comma-separated (CSV) or tab-delimited (TSV) files.

  • nonce* (string): A unique prefix for the output files.

  • outdir (directory path): The directory where the files are created. Relative paths are relative to the database root directory on each server. If you do not specify an output path, VoltDB writes the output files into the root directory itself.

  • period (integer): The frequency, in minutes, for "rolling" the output file. The default frequency is 60 minutes.

  • binaryencoding (hex, base64): Specifies whether VARBINARY data is encoded in hexadecimal or BASE64 format. The default is hexadecimal.

  • dateformat (format string): The format of the date used when constructing the output file names, specified as a Java SimpleDateFormat string. The default format is "yyyyMMddHHmmss".

  • timezone (string): The time zone to use when formatting the timestamp, specified as a Java time zone identifier. The default is GMT.

  • delimiters (string): Specifies the delimiter characters for CSV output. The text string specifies four characters in the following order: the separator, the quote character, the escape character, and the end-of-line character.

    Non-printing characters must be encoded as Java literals. For example, the new line character (ASCII code 10) should be entered as "\n". Alternatively, you can use Java Unicode literals, such as "\u000a". You must also encode any XML special characters, such as the ampersand and the left angle bracket, as entities for inclusion in the XML configuration file. For example, encode "<" as "&lt;".

    The following property definition matches the default delimiters: the comma, the double quotation mark twice (as both the quote and escape characters), and the new line character:

    <property name="delimiters">,""\n</property>

  • batched (true, false): Specifies whether to store the output files in subfolders that are "rolled" according to the frequency specified by the period property. The subfolders are named according to the nonce and the timestamp, with "active-" prefixed to the subfolder currently being written.

  • skipinternals (true, false): Specifies whether to include six columns of VoltDB metadata (such as transaction ID and timestamp) in the output. If you specify skipinternals as "true", the output files contain only the exported stream data.

  • uniquenames (true, false): Specifies whether to include the host ID in the file name to ensure that all files written are unique across a cluster. The export files are always unique per server. But if you plan to write all cluster files to a network drive or copy them to a single location, set this property to true to avoid any possible conflict in the file names. The default is false.

  • with-schema (true, false): Specifies whether to write a JSON representation of each stream's schema as part of the export. The JSON schema files can be used to ensure that the appropriate datatype and precision are maintained if and when the output files are imported into another system.

*Required
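The `delimiters` and `binaryencoding` properties together determine how each row is rendered. The sketch below shows one plausible rendering; it is not VoltDB's writer. It assumes a field is quoted only when it contains a separator, quote, escape, or end-of-line character, that embedded quote and escape characters are preceded by the escape character, and that hex output uses uppercase digits. The class name is hypothetical.

```java
import java.util.Base64;
import java.util.List;

// Hypothetical row formatter driven by a four-character delimiters string
// (separator, quote, escape, end-of-line) and a binaryencoding setting.
class DelimitedRowWriter {
    private final char separator, quote, escape;
    private final String eol;
    private final boolean base64;

    DelimitedRowWriter(String delimiters, String binaryEncoding) {
        if (delimiters.length() != 4) {
            throw new IllegalArgumentException("expected 4 delimiter characters");
        }
        separator = delimiters.charAt(0);
        quote = delimiters.charAt(1);
        escape = delimiters.charAt(2);
        eol = String.valueOf(delimiters.charAt(3));
        base64 = "base64".equals(binaryEncoding);
    }

    /** Encodes VARBINARY data as BASE64 or (by default) uppercase hexadecimal. */
    String encodeBinary(byte[] bytes) {
        if (base64) {
            return Base64.getEncoder().encodeToString(bytes);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    /** Quotes a field only if it contains a delimiter character (an assumption). */
    String quoteField(String s) {
        boolean needsQuotes = s.indexOf(separator) >= 0 || s.indexOf(quote) >= 0
                || s.indexOf(escape) >= 0 || s.contains(eol);
        if (!needsQuotes) {
            return s;
        }
        StringBuilder out = new StringBuilder().append(quote);
        for (char c : s.toCharArray()) {
            if (c == quote || c == escape) {
                out.append(escape); // escape embedded quote/escape characters
            }
            out.append(c);
        }
        return out.append(quote).toString();
    }

    String formatRow(List<Object> values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                line.append(separator);
            }
            Object v = values.get(i);
            String text = (v instanceof byte[]) ? encodeBinary((byte[]) v) : String.valueOf(v);
            line.append(quoteField(text));
        }
        return line.append(eol).toString();
    }

    public static void main(String[] args) {
        // The Java literal ",\"\"\n" holds the same four characters as ,""\n in the XML example.
        DelimitedRowWriter w = new DelimitedRowWriter(",\"\"\n", "hex");
        System.out.print(w.formatRow(List.of(42, "say \"hi\", ok", new byte[] {1, (byte) 0xAB})));
        // prints: 42,"say ""hi"", ok",01AB
    }
}
```

With the default delimiters, the quote and escape characters are the same, so an embedded double quotation mark is simply doubled.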


Whatever properties you choose, the order and representation of the content within the output files is the same. The export connector writes a separate line of data for every INSERT it receives, including the following information:

  • Six columns of metadata generated by the export connector (omitted if skipinternals is set to true). This information includes a transaction ID, a timestamp, a sequence number, the site and partition IDs, and an integer indicating the query type.

  • The remaining columns are the columns of the database stream, in the same order as they are listed in the database definition (DDL) file.
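When reading the files back, a consumer can separate the metadata from the stream data purely by position. This is a minimal sketch, assuming each line has already been split into fields by a CSV or TSV parser; the class name and the placeholder metadata values are hypothetical, and it does not interpret the individual metadata columns.

```java
import java.util.List;

// Hypothetical record view: the first six fields are connector metadata
// (unless skipinternals is true); the rest are the stream's columns in DDL order.
class ExportRecord {
    static final int METADATA_COLUMNS = 6;

    final List<String> metadata;      // empty when skipinternals is true
    final List<String> streamColumns;

    ExportRecord(List<String> fields, boolean skipInternals) {
        int meta = skipInternals ? 0 : METADATA_COLUMNS;
        if (fields.size() < meta) {
            throw new IllegalArgumentException("expected at least " + meta + " fields");
        }
        metadata = List.copyOf(fields.subList(0, meta));
        streamColumns = List.copyOf(fields.subList(meta, fields.size()));
    }

    public static void main(String[] args) {
        // Placeholder metadata values; real values come from the export connector.
        List<String> fields = List.of("m1", "m2", "m3", "m4", "m5", "m6", "1001", "widget");
        ExportRecord r = new ExportRecord(fields, false);
        System.out.println(r.metadata.size() + " metadata, stream columns " + r.streamColumns);
        // 6 metadata, stream columns [1001, widget]
    }
}
```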