7.3. Writing a Custom Formatter

A formatter is a module that takes a row of data received by an import connector, interprets the contents, and translates it into individual column values. The default formatter that is provided with VoltDB parses comma-separated values (CSV) data. However, if the data you are importing is in a different format, you can write a custom formatter to perform this translation step.

You provide a custom formatter as an OSGi (Open Service Gateway Initiative) bundle. However, much of the standard work of an OSGi bundle is handled by the VoltDB import framework. So you only need to provide selected components as described in the following sections.

Note

Custom formatters can be used with both custom and built-in import connectors and with the standalone kafkaloader utility.

The following sections describe:

  • The structure of the custom formatter

  • Compiling and packaging custom formatter bundles

  • Installing and invoking custom formatters

  • Using custom formatters with the kafkaloader utility

7.3.1. The Structure of the Custom Formatter

The custom formatter must contain at least two Java classes: one that implements the org.voltdb.importer.formatter.Formatter interface and one that extends the org.voltdb.importer.formatter.AbstractFormatterFactory class.

For the sake of example, let's assume the custom formatter classes are called MyFormatter and MyFormatterFactory. When the associated import connector is initialized, the VoltDB importer infrastructure calls the classes' methods in the following order:

  • MyFormatterFactory.create() is called once, to initialize the formatter. The create method must return an instance of the MyFormatter class.

  • MyFormatter.MyFormatter() is invoked once when an instance of the MyFormatter class is initialized in the preceding step.

  • MyFormatter.transform() is called from the import connector every time it retrieves a record from the data source.
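
The call order above can be illustrated with a small standalone sketch. It does not use the VoltDB classes: SketchFormatter, SketchFormatterFactory, CallOrderSketch, and the log list are hypothetical stand-ins invented for illustration, and the loop in run() only imitates what the importer infrastructure does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the VoltDB API.
class SketchFormatter {
    private final List<String> log;

    // Runs once, when the factory creates the formatter.
    SketchFormatter(List<String> log) {
        this.log = log;
        log.add("constructor");
    }

    // Runs once per record retrieved from the data source.
    Object[] transform(ByteBuffer payload) {
        log.add("transform");
        return new Object[] { StandardCharsets.UTF_8.decode(payload).toString() };
    }
}

class SketchFormatterFactory {
    private final List<String> log;

    SketchFormatterFactory(List<String> log) {
        this.log = log;
    }

    // Runs once, when the import connector is initialized.
    SketchFormatter create() {
        log.add("create");
        return new SketchFormatter(log);
    }
}

class CallOrderSketch {
    // Imitates the importer: create the formatter once, then transform each record.
    static List<String> run(String... records) {
        List<String> log = new ArrayList<>();
        SketchFormatter formatter = new SketchFormatterFactory(log).create();
        for (String record : records) {
            formatter.transform(ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8)));
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [create, constructor, transform, transform]
        System.out.println(run("first record", "second record"));
    }
}
```

The factory and constructor each appear once in the log, while transform appears once per record.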

In many cases, the easiest way to create a custom class is to modify an existing example. VoltDB provides an example formatter that you can use as a base for your customizations in the VoltDB GitHub repository at the following URL:

The next sections describe how to modify this example — or how to create a custom formatter from scratch, if you wish.

7.3.1.1. The AbstractFormatterFactory Interface and Class

You must create a class that extends the AbstractFormatterFactory class. However, the only change you need to make within that class is to override the create() method so that it returns an instance of your implementation of the Formatter interface. So, assuming the new class names use the prefix "MyFormatter" and starting from the example formatter provided on GitHub, all you need to modify are the package and class names, as in the following example:

package myformatter;

import org.voltdb.importer.formatter.AbstractFormatterFactory;

public class MyFormatterFactory extends AbstractFormatterFactory {
    /**
     * Creates and returns the formatter object.
     */
    @Override
    public MyFormatter create() {
        MyFormatter formatter = new MyFormatter(m_formatName, m_formatProps);
        return formatter;
    }
}

7.3.1.2. The Formatter Interface and Class

The bulk of the work of a custom formatter occurs in the class that implements the Formatter interface. Within that class, you must have at least one method that overrides the default transform() method. You can, optionally, include a method that initializes the class and handles any properties that need to be passed into the formatter from the import configuration.

7.3.1.2.1. Initializing the Formatter Class

The method that initializes the class is its constructor, which has the same name as the class (in our example, MyFormatter). The constructor accepts two parameters: a string and a set of properties. The string contains the name of the formatter as specified in the database configuration file (see Section 7.3.3.2, “Configuring and Invoking Custom Formatters”). This string will, by definition, match the name of the class itself. The second parameter is a java.util.Properties object containing the properties set in the configuration file using the <format-property> element, plus certain VoltDB built-in properties, whose names all start with two underscores.

If the custom formatter doesn't require any information from the configuration, you do not need to include this method. However, if your formatter does require additional information, this method can retrieve and store information provided in the import configuration. For example, the MyFormatter() method in the following implementation looks for a "column_width" property and stores it for later use by the transform() method:

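package myformatter;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Properties;
import org.voltdb.importer.formatter.FormatException;
import org.voltdb.importer.formatter.Formatter;

public class MyFormatter implements Formatter {

    // Width of each fixed-width column, in characters
    int column_width = 0;

    MyFormatter (String formatName, Properties prop) {
        // Property values are strings, so convert the width to an integer
        column_width = Integer.parseInt(prop.getProperty("column_width"));
    }

    // The transform() method, described in the next section, goes here.
}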
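
Property values are delivered as strings, and a property may be missing or malformed in the configuration. The following is a minimal sketch of how a constructor might validate the "column_width" property before storing it. The ColumnWidthProperty class and the readColumnWidth() helper are hypothetical names invented for illustration; the sketch uses only java.util.Properties and does not depend on VoltDB.

```java
import java.util.Properties;

class ColumnWidthProperty {
    // Hypothetical helper: reads "column_width" and converts it to a positive int.
    static int readColumnWidth(Properties prop) {
        String raw = prop.getProperty("column_width");
        if (raw == null) {
            throw new IllegalArgumentException("column_width property is required");
        }
        int width;
        try {
            width = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("column_width must be an integer: " + raw, e);
        }
        if (width <= 0) {
            // A zero or negative width would never advance through the record
            throw new IllegalArgumentException("column_width must be positive: " + raw);
        }
        return width;
    }

    public static void main(String[] args) {
        Properties prop = new Properties();
        prop.setProperty("column_width", "15");
        System.out.println(readColumnWidth(prop)); // prints 15
    }
}
```

Failing early in the constructor reports a configuration mistake when the formatter is created rather than on the first record.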

7.3.1.2.2. Transforming the Data

The method that does the actual work of formatting the incoming data is the transform() method. This method receives the incoming data as a Java byte buffer and is expected to return an array of Java objects representing the input parameters, which will be passed to the specified stored procedure to insert the data into the database.

For example, if the custom formatter expects data in fixed-width columns, the method might look like this:

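@Override
public Object[] transform(ByteBuffer payload) throws FormatException {

   // Convert the raw bytes of the record to a string
   String buffer = new String(payload.array());
   ArrayList<Object> list = new ArrayList<Object>();

   // Cut the string into columns of column_width characters;
   // the last column may be shorter
   int position = 0;
   while (position < buffer.length()) {
      int endpoint = Math.min(position+column_width, buffer.length());
      list.add(buffer.substring(position,endpoint));
      position += column_width;
   }
   // Each column becomes one parameter to the stored procedure
   return list.toArray();
}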
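
To see what this splitting logic produces, the following standalone sketch applies the same loop to a sample record. It is an illustration only: the FixedWidthSketch class and its split() method are hypothetical, the width is passed as an argument rather than read from a property, and UTF-8 encoding is assumed for the payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedWidthSketch {
    // Same splitting loop as transform(), with the width passed as an argument.
    static Object[] split(ByteBuffer payload, int columnWidth) {
        // Assumption: the payload is UTF-8 text backed by an accessible array
        String buffer = new String(payload.array(), StandardCharsets.UTF_8);
        List<Object> list = new ArrayList<>();
        int position = 0;
        while (position < buffer.length()) {
            int endpoint = Math.min(position + columnWidth, buffer.length());
            list.add(buffer.substring(position, endpoint));
            position += columnWidth;
        }
        return list.toArray();
    }

    public static void main(String[] args) {
        byte[] record = "ALICEBOB  CAROL".getBytes(StandardCharsets.UTF_8);
        // prints [ALICE, BOB  , CAROL]
        System.out.println(Arrays.toString(split(ByteBuffer.wrap(record), 5)));
    }
}
```

Note that padding is preserved ("BOB" keeps its two trailing spaces), so a formatter that needs trimmed values would have to trim each column itself.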

7.3.2. Compiling and Packaging Custom Formatter Bundles

Once the custom formatter source code is complete, you are ready to compile and package the formatter as an OSGi bundle.

When compiling the source code, be sure to include the VoltDB JAR files in the Java classpath. For example, if VoltDB is installed in the folder /opt/voltdb, you will need to include /opt/voltdb/voltdb/* and /opt/voltdb/lib/* in the classpath.

You will also need to include a number of OSGi-specific attributes in the final JAR file manifest. For example, you must include the Bundle-Activator attribute pointing to the FormatterFactory class. To ensure all the necessary properties are set, it is easiest to use the ant utility and an ant build file. The following is an example build.xml file. The items that you must modify are the path to the VoltDB installation, the JAR file name, the package and class names, and the Bundle-Name and Bundle-SymbolicName values:

<project default="build">
  <path id="project.classpath">
    <!-- Replace this with the path to the VoltDB jars -->
    <fileset dir="/opt/voltdb">
      <include name="voltdb/*.jar" />
      <include name="lib/*.jar" />
    </fileset>
  </path>

  <target name="build" depends="clean, dist, formatter"/>

  <target name="clean">
    <delete dir="obj"/>
    <delete file="myformatter.jar"/>
  </target>

  <target name="dist">
    <mkdir dir="obj"/>
    <javac srcdir="src" destdir="obj">
      <classpath refid="project.classpath"/>
    </javac>
  </target>

  <target name="formatter">
    <jar destfile="myformatter.jar" basedir="obj">
      <include name="myformatter/MyFormatter.class"/>
      <include name="myformatter/MyFormatterFactory.class"/>
      <manifest>
        <attribute name="Bundle-Activator" 
                   value="myformatter.MyFormatterFactory" />
        <attribute name="Bundle-ManifestVersion" value="2" />
        <attribute name="Bundle-Name" value="My Formatter OSGi Bundle" />
        <attribute name="Bundle-SymbolicName" value="MyFormatter" />
        <attribute name="Bundle-Version" value="1.0.0" />
        <attribute name="DynamicImport-Package" value="*" />
      </manifest>
    </jar>
  </target>
</project>
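
After building, it can be useful to confirm that the packaged JAR actually carries the manifest attributes set in the build file. The following is a minimal sketch of such a check using the standard java.util.jar classes. The BundleManifestCheck class is hypothetical, and the list of expected attributes is simply copied from the build file above, not taken from any VoltDB or OSGi requirement list.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

class BundleManifestCheck {
    // Attribute names copied from the example build file.
    static final String[] EXPECTED = {
        "Bundle-Activator", "Bundle-ManifestVersion", "Bundle-Name",
        "Bundle-SymbolicName", "Bundle-Version", "DynamicImport-Package"
    };

    // Returns the expected attributes that are absent from the main manifest section.
    static List<String> missingAttributes(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (main.getValue(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "myformatter.jar";
        if (!new File(path).isFile()) {
            System.out.println("JAR file not found: " + path);
            return;
        }
        try (JarFile jar = new JarFile(path)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                System.out.println("No manifest found in " + path);
                return;
            }
            System.out.println("Missing attributes: " + missingAttributes(manifest));
        }
    }
}
```

Run against the JAR produced by the example build file, the check should report an empty list of missing attributes.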

7.3.3. Installing and Invoking Custom Formatters

Once you have built and packaged the custom formatter, you are ready to install and use it in your VoltDB infrastructure.

7.3.3.1. Installing Custom Formatters

To install the custom formatter, you simply copy the formatter JAR file (in the preceding examples, myformatter.jar) to the bundles folder in the VoltDB installation on every server in the cluster. For example, if VoltDB is installed in /opt/voltdb:

$ cp myformatter.jar /opt/voltdb/bundles/

7.3.3.2. Configuring and Invoking Custom Formatters

Once the JAR file is available to all VoltDB instances, you can configure and invoke the custom formatter as part of the import configuration. Note that the import configuration can be changed either before the database cluster is started or while the database is running, using either the voltadmin update command or the web-based VoltDB Management Center.

You choose the formatter as part of the import configuration using the format attribute of the <configuration> element in the database configuration file. Normally, you use the built-in "csv" format. However, to select a custom formatter, set the format attribute to the name of the formatter JAR file and its class name. For example:

<import>
   <configuration type="kafka" format="myformatter.jar/MyFormatter" >
     [ . . . ]

Storing your custom JAR in the bundles directory is recommended. However, if you choose to keep your custom code elsewhere, you can still reference it in the configuration by including the absolute path to the file location as part of the format attribute. For example, if your JAR file is in the /etc/myapp folder, the format attribute value would be "file:/etc/myapp/myformatter.jar/MyFormatter". The formatter JAR must be in the same location on all nodes of the cluster.

Within the import configuration, you can also include any properties that the formatter needs using the <format-property> element. For example, the custom formatter described earlier expects a property called "column_width", so the configuration might look like this:

<import>
   <configuration type="kafka" format="myformatter.jar/MyFormatter" >
     <property name="brokers">kafka.myorg.org:9092</property>
     <property name="topics">customer</property>
     <property name="procedure">CUSTOMER.insert</property>
     <format-property name="column_width">15</format-property>
   </configuration>
</import>

7.3.4. Using Custom Formatters With the kafkaloader Utility

You can also use custom formatters with the standalone kafkaloader utility. To use a custom formatter with kafkaloader you must:

  • Declare environment variables for FORMATTER_LIB and ZK_LIB

  • Create a formatter properties file specifying the formatter class and any formatter-specific properties the formatter requires.

The environment variables define the paths to the formatter JAR file and the Apache ZooKeeper libraries, respectively. (Note that ZooKeeper does not need to be running, but you must have a copy of the standard ZooKeeper libraries installed and accessible via the ZK_LIB environment variable.)

The formatter properties file must contain, at a minimum, a "formatter" property set to the class name of the custom formatter. It can also contain any other properties required by the formatter. The following kafkaloader properties file is equivalent to the import configuration shown in the previous section:

formatter=MyFormatter
column_width=15

If both your formatter and the ZooKeeper libraries are in a folder myformatter under your home directory, along with the preceding properties file, you could start the kafkaloader utility with the following commands to use the custom formatter:

$ export FORMATTER_LIB="$HOME/myformatter/"
$ export ZK_LIB="$HOME/myformatter/"
$ kafkaloader --formatter=$HOME/myformatter/formatter.config \
  --topic=customer --zookeeper=kafkahost:2181