
Building your first VoltSP project

The simplest way to start with VoltSP is to generate a project from the VoltSP quickstart Maven archetype, which seeds a sample project that you can adjust further. For those not familiar with Maven archetypes: they are Maven projects that generate other Maven projects.

Example invocation:

mvn archetype:generate                                 \
    -DarchetypeGroupId=org.voltdb                      \
    -DarchetypeArtifactId=volt-stream-maven-quickstart \
    -DarchetypeVersion=1.6.0
This command will create a new project in the current directory. It requires mvn and java to be available on your local PATH; a common setup is to set JAVA_HOME and MAVEN_HOME and add their bin directories to the PATH. A recent Java version works best.

The command will ask a couple of questions, such as the root package for the new project's files.

â„šī¸ Note
I would assume that we use com.example.vsp and name "application"

Now cd into the project directory and list its contents (with the ls command). It should show:

src/
  main/
    java/
      com/
        example/
          vsp/
    resources/
  test/
    java/
      com/
        example/
          vsp/
target/
pom.xml
README.md
Most IDEs can import the project from the pom.xml file. Look at src/main/java/com/example/vsp, which contains sample pipelines, and src/test/java/com/example/vsp, which contains the tests that verify the pipelines' correctness.

You build the project with

mvn clean install

Your pipeline may need specific VoltSP plugins. The archetype already includes some dependencies; look at pom.xml and you will find entries like:

<dependencies>
  <dependency>
      <groupId>org.voltdb</groupId>
      <artifactId>volt-stream-plugin-volt-api</artifactId>
      <version>${volt.stream.version}</version>
      <scope>provided</scope>
  </dependency>
  <dependency>
      <groupId>org.voltdb</groupId>
      <artifactId>volt-stream-plugin-kafka-api</artifactId>
      <version>${volt.stream.version}</version>
      <scope>provided</scope>
  </dependency>
</dependencies>
The provided scope is important: these dependencies are already included in the VoltSP tarball and Docker image, but Maven needs them to compile your code correctly. For more information about the provided plugins, see the VoltSP Components Reference section.

The next chapter walks you through running the tests for the provided pipelines.

Locally testing a user-defined pipeline

Let's look at a sample test (you can find it after creating the sample project from the previous section):

class RandomToConsolePipelineTest {
    private static final Network network = Network.newNetwork();

    private VoltSpContainer simulation = null;

    @AfterEach
    void tearDown() {
        if (simulation != null) {
            simulation.shutdown();
        }
    }

    @Test
    void shouldRunAsBlackBox() {
        Path licensePath = Path.of("path/to/volt-license.xml");
        assertThat(licensePath.toFile())
                .describedAs("Volt license file not found")
                .exists();

        simulation = VoltSpContainer.newVoltSp()
                .withVoltLicense(licensePath)
                .withParallelism(1)
                .withAppJar(WorkingDirPaths.resolve("target/app.jar").hostPath())
                .withPipelineClass(RandomToConsolePipeline.class)
                .withConfigurationYaml("""
                                       tps: 100
                                       """)
                .withClassesUnder(MavenPaths.mavenClasses())
                .withLoggerName("simulation")
                // Containers on the same network can reach each other by container name
                // (see simulation.getContainerName()), but you can also specify an alias.
                .withNetwork("simulation", network)
                // when
                .awaitStart(Duration.ofSeconds(10));

        assertThat(simulation).isNotNull();
        Awaitility.await("for logs to arrive")
                .untilAsserted(() -> {
                    assertThat(simulation.getLogs()).contains("HELLO EARTH ");
                });
    }
}
This test demonstrates how to run the VoltSP Docker container in isolation in a test, using VoltSpContainer. The container runs locally on your machine, so this is not the best environment for performance testing, but it is great for testing correctness.

First of all, point licensePath at a valid Volt license by editing the line where it is set:

Path licensePath = Path.of("path/to/volt-license.xml");

Let's walk through this example. The VoltSpContainer API lets you provide a fully qualified Docker image such as voltdb/volt-streams:1.6.0. You may also pass only a tag, or nothing at all, in which case voltdb/volt-streams:latest is used.

Once we have specified which release we want to test against, we have to specify parallelism.

Parallelism

.withParallelism(1)
Parallelism sets how many workers run inside the started container; the default is 1. Each worker runs in complete isolation from the others: workers share nothing and instantiate separate sets of operators. This is crucial to understand, because the more workers we define, the more resources we need to allocate on the local machine.

Pipeline definition

.withPipelineClass(RandomToConsolePipeline.class)
Look at the com.example.vsp.RandomToConsolePipeline class. In this example the pipeline generates random text at a default rate of 10 TPS, transforms the text, and prints it to the console. Very basic. This simple pipeline could also be expressed with a YAML definition; see YamlRandomToConsolePipelineTest.

Pipeline configuration

.withConfigurationYaml("""
                       tps: 100
                       """)
To avoid recompiling the class or changing the YAML definition, the user can provide named values within a user-defined structure. The example above can be read in the pipeline as:
ExecutionContext.ConfigurationContext configurator = stream.getExecutionContext().configurator();
int tps = configurator.findByPath("tps").orElse(10);
This is a very simple example, so let's change it to a more complex user-defined structure. Values can be accessed using dot notation:
.withConfigurationYaml("""
                       generator:
                         tps: 100
                       """)
// and accessed like this
int tps = configurator.findByPath("generator.tps").orElse(10);
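To make the dot-notation semantics concrete, here is a small, self-contained sketch in plain Java (this is an illustration of the lookup idea, not the actual VoltSP implementation) of how a findByPath-style call can resolve a dot-separated path against nested maps:

```java
import java.util.Map;
import java.util.Optional;

public class DotPathLookup {

    // Resolve a dot-separated path such as "generator.tps" against nested maps.
    @SuppressWarnings("unchecked")
    static Optional<Object> findByPath(Map<String, Object> root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return Optional.empty();
            }
            current = ((Map<String, Object>) current).get(segment);
        }
        return Optional.ofNullable(current);
    }

    public static void main(String[] args) {
        // Mirrors the YAML above: generator.tps = 100
        Map<String, Object> config = Map.of("generator", Map.of("tps", 100));
        int tps = (int) findByPath(config, "generator.tps").orElse(10);
        System.out.println(tps);
    }
}
```

Missing paths fall back to the orElse default, which is why the pipeline can run unchanged when no configuration is supplied.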
Some operators support a builder pattern and expect a specific configuration structure so they can auto-configure themselves. For example, the Kafka operator expects values organised as:
sink:
  kafka:
    topicName: "my-topic"
    bootstrapServers: "kafka.example.com:9092"
    schemaRegistry: "http://registry.example.com"
    properties:
      key1: value1

Defining a classpath and resources

Package application JAR

You can specify any valid JAR as the application JAR. It must contain the pipeline class and all user classes the pipeline requires. For example, let's use a Maven-packaged JAR:

.withAppJar(WorkingDirPaths.resolve("target/application-1.6.0.jar").hostPath())

Maven source and test

VoltSpContainer supports adding classes and files from project directories.

.withClassesUnder(MavenPaths.mavenClasses())
In this example, all classes and resources that would be packaged into a JAR (namely, everything that is compiled and copied into target/classes) are mounted at /volt-apps/ and available at container startup. The MavenPaths API also allows filtering the files to a specific folder and sourcing from the test directory; the same applies to resources:
.withClassesUnder(MavenPaths.mavenTestClasses("org.voltsp.testcontainer.app"))
Multiple calls to this method are allowed. During startup, the content of /volt-apps is printed so you can verify that all classes and resources are correctly mounted. The API also helps with files under the working directory that are not recognised by Maven:
WorkingDirPaths.resolve("data/files")
Such a path will point to the project-root/data/files directory. The working directory is defined by System.getProperty("user.dir").
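Conceptually (assuming WorkingDirPaths resolves relative paths against the working directory, as described above), the resolution is equivalent to this plain-Java snippet:

```java
import java.nio.file.Path;

public class WorkingDirResolve {
    public static void main(String[] args) {
        // Equivalent of WorkingDirPaths.resolve("data/files"):
        // resolve the relative path against the JVM's working directory.
        Path resolved = Path.of(System.getProperty("user.dir")).resolve("data/files");
        System.out.println(resolved);
    }
}
```

Note that the result depends on where the JVM was launched, which is why tests started from an IDE and tests started from Maven can resolve different locations.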

3rd party JAR

.with3rdPartyJar("org.apache.commons", "commons-lang3", "3.17.0")
You can copy any number of 3rd-party libraries into the container. They are also mounted under the container's /volt-apps/ directory.

Networking

Users can define the network that the started container uses to connect to other local containers. By default, the host that launches the container can see all running containers and reach them via localhost:EXPOSED_PORT, CONTAINER_NAME:EXPOSED_PORT, or ALIAS:EXPOSED_PORT. The container starts with a random name, so an alias is handy for always referring to the container by a fixed name.

.withNetwork("simulation", network)
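For example, a companion container can be attached to the same network so the pipeline can reach it by a fixed alias. This sketch uses the standard Testcontainers API; the alpine image and the "companion" alias are illustrative placeholders for whatever service (a Kafka broker, a database, etc.) your pipeline actually talks to, and running it requires a local Docker daemon:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;

public class NetworkExample {
    public static void main(String[] args) {
        Network network = Network.newNetwork();
        // A placeholder companion container; the VoltSP pipeline, once attached
        // to the same network, could resolve it by the alias "companion".
        try (GenericContainer<?> companion = new GenericContainer<>("alpine:3.19")
                .withNetwork(network)
                .withNetworkAliases("companion")
                .withCommand("sleep", "infinity")) {
            companion.start();
            // ... build the VoltSpContainer with .withNetwork("simulation", network)
            // so both containers share the network ...
        }
    }
}
```

The try-with-resources block guarantees the companion container is stopped and removed when the test finishes.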

Testcontainers API

VoltSpContainer is built around org.testcontainers.containers.GenericContainer (full definition in the GenericContainer javadoc: https://javadoc.io/doc/org.testcontainers/testcontainers/latest/org/testcontainers/containers/GenericContainer.html). Anything GenericContainer can do can also be configured through VoltSpContainer.

Preparing for deployment

The command

mvn clean install
builds the project and creates a JAR under the target directory. The project defines all dependencies the pipeline needs to run: the 3rd-party dependencies and the VoltSP plugin dependencies. Some of these are already included in the VoltSP package or Docker image, but 3rd-party dependencies must be delivered alongside the application's JAR.

There are various strategies for delivering the application to a server.

Copying project dependencies to a configured directory

The most straightforward option is to instruct Maven to list and copy all non-test, non-provided dependencies. Adding this configuration to pom.xml will copy the dependencies to a target/lib directory:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <version>3.6.1</version>

            <executions>
                <execution>
                    <id>copy-compile-dependencies</id>
                    <phase>prepare-package</phase>
                    <goals>
                        <goal>copy-dependencies</goal>
                    </goals>

                    <configuration>
                        <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        <includeScope>runtime</includeScope>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-antrun-plugin</artifactId>
            <version>3.2.0</version>

            <executions>
                <execution>
                    <id>copy-main-jar</id>
                    <phase>package</phase>
                    <configuration>
                        <target>
                            <copy file="${project.build.directory}/${project.build.finalName}.jar"
                                  todir="${project.build.directory}/lib" />
                        </target>
                    </configuration>
                    <goals>
                        <goal>run</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

The same directory can be used to group other files needed to run the application; you can then tar the whole directory before moving it to a server.

Shading 3rd party JARs

A more advanced option is to repackage the application JAR, inlining the 3rd-party JARs:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.6.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
This configuration will create a new, shaded JAR under the target directory. All other files still need to be delivered to the server by some other means.

Creating a custom Docker image

Create an app.dockerfile next to pom.xml with content like:

FROM registry.access.redhat.com/ubi9:9.6 AS deps
RUN dnf install -y procps-ng tar && dnf clean all

FROM voltdb/volt-streams:latest

RUN mkdir -p /volt-apps
COPY target/lib/* /volt-apps/

COPY --from=deps /usr/bin/tar /usr/bin/

This can be built and pushed with:

docker buildx build \
  --push \
  --platform linux/amd64 \
  -t "my-org/sample-app:latest" \
  -f app.dockerfile \
  .