See Active(SP) in Action

The best way to understand what VoltSP does is to see it in action. The Active(SP) quick start implements three pipelines that demonstrate:

  • Generating random data and printing it to the console
  • Streaming data to Kafka
  • Streaming data from Kafka to Volt Active Data

The source code for the quick start is simple, easy to read, and useful both as a demonstration and as a template for building your own pipelines.
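
Each pipeline is defined by a small Java class that wires together a source, optional processing steps, and a sink. The skeleton below is only a schematic sketch of that shape; the class and method names follow the general VoltSP builder style but are assumptions on our part, so treat the generated files under src/main/java as the authoritative reference:

public class KafkaToVoltPipeline implements VoltPipeline {
    @Override
    public void define(VoltStreamBuilder stream) {
        stream
            .withName("kafka-to-volt")   // unique pipeline name
            .consumeFromSource( /* Kafka source reading the greetings topic */ )
            .terminateWithSink( /* VoltDB sink inserting into the GREETINGS table */ );
    }
}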

Let's get started. The steps for running the sample pipelines are:

  1. Make sure you have the necessary environment set up
  2. Download the sample sources
  3. Build the sample application
  4. Run the pipelines in a Kubernetes environment

What You Will Need

To run the quick start pipelines you will need an environment to build the sample from Java source files into a jar file, a Kubernetes environment, an Apache Kafka cluster, and a Volt Active Data database cluster. The build process requires access to the Volt Active Data software repositories (see your Volt sales representative for more information) and the following software:

  • Java SDK version 17 or greater
  • Maven

The recommended runtime environment includes:

  • Kubernetes
  • Helm
  • Kafka
  • Volt Active Data V14.0 or later
  • A Volt Active Data license including Active(SP)
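
Before moving on, a quick check from the shell confirms the build and deployment tools are in place:

$ java -version              # should report version 17 or later
$ mvn -version
$ helm version
$ kubectl version --client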

Downloading the Sample Application

The quick start is available as a Maven archetype, which Maven downloads automatically from a central repository. First, change to the directory where you want to install the sample source files, then issue the following shell command:

$ mvn archetype:generate \
    -DarchetypeGroupId=org.voltdb \
    -DarchetypeArtifactId=volt-stream-maven-quickstart \
    -DarchetypeVersion=1.2.0

The script will ask you to enter a group ID and an artifact ID. These represent the package prefix (such as org.acme) and the name of the sample directory, respectively. In the following examples we use org.acme as the package prefix and sample as the sample name. The script then asks a series of questions for which you can accept the default answers. For example:

$ mvn archetype:generate \
    -DarchetypeGroupId=org.voltdb \
    -DarchetypeArtifactId=volt-stream-maven-quickstart \
    -DarchetypeVersion=1.2.0
[ . . . ]
Define value for property 'groupId': org.acme
Define value for property 'artifactId': sample
Define value for property 'version': 1.0-SNAPSHOT
Define value for property 'package': org.acme
Confirm properties configuration:
kafka-bootstrap-servers: REPLACE-ME-IN-PIPELINE-YAML
voltdb-servers: REPLACE-ME-IN-PIPELINE-YAML
voltsp-api-version: 1.2.0
groupId: org.acme
artifactId: sample
version: 1.0-SNAPSHOT
package: org.acme
Y: :
[ . . . ]
[INFO] -----------------------------------------------
[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------
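
If you prefer to skip the interactive prompts, you can run the archetype in Maven's batch mode (-B) and supply the same answers as properties on the command line, letting the archetype's remaining properties keep their defaults:

$ mvn archetype:generate -B \
    -DarchetypeGroupId=org.voltdb \
    -DarchetypeArtifactId=volt-stream-maven-quickstart \
    -DarchetypeVersion=1.2.0 \
    -DgroupId=org.acme \
    -DartifactId=sample \
    -Dversion=1.0-SNAPSHOT \
    -Dpackage=org.acme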

The script creates a subdirectory in the current folder named after the artifact ID. Within that directory are the Java source files for building the pipeline template and the resources needed to run the pipelines. For example, if you chose sample as your artifact ID and org.acme as the group ID:

  • sample/ — contains a README and the Maven pom.xml file for building the sample pipelines
  • sample/src/main/java/org/acme/ — contains the Java source files defining the pipelines
  • sample/src/main/resources/ — contains assets, including Helm YAML files and SQL schema, needed to run the pipelines
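
For example, a quick listing of the resources folder shows the files used in the remaining steps (abbreviated here):

$ ls sample/src/main/resources
ddl.sql  kafka-to-volt-pipeline.yaml  random-to-kafka-pipeline.yaml  ...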

Building the Sample Application

Once you have downloaded the sample source files, you can build the pipeline templates using Maven. Change to the sample directory created in the previous step and issue the mvn clean package command:

$ cd sample
$ mvn clean package

Maven compiles the source files, runs basic tests, then packages the whole application into a jar file in the target directory as target/sample-1.0-SNAPSHOT.jar.
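
A quick directory listing confirms the jar is in place:

$ ls target/*.jar
target/sample-1.0-SNAPSHOT.jar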

Running the Sample Pipelines

You are almost ready to run the sample pipelines. The last step is to set up the infrastructure they need for input and output; that is, identify an available Kafka bootstrap server and/or a Volt Active Data database, depending on which pipeline you run. For Kafka, having a bootstrap server up and available is usually sufficient; if it does not allow automatic creation of topics, you may need to create the greetings topic beforehand (an example follows the database commands below). For VoltDB, you will need a server with the necessary table defined. The easiest way to do that is to initialize and start the database and apply the DDL in the src/main/resources folder:

$ voltdb init -f -D ~/db/sample
$ voltdb start -D ~/db/sample &
$ sqlcmd < src/main/resources/ddl.sql
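
If your Kafka cluster does not create topics automatically, you can create the greetings topic with the standard Kafka command-line tools before starting the pipelines. For example, assuming the tools are on your PATH and the broker listens on the default port 9092:

$ kafka-topics.sh --create --topic greetings     \
    --bootstrap-server kafka.my.corp.com:9092    \
    --partitions 1 --replication-factor 1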

Once you identify the data source and destination, you can update the Helm properties files for the pipelines to match your selections. For example, if you have a Kafka broker running at kafka.my.corp.com and a VoltDB database running on volt.my.corp.com, you insert those addresses into the YAML files kafka-to-volt-pipeline.yaml and random-to-kafka-pipeline.yaml in src/main/resources. After editing, kafka-to-volt-pipeline.yaml might look like this:

replicaCount: 1

resources:
  limits:
    cpu: 2
    memory: 2G
  requests:
    cpu: 2
    memory: 2G

streaming:
  pipeline:
    className: org.acme.KafkaToVoltPipeline

    configuration:
      sink:
        voltdb:
          cluster: "volt.my.corp.com"
      source:
        kafka:
          topicNames: "greetings"
          bootstrapServers: "kafka.my.corp.com"
          groupId: "1"

Defining the configuration as YAML properties is the most flexible option. For alternative methods of defining the configuration properties, see the section on Helm Configuration Options.
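
Because the YAML properties map directly onto Helm values, you can also override an individual setting with --set on the command line rather than editing the file. For example, adding a flag such as the following to the helm install commands shown below redirects the Kafka source without touching the YAML:

    --set streaming.pipeline.configuration.source.kafka.bootstrapServers="kafka.my.corp.com"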

Once you set up the necessary infrastructure and edit the YAML files, you are ready to start the pipelines. You start each pipeline using Helm, specifying the following items:

  • A name for the pipeline
  • The VoltSP chart (voltdb/volt-streams)
  • Your license
  • The application jar file
  • The YAML properties file

If you have not defined an environment variable for the license file yet, now is a good time to do that. For example:

$ export MY_VOLT_LICENSE=$HOME/licenses/volt-license.xml

$ helm install pipeline1 voltdb/volt-streams                       \
    --set-file streaming.licenseXMLFile=${MY_VOLT_LICENSE}         \
    --set-file streaming.voltapps=target/sample-1.0-SNAPSHOT.jar   \
    --values src/main/resources/random-to-kafka-pipeline.yaml

The Helm command starts the Kubernetes pod, which begins pushing random hello statements into the Kafka topic. You can then start the second pipeline, which pulls the statements from the topic and inserts them into the GREETINGS table in the database:

$ helm install pipeline2 voltdb/volt-streams                       \
    --set-file streaming.licenseXMLFile=${MY_VOLT_LICENSE}         \
    --set-file streaming.voltapps=target/sample-1.0-SNAPSHOT.jar   \
    --values src/main/resources/kafka-to-volt-pipeline.yaml
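
Both releases should now be listed by Helm, with their pods in the Running state:

$ helm list
$ kubectl get pods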

Once the pipelines are running you can see the results by monitoring the greetings topic in Kafka or querying the GREETINGS table in VoltDB:

$ sqlcmd --servers=volt.my.corp.com
> select count(*) from greetings;
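
To monitor the Kafka side instead, the standard Kafka console consumer works as well. For example, assuming the broker listens on the default port 9092:

$ kafka-console-consumer.sh --bootstrap-server kafka.my.corp.com:9092 \
    --topic greetings --from-beginning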

Or, if you want to use a Grafana dashboard to monitor the pipelines, enable monitoring so Prometheus starts scraping metrics from the pipelines:

$ helm upgrade pipeline1 voltdb/volt-streams   \
    --reuse-values                             \
    --set monitoring.prometheus.enabled=true
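
When you are done experimenting, you can remove the pipelines the same way you started them, using Helm:

$ helm uninstall pipeline1 pipeline2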