Chapter 2. See Active(SP) in Action

The best way to understand what Active(SP) does is to see it in action. The Active(SP) quick start implements two pipelines that demonstrate:

  • Streaming data to Kafka

  • Streaming data from Kafka to Volt Active Data

The source code for the sample is simple, easy to read, and useful both as a demonstration and as a template for building your own pipelines.

Let's get started. The steps for running the sample pipelines are covered in the following sections.

2.1. What You Will Need

To run the quick start pipelines you will need an environment to build the sample from Java source files into a Docker image, a cloud environment (such as Kubernetes) with access to a Docker repository, an Apache Kafka server, and a Volt Active Data database cluster. The build process requires access to the Volt Active Data software repositories (see your Volt sales representative for more information) and the following software:

  • Java SDK version 17 or greater

  • Maven

  • Docker

The recommended runtime environment includes the following (a quick way to verify your toolchain appears after this list):

  • Docker

  • Kubernetes

  • Helm

  • Kafka

  • Volt Active Data V14.0 or later

  • A Volt Active Data license including Active(SP)
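
For example, you can quickly confirm the build and deployment tools are installed (the versions reported will vary; kubectl is assumed as your Kubernetes client):

$ java -version
$ mvn -version
$ docker --version
$ helm version
$ kubectl version --client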

2.2. Downloading the Sample Application

The quick start is available from the Volt Active Data repositories as a Maven archetype, which downloads the sample and creates the destination folders on your local system. First, change to the directory where you want to install the sample source files, then issue the following shell command:

$ mvn archetype:generate \
 -DarchetypeGroupId=org.voltdb \
 -DarchetypeArtifactId=volt-stream-maven-quickstart \
 -DarchetypeVersion=1.1.0

The Maven script first asks you for the group ID and artifact ID. These represent the package prefix (such as org.acme) and the name of the sample directory, respectively. In the following examples we use org.acme as the package prefix and sample as the sample name. The script then asks a series of questions for which you can accept the default answers. For example:

$ mvn archetype:generate \
>  -DarchetypeGroupId=org.voltdb \
>  -DarchetypeArtifactId=volt-stream-maven-quickstart \
>  -DarchetypeVersion=1.1.0

     [ . . . ]

Define value for property 'groupId': org.acme
Define value for property 'artifactId': sample
Define value for property 'version' 1.0-SNAPSHOT: : 
Define value for property 'package' org.acme: : 
Confirm properties configuration:
kafka-bootstrap-servers: REPLACE-ME-IN-PIPELINE-YAML
voltdb-servers: REPLACE-ME-IN-PIPELINE-YAML
voltsp-api-version: 1.1.0
groupId: org.acme
artifactId: sample
version: 1.0-SNAPSHOT
package: org.acme
 Y: : 

     [ . . . ]

[INFO] -----------------------------------------------
[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------

The script creates a subdirectory in the current folder named after the artifact ID. Within that directory tree are the Java source files for the pipeline templates and the resources needed to run the pipelines. For example, if you chose sample as your artifact ID and org.acme as the group ID (a quick way to verify the layout follows this list):

  • sample/ — contains a README and the Maven pom.xml file for building the sample pipelines

  • sample/src/main/java/org/acme/ — contains the Java source files defining the pipelines

  • sample/src/main/resources/ — contains assets, including Helm YAML files and the SQL schema, needed to run the pipelines
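
For example, you can list the generated files to check the layout (output omitted; paths assume sample as the artifact ID):

$ find sample/src/main -type f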

2.3. Building the Sample Application

Once you have downloaded the sample source files, you can build the pipeline templates using Maven. Change to the sample directory created in the previous step and issue the mvn clean package command:

$ cd sample
$ mvn clean package
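
The packaged artifacts land in the target directory; before building the Docker image you can confirm they are there:

$ ls target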

Next you can load the completed pipeline templates into a Docker repository so they are available in your cloud environment. Because you will need to reference these assets both now and when running the pipelines, the easiest approach is to define a few environment variables for the values unique to you: the name of the Docker repository you will use and the Volt license file required to run the pipelines. For example:

$ export MY_DOCKER_REPO=johnqpublic/projects
$ export MY_VOLT_LICENSE=$HOME/licenses/volt-license.xml

Now you can issue the docker commands to build an image and push it to your repository:

$ docker build                                       \
      --platform="linux/amd64"                       \
      -t ${MY_DOCKER_REPO}:activesp-quickstart--latest \
      -f src/main/resources/Dockerfile .
$ docker push ${MY_DOCKER_REPO}:activesp-quickstart--latest
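
To confirm the image is tagged as expected, you can list it locally before (or after) pushing:

$ docker image ls ${MY_DOCKER_REPO}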

2.4. Running the Sample Pipelines

You are almost ready to run the sample pipelines. The last step before running them is to set up the infrastructure they need for input and output; that is, identify an available Kafka server and/or a Volt Active Data database, depending on which pipeline you run. For Kafka, if the server does not allow automatic creation of topics, you may need to create the greetings topic manually (a sample command follows the database setup below). For VoltDB, you will need a server with the necessary table defined. The easiest way to do that is to initialize and start the database and apply the DDL in the src/main/resources folder:

$ voltdb init -f -D ~/db/sample
$ voltdb start -D ~/db/sample &
$ sqlcmd < src/main/resources/ddl.sql
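
If you do need to create the greetings topic manually, the standard Kafka command-line tools can do it. For example (the script name, broker address, and partition settings shown here are examples; adjust them for your installation):

$ kafka-topics.sh --create                       \
      --topic greetings                          \
      --partitions 1                             \
      --replication-factor 1                     \
      --bootstrap-server kafka.acme.org:9092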

Once you've identified the data source and destination, you can update the Helm properties files for the two pipelines to match your selections. For example, if you have a Kafka broker running at kafka.acme.org and a VoltDB database running on volt.acme.org, you insert those addresses into the YAML files kafka-to-volt-pipeline.yaml and random-to-kafka-pipeline.yaml in src/main/resources. The edited kafka-to-volt-pipeline.yaml might look like this (the values to change are the server addresses in the javaProperties section):

replicaCount: 1

resources:
  limits:
    cpu: 2
    memory: 2G
  requests:
    cpu: 2
    memory: 2G

streaming:
  javaProperties: >
    -Dvoltsp.pipeline=org.acme.KafkaToVoltPipeline
    -Dvoltdb.server=volt.acme.org
    -Dkafka.consumer.group=1  
    -Dkafka.topic=greetings 
    -Dkafka.bootstrap.servers=kafka.acme.org
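
The random-to-kafka-pipeline.yaml file follows the same pattern. Here is a minimal sketch of its streaming section, assuming the generated pipeline class is named org.acme.RandomToKafkaPipeline (check your generated sources for the actual class name):

streaming:
  javaProperties: >
    -Dvoltsp.pipeline=org.acme.RandomToKafkaPipeline
    -Dkafka.topic=greetings
    -Dkafka.bootstrap.servers=kafka.acme.org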

Once you have set up the necessary infrastructure and edited the YAML files, you are ready to start the pipelines. You start each pipeline using Helm, specifying a name for the pipeline, the Active(SP) chart (voltdb/volt-streams), your license, your Docker repository, your image tag, and a pointer to the YAML properties file. If you have not yet defined environment variables for the Docker repository and license file, now is a good time to do so. For example:

$ export MY_DOCKER_REPO=johnqpublic/projects
$ export MY_VOLT_LICENSE=$HOME/licenses/volt-license.xml

$ helm install pipeline1 voltdb/volt-streams              \
  --set-file streaming.licenseXMLFile=${MY_VOLT_LICENSE}  \
  --set image.repository=${MY_DOCKER_REPO}                \
  --set image.tag=activesp-quickstart--latest             \
  --values src/main/resources/random-to-kafka-pipeline.yaml

The Helm command starts the Kubernetes pod, and the pipeline begins pushing random hello statements into the Kafka topic. You can then start the second pipeline, which pulls the statements from the topic and inserts them into the GREETINGS table in the database:

$ helm install pipeline2 voltdb/volt-streams              \
  --set-file streaming.licenseXMLFile=${MY_VOLT_LICENSE}  \
  --set image.repository=${MY_DOCKER_REPO}                \
  --set image.tag=activesp-quickstart--latest             \
  --values src/main/resources/kafka-to-volt-pipeline.yaml
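
You can verify that both pipelines are deployed and their pods are running with standard Helm and Kubernetes commands:

$ helm list
$ kubectl get pods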

Once the pipelines are running you can see the results by monitoring the greetings topic in Kafka or querying the GREETINGS table in VoltDB:

$ sqlcmd --servers=volt.acme.org
> select count(*) from greetings;
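
To watch the messages flowing through Kafka directly, you can use the standard console consumer (the broker address is an example):

$ kafka-console-consumer.sh                     \
      --bootstrap-server kafka.acme.org:9092    \
      --topic greetings                         \
      --from-beginning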

You can also monitor your pipelines using Prometheus and Grafana, including a custom Grafana dashboard. See Section 4.6, “Monitoring Your Pipeline” for more information.