The best way to understand what Active(SP) does is to see it in action. The Active(SP) quick start implements two pipelines that demonstrate:
Streaming data to Kafka
Streaming data from Kafka to Volt Active Data
The source code for the sample is simple, easy to read, and useful both as a demonstration and as a template for building your own pipelines.
But let's get started. The steps for running the sample pipelines are:
1. Download the quick start
2. Build the pipelines
3. Upload the pipelines to Docker
4. Set up the data infrastructure
5. Run the pipelines
To run the quick start pipelines you will need an environment to build the sample from Java source files into a Docker image, a cloud environment (such as Kubernetes) with access to a Docker repository, an Apache Kafka server, and a Volt Active Data database cluster. The build process requires access to the Volt Active Data software repositories (see your Volt sales representative for more information) and the following software:
Java SDK version 17 or greater
Maven
Docker
The recommended runtime environment includes:
Docker
Kubernetes
Helm
Kafka
Volt Active Data V14.0 or later
A Volt Active Data license including Active(SP)
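Before downloading the quick start, you can confirm the build tools are installed and on your path by checking their versions (the exact output varies by installation):
$ java -version
$ mvn -version
$ docker --version
$ helm version
$ kubectl version --client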
The quick start is available from the Volt Active Data repositories, including a Maven archetype that downloads the sample and creates the destination folders on your local system. First, change to the directory where you want to install the sample source files, then issue the following shell command:
$ mvn archetype:generate \
    -DarchetypeGroupId=org.voltdb \
    -DarchetypeArtifactId=volt-stream-maven-quickstart \
    -DarchetypeVersion=1.1.0
The Maven script first asks for a group ID and an artifact ID. These represent the package prefix (such as org.acme) and the name of the sample directory, respectively. In the following examples we use org.acme as the package prefix and sample as the sample name. The script then asks a series of questions for which you can accept the default answers. For example:
$ mvn archetype:generate \
>    -DarchetypeGroupId=org.voltdb \
>    -DarchetypeArtifactId=volt-stream-maven-quickstart \
>    -DarchetypeVersion=1.1.0
    [ . . . ]
Define value for property 'groupId': org.acme
Define value for property 'artifactId': sample
Define value for property 'version' 1.0-SNAPSHOT: :
Define value for property 'package' org.acme: :
Confirm properties configuration:
kafka-bootstrap-servers: REPLACE-ME-IN-PIPELINE-YAML
voltdb-servers: REPLACE-ME-IN-PIPELINE-YAML
voltsp-api-version: 1.1.0
groupId: org.acme
artifactId: sample
version: 1.0-SNAPSHOT
package: org.acme
 Y: :
    [ . . . ]
[INFO] -----------------------------------------------
[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------
The script creates a subdirectory in the current folder named after the artifact ID. Within that directory tree are the Java source files for building the pipeline templates and the resources needed to run the pipelines. For example, if you chose sample as your artifact ID and org.acme as the group ID:
sample/ — contains a README and the Maven pom.xml file for building the sample pipelines
sample/src/main/java/org/acme/ — contains the Java source files defining the pipelines
sample/src/main/resources — contains assets, including Helm YAML files and SQL schema, needed to run the pipelines
Once you have downloaded the sample source files, you can build the pipeline templates using Maven. Change to the sample directory created in the previous step and issue the mvn clean package command:
$ cd sample
$ mvn clean package
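If the build succeeds, Maven leaves the packaged pipeline templates in the target directory. The exact file names depend on the artifact ID and version you chose, so the following is only illustrative:
$ ls target/*.jar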
Next you can load the completed pipeline templates into a Docker repository so they are available for use in your cloud environment. Since you will need to reference them both now and when running the pipelines, it is easiest to start by defining a few environment variables for the assets that are unique to you: the name of the Docker repository you will use and the Volt license file required to run the pipelines. For example:
$ export MY_DOCKER_REPO=johnqpublic/projects
$ export MY_VOLT_LICENSE=$HOME/licenses/volt-license.xml
Now you can issue the docker commands to build an image and push it to your repository:
$ docker build \
    --platform="linux/amd64" \
    -t ${MY_DOCKER_REPO}:activesp-quickstart--latest \
    -f src/main/resources/Dockerfile .
$ docker push ${MY_DOCKER_REPO}:activesp-quickstart--latest
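To confirm the image was built and tagged as expected, you can list the local images for your repository:
$ docker images ${MY_DOCKER_REPO}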
You are almost ready to run the sample pipelines. The last step before you can run them is to set up the infrastructure they need as input and output. That is, identify an available Kafka server and/or a Volt Active Data database, depending on which pipeline you run. For Kafka, if the server does not allow automatic creation of topics, you may need to create the greetings topic manually (an example follows the database commands below). For VoltDB, you will need a server with the necessary table defined. The easiest way to do that is to initialize and start the database, then apply the DDL in the src/main/resources folder:
$ voltdb init -f -D ~/db/sample
$ voltdb start -D ~/db/sample &
$ sqlcmd < src/main/resources/ddl.sql
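If your Kafka broker does not allow automatic topic creation, one way to create the greetings topic manually is with the kafka-topics tool that ships with Kafka. The broker port (9092) and the partition and replication settings shown here are assumptions; adjust them to match your installation:
$ kafka-topics.sh --bootstrap-server kafka.acme.org:9092 \
    --create --topic greetings \
    --partitions 1 --replication-factor 1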
Once you've identified the data source and destination, you can update the Helm properties files for the two pipelines to match your selections. For example, if you have a Kafka broker running at kafka.acme.org and a VoltDB database running on volt.acme.org, you can insert those addresses into the YAML files kafka-to-volt-pipeline.yaml and random-to-kafka-pipeline.yaml in src/main/resources. After editing, kafka-to-volt-pipeline.yaml might look like this:
replicaCount: 1
resources:
  limits:
    cpu: 2
    memory: 2G
  requests:
    cpu: 2
    memory: 2G
streaming:
  javaProperties: >
    -Dvoltsp.pipeline=org.acme.KafkaToVoltPipeline
    -Dvoltdb.server=volt.acme.org
    -Dkafka.consumer.group=1
    -Dkafka.topic=greetings
    -Dkafka.bootstrap.servers=kafka.acme.org
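The companion random-to-kafka-pipeline.yaml follows the same pattern, naming the other pipeline class and omitting the VoltDB address. The property names below are extrapolated from the file above rather than taken from the generated file, so verify them against your copy:
replicaCount: 1
resources:
  limits:
    cpu: 2
    memory: 2G
  requests:
    cpu: 2
    memory: 2G
# Property names extrapolated from kafka-to-volt-pipeline.yaml above;
# verify against the generated random-to-kafka-pipeline.yaml.
streaming:
  javaProperties: >
    -Dvoltsp.pipeline=org.acme.RandomToKafkaPipeline
    -Dkafka.topic=greetings
    -Dkafka.bootstrap.servers=kafka.acme.org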
Once you have set up the necessary infrastructure and edited the YAML files, you are ready to start the pipelines. You start each pipeline using Helm, specifying a release name, the Active(SP) chart (voltdb/volt-streams), your license, your Docker repository, your image tag, and the YAML properties file. If you have not yet defined environment variables for the Docker repository and license file, now is a good time to do so. For example:
$ export MY_DOCKER_REPO=johnqpublic/projects
$ export MY_VOLT_LICENSE=$HOME/licenses/volt-license.xml
$ helm install pipeline1 voltdb/volt-streams \
    --set-file streaming.licenseXMLFile=${MY_VOLT_LICENSE} \
    --set image.repository=${MY_DOCKER_REPO} \
    --set image.tag=activesp-quickstart--latest \
    --values src/main/resources/random-to-kafka-pipeline.yaml
The Helm command starts the Kubernetes pod, which begins pushing random hello statements into the Kafka topic. You can then start the second pipeline, which pulls the statements from the topic and inserts them into the GREETINGS table in the database:
$ helm install pipeline2 voltdb/volt-streams \
    --set-file streaming.licenseXMLFile=${MY_VOLT_LICENSE} \
    --set image.repository=${MY_DOCKER_REPO} \
    --set image.tag=activesp-quickstart--latest \
    --values src/main/resources/kafka-to-volt-pipeline.yaml
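You can verify that both releases deployed and that their pods are running using standard Helm and kubectl commands (the pod names reported by your cluster will differ):
$ helm list
$ kubectl get pods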
Once the pipelines are running you can see the results by monitoring the greetings topic in Kafka or querying the GREETINGS table in VoltDB:
$ sqlcmd --servers=volt.acme.org
> select count(*) from greetings;
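To watch the Kafka side of the stream instead, one option is the console consumer that ships with Kafka; as before, the broker port (9092) is an assumption:
$ kafka-console-consumer.sh --bootstrap-server kafka.acme.org:9092 \
    --topic greetings --from-beginning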
You can also use Prometheus and Grafana to monitor your pipelines, including a custom Grafana dashboard. See Section 4.6, “Monitoring Your Pipeline” for more information.