Writing your own Active(SP) pipeline is simple. Each pipeline starts with a data source, applies one or more processors that operate on the data, and ends by sending the resulting records to a data target, or sink. You describe the structure of your pipeline using a Domain Specific Language (DSL) written in Java. The DSL provides classes and methods that define the pipeline's structure and are compiled into the actual runtime code.
For Active(SP) the DSL describes the three primary components of the pipeline: the source, the processors, and the sink. Like so:
stream
    .consumeFromSource( [ . . . ] )
    .processWith( [ . . . ] )
    .processWith( [ . . . ] )
    .processWith( [ . . . ] )
        [ . . . ]
    .terminateWithSink( [ . . . ] )
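To make the shape of the DSL more concrete, here is a minimal, self-contained sketch of a source-processors-sink chain. Only the method names consumeFromSource, processWith, and terminateWithSink mirror the DSL shown above; the Stream and StreamBuilder interfaces and the in-memory driver are simplified stand-ins invented for this illustration so the example compiles and runs on its own, and they are not the actual Active(SP) classes.

    // Illustrative sketch only: stand-in types, not the Active(SP) API.
    import java.util.function.Consumer;
    import java.util.function.Function;
    import java.util.function.Supplier;

    public class PipelineSketch {

        // Stand-in for the fluent builder returned once a source is attached.
        interface Stream<T> {
            <R> Stream<R> processWith(Function<T, R> processor);
            void terminateWithSink(Consumer<T> sink);
        }

        // Stand-in for the entry point that attaches the data source.
        interface StreamBuilder {
            <T> Stream<T> consumeFromSource(Supplier<T> source);
        }

        // The pipeline definition itself: source -> processors -> sink.
        static void define(StreamBuilder stream) {
            stream
                .consumeFromSource(() -> "21")            // in practice, e.g. a Kafka topic
                .processWith(Integer::parseInt)           // parse the raw record
                .processWith(value -> value * 2)          // apply a business transformation
                .terminateWithSink(System.out::println);  // in practice, e.g. a database table
        }

        // Tiny in-memory driver that pushes a single record through the chain,
        // purely so the sketch is executable.
        public static void main(String[] args) {
            define(new StreamBuilder() {
                @Override
                public <T> Stream<T> consumeFromSource(Supplier<T> source) {
                    return wrap(source.get());
                }
            });
        }

        static <T> Stream<T> wrap(T value) {
            return new Stream<T>() {
                @Override
                public <R> Stream<R> processWith(Function<T, R> processor) {
                    return wrap(processor.apply(value));
                }
                @Override
                public void terminateWithSink(Consumer<T> sink) {
                    sink.accept(value);
                }
            };
        }
    }

In a real pipeline, the arguments to these three methods are the source, processor, and sink definitions described in the sections that follow.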
The following sections describe:
How to start your pipeline project
Defining the source and destination for the pipeline
Defining the business operations on the data (the processors)
Building and running the pipeline
You could start your Active(SP) pipeline project from scratch, setting up the necessary folder structure, creating the Java source files, and defining the Maven dependencies and Helm properties by hand. But it is much easier to start with a template, and the quick start example described in Chapter 2, See Active(SP) in Action, can be used for just that. Follow the instructions for downloading the quick start, specifying your organization's ID as the group ID and your pipeline name as the artifact ID to create your template. For example, if you are creating a pipeline called mydatapipe, the resulting template might have the following folder structure:
mydatapipe
- src
  - main
    - java
      - org
        - acme
    - resources
  - test
...
The following are the key files you will use for creating your own pipeline from the quick start sample:
mydatapipe/pom.xml
— The Maven project file for building the pipeline
mydatapipe/src/main/java/{your-org}/*.java
— Pipeline definition files you can revise and reuse to match your pipeline's source, sink, and processors.
mydatapipe/src/main/resources/*.yaml
— Helm property files you can use to describe the data resources the pipeline requires, such as Kafka streams, Volt Active Data databases, and their properties.