Beta: VoltSP YAML Pipeline Definition Language¶
VoltSP provides a declarative YAML configuration language for defining streaming data pipelines without writing Java code. This document describes the structure and options available in the YAML configuration format.
Basic Structure¶
A VoltSP pipeline configuration consists of the following top-level sections, some required and some optional:
version: 1 # Required: Configuration version (must be 1)
name: "pipeline-name" # Required: Pipeline name
source: { } # Required: Source configuration
pipeline: { } # Optional: Processing steps to apply
sink: { } # Required: Sink configuration
logging: { } # Optional: Logging configuration
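As a minimal sketch of this layout, the configuration below keeps only the required fields and omits the optional pipeline and logging sections. The stdin source and file sink are borrowed from examples elsewhere in this document; the pipeline name and the /tmp output directory are placeholders.

```yaml
# Minimal sketch: required sections only; pipeline and logging are omitted.
version: 1               # must be 1
name: "minimal-pipeline" # placeholder name, shown in logs and metrics
source:
  stdin: { }             # read data from standard input
sink:
  file:
    dirPath: "/tmp"      # placeholder output directory
```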
Configuration Sections¶
Version¶
Must be 1. This field is required.
version: 1
Name¶
The pipeline name, which appears in logs and metrics. This field is required.
name: "my-pipeline"
Source Configuration¶
The source section defines where the pipeline gets its data. You must specify exactly one source. All sources available to the Java DSL are supported.
Each source type has its own configuration parameters.
Pipeline Configuration¶
The pipeline section defines processing configuration and the data transformations to apply. It includes:
- parallelism: Optional value specifying pipeline parallelism
- processors: Optional array of processor configurations
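A pipeline section that sets both keys might look like the sketch below. The parallelism value of 2 is arbitrary, and the processor entry follows the shape described under Processor Types, which this document marks as not yet implemented.

```yaml
pipeline:
  parallelism: 2          # optional; arbitrary value for illustration
  processors:             # optional; array of processor configurations
    - javascript:
        code: "message.toUpperCase()"
```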
Sink Configuration¶
The sink section defines where the pipeline outputs its data. You must specify exactly one sink type. All sinks available to the Java DSL are supported.
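To illustrate the exactly-one rule, the sketch below configures a single file sink, using the dirPath key from the file-processing example later in this document. The commented-out network sink shows what a second sink type would look like; enabling it alongside the file sink would violate the rule. How VoltSP reports such a configuration is not covered here.

```yaml
sink:
  file:
    dirPath: "/tmp"       # the one and only sink type in this section
  # network:              # a second sink type here would break the
  #   type: "UDP"         # exactly-one rule, so it stays commented out
  #   address: "target-host:54321"
```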
Logging Configuration¶
Note: Not yet implemented
The optional logging section configures logging behavior:
logging:
  globalLevel: "DEBUG" # Global log level
  loggers: # Per-logger configuration
    "org.myapp": "TRACE"
    "org.thirdparty": "WARN"
Source Types¶
Some examples of simple source configurations:
File Source¶
Reads data from a file:
source:
  file:
    path: "input.txt" # Required: Path to input file
Stdin Source¶
Reads data from standard input:
source:
  stdin: { }
Collection Source¶
Reads from a static collection of strings:
source:
  collection:
    elements: # Required: Array of strings
      - "element1"
      - "element2"
Network Source¶
Reads from network:
source:
  network:
    type: "UDP" # Required: UDP or TCP
    address: "0.0.0.0:12345" # Required: Port number or address:port
    decoder: "lines" # Required: Decoder type (none/identity/line/bytes)
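The inline comments above allow TCP as the type and a bare port number as the address. A sketch of that variant follows; quoting the port as a string is an assumption carried over from the address:port form, and the port value is arbitrary.

```yaml
source:
  network:
    type: "TCP"           # alternative to "UDP"
    address: "12345"      # bare port number; quoting is an assumption
    decoder: "lines"      # same decoder value as the UDP example
```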
Beats Source¶
Reads from Elastic Beats:
source:
  beats:
    address: "0.0.0.0:514" # Required: Listen address
    clientInactivityTimeout: "PT30S" # Optional: Connection idle timeout (ISO8601 duration)
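The clientInactivityTimeout value uses ISO 8601 duration syntax, where PT30S means 30 seconds. The sketch below sets a longer timeout; the five-minute value is arbitrary, and the other durations in the comments are standard ISO 8601 forms rather than VoltSP-specific values.

```yaml
source:
  beats:
    address: "0.0.0.0:514"
    clientInactivityTimeout: "PT5M"   # 5 minutes
    # Other ISO 8601 durations: "PT30S" = 30 seconds,
    # "PT1M30S" = 1 minute 30 seconds, "PT1H" = 1 hour
```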
Sink Types¶
VoltDB Sink¶
Outputs to VoltDB:
sink:
  voltdb-procedure:
    procedureName: "MyStoredProc" # Required: Stored procedure name
    servers: "voltdb-host:21212" # Required: VoltDB host
    client:
      retries: 3 # Optional: Number of retries
Processor Types¶
Note: Not yet implemented
Processors can be written in multiple languages and are defined in the pipeline's processors array. Each processor must specify its language and code:
pipeline:
  processors:
    - javascript:
        code: "message.toUpperCase()"
    - python:
        code: |
          import re
          def process(message):
              return message.lower()
          process(message)
    - ruby:
        code: |
          message.reverse
Complete Examples¶
Simple File Processing Pipeline¶
version: 1
name: "file-processor"
source:
  file:
    path: "input.txt"
pipeline:
  parallelism: 1
  processors:
    - javascript:
        code: |
          message.toUpperCase();
sink:
  file:
    dirPath: "/tmp"
Kafka to VoltDB Pipeline¶
version: 1
name: "kafka-to-voltdb"
source:
  kafka:
    bootstrapServers:
      - "kafka1:9092"
      - "kafka2:9092"
    topicNames:
      - "incoming-data"
    groupId: "processor-group"
    startingOffset: "LATEST"
pipeline:
  parallelism: 4
  processors:
    - javascript:
        code: |
          // Transform message
          JSON.parse(message)
sink:
  voltdb-procedure:
    procedureName: "ProcessData"
    servers: "voltdb-host:21212"
    client:
      retries: 3
logging:
  globalLevel: "INFO"
  loggers:
    org.voltdb: "DEBUG"
Network to Network Pipeline¶
version: 1
name: "network-relay"
source:
  network:
    type: "UDP"
    address: "0.0.0.0:12345"
    decoder: "lines"
pipeline:
  parallelism: 1
sink:
  network:
    type: "UDP"
    address: "target-host:54321"