How VoltSP Stream Processing Works¶
Stream data processing has become a critical component of business operations. The exponential growth of available data and the pressure to act on information in real time have made traditional computing approaches obsolete. It is no longer sufficient to gather data and post-process it to determine what actions to take. Businesses now need to operate on the data in flight to filter, format, validate, measure, and respond to events in a timely manner.
And for simple operations this works. Many operations, like filtering data based on fixed rules or converting from one format to another, can be performed at speed. However, the Achilles' heel of stream data processing is that many operations still require access to up-to-date, trusted information such as customer accounts, inventory levels, and resource availability. These stateful operations, if performed against a traditional SQL database, incur the same latency that earlier centralized operations suffered from. This is where VoltSP comes in.
By integrating stream data processing with Volt Active Data — an ACID database designed to maximize throughput without sacrificing consistency, durability, or availability — VoltSP makes it possible to combine both stateless and stateful processing in flight and at speed.
Figure 3.1. VoltSP Architecture
VoltSP Architecture¶
The VoltSP architecture consists of three primary parts: sources, sinks, and processors. The Domain Specific Language you use to define VoltSP pipelines mirrors this structure, letting you define a source, one or more processors, and a sink:
stream
source
processor
processor
processor
[ ... ]
sink
The processors can be any combination of stateless and stateful operations, with Volt Active Data providing real-time access to reference data that can be used to verify, authenticate, authorize, or otherwise validate and enhance the data as it passes.
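The source, processor, sink shape can be illustrated with a minimal Python sketch. This is a conceptual model only, not the VoltSP DSL: the function names, the event format, and the in-memory `ACCOUNTS` table (standing in for reference data held in a database) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional, Sequence

# An event is a plain dict here. A processor returns the (possibly changed)
# event, or None to drop it from the stream.
Event = dict
Processor = Callable[[Event], Optional[Event]]

# Hypothetical reference data standing in for a stateful database lookup.
ACCOUNTS = {"alice": {"active": True}, "bob": {"active": False}}


def drop_empty(event: Event) -> Optional[Event]:
    """Stateless: filter on a fixed rule."""
    return event if event.get("amount", 0) > 0 else None


def to_cents(event: Event) -> Optional[Event]:
    """Stateless: convert from one format to another."""
    return {**event, "amount": round(event["amount"] * 100)}


def require_active_account(event: Event) -> Optional[Event]:
    """Stateful: validate the event against reference data."""
    account = ACCOUNTS.get(event["user"])
    return event if account and account["active"] else None


def run_pipeline(
    source: Iterable[Event],
    processors: Sequence[Processor],
    sink: Callable[[Event], None],
) -> None:
    """Pass each event from the source through every processor, then to the sink."""
    for event in source:
        current: Optional[Event] = event
        for process in processors:
            current = process(current)
            if current is None:
                break  # a processor dropped the event
        if current is not None:
            sink(current)
```

Calling `run_pipeline(events, [drop_empty, to_cents, require_active_account], results.append)` forwards only events with a positive amount and an active account, with the amount converted to cents.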
The advantages the VoltSP architecture offers are:
- Cloud Native: VoltSP pipelines are designed from the ground up to run in the cloud. They are also self-contained and do not require any additional infrastructure (such as resource managers, schedulers, or the like). This allows for easy setup, scaling, and management.
- Apache Kafka and Volt Active Data integration: Kafka is supported out of the box as a data source, and both Kafka and Volt Active Data are supported as sinks for the pipeline, so setting up the initial pipeline template is trivial.
- Complex business logic: Partitioned procedures in Volt Active Data can be used to incorporate complex, stateful operations on the data without sacrificing latency.
- Flexibility: The pipelines are designed as templates, using placeholders for key resources such as server addresses and topic names, so that different pipelines can be created from the same template by identifying different resources in the properties at runtime.
- Scalability: The pipelines themselves can be scaled at runtime completely separately from the resources they use, such as Kafka servers or Volt Active Data cluster nodes, allowing you to optimize computing resources to match actual needs.
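The template idea described under Flexibility can be sketched in Python with the standard library's `string.Template`. The placeholder names, the template layout, and the URL-like strings below are hypothetical and do not reflect VoltSP's actual placeholder syntax or property names.

```python
from string import Template

# Hypothetical pipeline template: placeholder names and layout are illustrative only.
PIPELINE_TEMPLATE = Template(
    "source: kafka://${kafka_servers}/${input_topic}\n"
    "sink: volt://${volt_servers}\n"
)


def render_pipeline(properties: dict) -> str:
    """Fill the template from runtime properties.

    substitute() raises KeyError when a placeholder has no value, so a
    missing property fails early instead of producing a broken pipeline.
    """
    return PIPELINE_TEMPLATE.substitute(properties)
```

Rendering the same template with two different property sets yields two distinct pipeline definitions without changing the template itself.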
Reliable data processing¶
VoltSP employs a sophisticated batch processing system that ensures reliable data handling even when interacting with remote systems. This system tracks requests and responses to external services, confirming that all operations within a processing batch have been successfully completed before considering the batch finished.
The batch processing mechanism works in conjunction with the circuit breaker pattern to provide:
- Reliable tracking of asynchronous operations
- Automatic retry capabilities for failed batches
- Clear separation between processing phases
- Proper handling of out-of-order responses
This approach ensures that data is processed reliably and consistently, even in the face of temporary system failures or network issues. By managing batches of operations as atomic units, VoltSP maintains data integrity throughout the processing pipeline.
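The bookkeeping described above can be sketched as follows. This is a simplified model under stated assumptions, not VoltSP's implementation: the `BatchTracker` class, the `send` callback, and the whole-batch retry policy with a fixed `max_attempts` are all hypothetical.

```python
class BatchTracker:
    """Tracks the outstanding requests of a single batch."""

    def __init__(self, request_ids):
        self.pending = set(request_ids)
        self.failed = set()

    def on_response(self, request_id, ok):
        # Responses may arrive in any order; each one clears only its own id.
        self.pending.discard(request_id)
        if not ok:
            self.failed.add(request_id)

    @property
    def complete(self):
        """True once every request in the batch has been answered."""
        return not self.pending

    @property
    def succeeded(self):
        """True once every request has been answered and none failed."""
        return self.complete and not self.failed


def process_batch(events, send, max_attempts=3):
    """Send a batch, retrying the whole batch until it succeeds.

    send(event) returns True on success. Returns the attempt number that
    succeeded, or raises RuntimeError when attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        tracker = BatchTracker(range(len(events)))
        for request_id, event in enumerate(events):
            tracker.on_response(request_id, send(event))
        if tracker.succeeded:
            return attempt
    raise RuntimeError("batch failed after %d attempts" % max_attempts)
```

Because this sketch retries the whole batch, events that already succeeded are sent again, so it assumes the downstream operation is idempotent.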
Circuit Breaker¶
VoltSP incorporates circuit breaker patterns to enhance system resilience when interacting with remote systems. A circuit breaker is a mechanism that monitors the health of connections to external systems and temporarily halts processing when those systems experience problems.
When VoltSP detects that a remote system is experiencing issues (through consecutive failures), the circuit breaker "opens" to prevent further requests from being sent to the troubled system. This approach:
- Prevents overwhelming already struggling systems with additional requests
- Allows remote systems time to recover without constant pressure
- Conserves resources that would otherwise be wasted on failed requests
- Enables graceful degradation of service rather than complete failure
VoltSP implements both local circuit breakers for individual components and a global circuit breaker that can temporarily pause the entire event processing pipeline when necessary. The system automatically attempts to resume normal operations once the remote systems return to a healthy state.
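A minimal sketch of the general circuit breaker pattern is shown below. It is not VoltSP's implementation: the class, the failure threshold, and the time-based recovery window are assumptions, since the text above states only that consecutive failures open the breaker and that processing resumes automatically once the remote system is healthy.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial request after a delay."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.consecutive_failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # While open, block requests until the recovery window has elapsed,
        # then let a trial request through.
        return self.clock() - self.opened_at >= self.recovery_seconds

    def record_success(self):
        # Any success closes the breaker and resets the failure count.
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Open (or re-open) the breaker and restart the recovery window.
            self.opened_at = self.clock()
```

A caller checks `allow_request()` before each remote call and reports the outcome with `record_success()` or `record_failure()`. One breaker per component gives the local behavior, and a shared instance consulted by the whole pipeline approximates a global breaker.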