Onnx-genai¶
Runs ONNX model inference using the Generative AI extensions for onnxruntime. See the official GitHub repository for more information.
This processor can either use an existing model reference defined in the resources section or create a new model instance using the provided URI.
It supports various Hugging Face-style configuration options for model generation parameters, chat formatting, and caching.
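As a quick sketch, the two modes look like this in a pipeline definition (the resource name and model path are hypothetical):

pipeline:
  processors:
    # Option 1: reuse a model resource declared in the resources section
    - onnx-genai:
        modelRef: "my-onnx-model"

    # Option 2: load a new model instance directly from a URI
    # - onnx-genai:
    #     modelUri: "file:///opt/models/my-model"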
Models¶
Supported model architectures as of the onnx-genai 0.8.3 release are: AMD OLMo, ChatGLM, DeepSeek, ERNIE 4.5, Gemma, Granite, Llama (and derivatives), Mistral (and derivatives), Nemotron, Phi (language + vision), Qwen
Downloads¶
The modelUri can point to a location on local disk (using the file:// scheme) or in remote storage.
Remote storage support depends on the available plugins; for example, the S3 plugin allows downloads from an S3-compatible bucket.
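For example (the paths and bucket name below are illustrative):

# Local disk
modelUri: "file:///opt/models/my-model"

# Remote storage through an S3-compatible resource named "s3-models-storage"
modelUri: "s3-models-storage://my-bucket/my-model"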
Files¶
The modelUri should point to a location containing the following files:
- genai_config.json
- model.onnx
- model.onnx.data
- tokenizer.json
- tokenizer_config.json
.processWith(OnnxGenaiProcessorConfigBuilder.builder()
        .withModelRef(value)
        .withModelUri(value)
        .withPrintDownloadProgress(value)
        .withChatTemplate(value)
        .withStreamResponse(value)
        .withCache(builder -> builder
                .withDirectory(value)
                .withMaxCacheSize(value)
                .withExpirationTime(value)
                .withCleanupOnStart(value)
        )
        .withProperties(value)
)
processor:
  onnx-genai:
    modelRef: value
    modelUri: value
    printDownloadProgress: value
    chatTemplate: value
    streamResponse: value
    cache:
      directory: value
      maxCacheSize: value
      expirationTime: value
      cleanupOnStart: value
    properties: value
Java dependency management¶
Add this declaration to your dependency management system to access the configuration DSL for this plugin in Java.
<dependency>
    <groupId>org.voltdb</groupId>
    <artifactId>volt-stream-plugin-onnx-api</artifactId>
    <version>1.0-20250910-124207-release-1.5.3</version>
</dependency>
implementation group: 'org.voltdb', name: 'volt-stream-plugin-onnx-api', version: '1.0-20250910-124207-release-1.5.3'
Properties¶
modelRef¶
Reference to an existing ONNX GenAI model resource. If specified, modelUri is ignored.
Type: string
modelUri¶
URI to the directory containing the model files. Required if modelRef is not specified.
Type: string
printDownloadProgress¶
Whether to display progress information during model file downloads.
Type: boolean
Default value: false
chatTemplate¶
Template for formatting chat input. The {input} placeholder will be replaced with the actual input text.
Type: string
Default value: <|user|>\n{input} <|end|>\n<|assistant|>
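For example, a model that expects a different prompt format can be accommodated by overriding the template (the format below is illustrative; use whatever the chosen model expects):

onnx-genai:
  modelUri: "file:///opt/models/my-model"   # hypothetical path
  # {input} is replaced with the incoming event text
  chatTemplate: "### User:\n{input}\n### Assistant:"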
streamResponse¶
Whether to stream the model's response token by token as individual events (true) or return the complete response at once (false).
Type: boolean
Default value: false
cache¶
This configuration controls how model files are cached locally, including the cache location, size limits, expiration policy, and cleanup behavior; an example follows the field descriptions below. If not provided, files are cached in the /tmp directory.
Type: object
Fields of cache:
cache.directory¶
Directory where files will be cached. If not specified, a temporary directory will be created.
Type: string
cache.maxCacheSize¶
Maximum size of the cache in bytes. Files will be evicted when the cache exceeds this size. Use 0 for unlimited.
Type: number
Default value: 0
cache.expirationTime¶
Duration after which cached files are considered stale and will not be used by the system.
Type: object
cache.cleanupOnStart¶
Whether to clean up expired or invalid cache entries when the cache is initialized.
Type: boolean
Default value: false
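For example (the directory and size are illustrative; expirationTime is omitted here since its serialized form depends on the duration object):

cache:
  directory: "/var/cache/onnx-models"   # hypothetical path
  maxCacheSize: 10737418240             # 10 GiB; 0 means unlimited
  cleanupOnStart: true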
properties¶
Model-specific generation parameters as key-value pairs. Supported value types are numbers and booleans.
Common parameters include max_length, temperature, top_p, etc.
Type: object
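For example (the values are illustrative; consult the model's documentation for the parameters it supports):

properties:
  max_length: 2048
  temperature: 0.7
  top_p: 0.9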
Usage Examples¶
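The following pipeline reads prompts from stdin, downloads a chat model from an S3-compatible bucket (caching the files under /media/), and streams the generated tokens to stdout.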
version: 1
name: ChatWithMe

resources:
  - name: "s3-models-storage"
    s3:
      credentials:
        accessKey: "..."
        secretKey: "..."

source:
  stdin: {}

pipeline:
  processors:
    - onnx-genai:
        modelUri: "s3-models-storage://com.acme.chatmodels/phi4-mini-istruct"
        chatTemplate: "<|user|>\n{input} <|end|>\n<|assistant|>"
        printDownloadProgress: true
        streamResponse: true
        properties:
          max_length: "2048"
        cache:
          directory: "/media/"

sink:
  stdout: {}