
Onnx-genai

This onnx-genai resource loads and initializes an ONNX GenAI model that processors can reference. It handles downloading and caching of model files, as well as loading the necessary native libraries.

See the official ONNX GenAI GitHub repository for more information.

Models

Supported model architectures as of the onnx-genai 0.8.3 release are: AMD OLMo, ChatGLM, DeepSeek, ERNIE 4.5, Gemma, Granite, Llama (and derivatives), Mistral (and derivatives), Nemotron, Phi (language + vision), Qwen.

Downloads

The modelUri can point to a location on local disk (using the file:// scheme) or in remote storage. Remote storage support depends on the available plugins; for example, the S3 plugin allows downloads from an S3-compatible bucket.

Files

The modelUri should point to a location containing the following files:

- genai_config.json
- model.onnx
- model.onnx.data
- tokenizer.json
- tokenizer_config.json
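Before pointing modelUri at a local directory with the file:// scheme, it can help to verify that all of the expected files are present. The helper below is a hypothetical sketch (it is not part of the plugin); it checks a directory against the file list above:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: checks that a local model directory contains
// the files the onnx-genai resource expects to find at modelUri.
public class ModelDirCheck {
    static final List<String> REQUIRED = List.of(
        "genai_config.json", "model.onnx", "model.onnx.data",
        "tokenizer.json", "tokenizer_config.json");

    // Returns the required files that are absent from modelDir.
    static List<String> missingFiles(Path modelDir) {
        return REQUIRED.stream()
            .filter(f -> !Files.isRegularFile(modelDir.resolve(f)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        List<String> missing = missingFiles(dir);
        if (missing.isEmpty()) {
            // dir.toUri() yields a file:// URI usable as modelUri
            System.out.println("OK: " + dir.toUri());
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```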

Java DSL:

.configureResource(OnnxGenaiResourceConfigBuilder.builder()
    .withModelUri(value)
    .withPrintDownloadProgress(value)
    .withCache(builder -> builder
        .withDirectory(value)
        .withMaxCacheSize(value)
        .withExpirationTime(value)
        .withCleanupOnStart(value)
    )
)

YAML:

resource:
  onnx-genai:
    modelUri: value
    printDownloadProgress: value
    cache:
      directory: value
      maxCacheSize: value
      expirationTime: value
      cleanupOnStart: value

Java dependency management

Add this declaration to your dependency management system to access the configuration DSL for this plugin in Java.

Maven:

<dependency>
    <groupId>org.voltdb</groupId>
    <artifactId>volt-stream-plugin-onnx-api</artifactId>
    <version>1.0-20250910-124207-release-1.5.3</version>
</dependency>

Gradle:

implementation group: 'org.voltdb', name: 'volt-stream-plugin-onnx-api', version: '1.0-20250910-124207-release-1.5.3'

Properties

modelUri

URI of the directory containing the model files. Required if modelRef is not specified.

Type: string

printDownloadProgress

Whether to display progress information during model file downloads.

Type: boolean

Default value: false

cache

This configuration controls how model files are cached locally, including the cache location, size limits, expiration policy, and cleanup behavior. If not provided, files will be cached in the /tmp directory.

Type: object

Fields of cache:

cache.directory

Directory where files will be cached. If not specified, a temporary directory will be created.

Type: string

cache.maxCacheSize

Maximum size of the cache in bytes. Files will be evicted when the cache exceeds this size. Use 0 for unlimited.

Type: number

Default value: 0

cache.expirationTime

Duration after which cached files are considered stale and will not be used by the system.

Type: object

cache.cleanupOnStart

Whether to clean up expired or invalid cache entries when the cache is initialized.

Type: boolean

Default value: false
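Putting the cache fields together, the sketch below fills in the builder from the configuration section above with concrete values. The specific values and value types are assumptions (a path string for the directory, a byte count for maxCacheSize, a java.time.Duration for expirationTime); check the plugin's Javadoc for the exact signatures.

// Sketch only: concrete cache settings for the onnx-genai resource.
.configureResource(OnnxGenaiResourceConfigBuilder.builder()
    .withModelUri("file:///opt/models/my-llm")          // local model directory
    .withCache(builder -> builder
        .withDirectory("/var/cache/onnx-genai")          // cache location
        .withMaxCacheSize(10L * 1024 * 1024 * 1024)      // 10 GiB; 0 = unlimited
        .withExpirationTime(Duration.ofDays(7))          // stale after a week
        .withCleanupOnStart(true)                        // purge expired entries on init
    )
)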

Usage Examples

version: 1
name: ChatWithMe

resources:
  - name: "s3-models-storage"
    s3:
      credentials:
        accessKey: "..."
        secretKey: "..."
  - name: "MyLLM"
    onnx-genai:
      modelUri: "s3-models-storage://com.acme.chatmodels/phi4-mini-instruct"
      printDownloadProgress: true
      cache:
        directory: "/media/"

source:
  stdin: {}

pipeline:
  processors:
    - onnx-genai:
        modelRef: "MyLLM"
        chatTemplate: "<|user|>\n{input} <|end|>\n<|assistant|>"
        properties:
          max_length: "2048"

sink:
  stdout: {}