ONNX Export: Qwen/Qwen3-0.6B

  • Original Model: Qwen/Qwen3-0.6B
  • Architecture: causal language model (decoder-only)
  • Task: text-generation
  • Optimization: INT8 (ARM64)
  • Opset: 17

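The INT8 optimization above stores weights as 8-bit integers plus a floating-point scale. The card does not document the actual export pipeline, so as an illustration only, here is a minimal numpy sketch of symmetric per-tensor INT8 weight quantization (the function names are hypothetical, not part of the export tooling):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: the scale maps the largest
    # absolute weight onto the int8 range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights; error is at most scale / 2 per value.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Each weight is recovered to within half a quantization step, which is why INT8 export shrinks the model roughly 4x versus float32 while typically costing only a small amount of accuracy.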
Usage


from tokenizers import Tokenizer
import onnxruntime as ort
import numpy as np

# 1. Load the tokenizer from the Hub
tokenizer = Tokenizer.from_pretrained("broadfield-dev/Qwen3-0.6B-20260105-060935-onnx")

# 2. Load the ONNX model (model.onnx must first be downloaded from the
#    repository, e.g. with huggingface_hub.hf_hub_download)
session = ort.InferenceSession("model.onnx")

# 3. Run inference on a sample prompt
text = "Hello world"
encoding = tokenizer.encode(text)
inputs = {
    "input_ids": np.array([encoding.ids], dtype=np.int64),
    "attention_mask": np.array([encoding.attention_mask], dtype=np.int64)
}

outputs = session.run(None, inputs)
print(f"Output shape: {outputs[0].shape}")
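The snippet above runs a single forward pass; the first output is the logits tensor, and generation is not shown. As a sketch (assuming `outputs[0]` has shape `[batch, seq_len, vocab_size]`, the usual layout for causal LMs), a greedy next-token step on those logits looks like:

```python
import numpy as np

def greedy_next_token(logits):
    # logits: float array of shape [batch, seq_len, vocab_size].
    # Pick the highest-scoring vocabulary id at the last sequence position.
    return int(np.argmax(logits[0, -1]))
```

To generate text, append the chosen id to `input_ids` (extending `attention_mask` to match), call `session.run` again, and stop when the tokenizer's EOS id is produced.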

Model Details

This model was exported from Qwen/Qwen3-0.6B to ONNX (opset 17) and quantized to INT8 for ARM64 targets, enabling efficient inference on edge devices and in production environments. The export preserves the original model's capabilities; quantization reduces model size and can speed up CPU inference, typically at the cost of a small accuracy loss.

Model tree for broadfield-dev/Qwen3-0.6B-20260105-060935-onnx

  • Base model: Qwen/Qwen3-0.6B
  • Quantized variants of the base model: 288 (including this model)