# ONNX Export: Qwen/Qwen3-0.6B
- Original Model: Qwen/Qwen3-0.6B
- Architecture: causal
- Task: text-generation
- Optimization: INT8 (ARM64)
- Opset: 17
## Usage
```python
from tokenizers import Tokenizer
import onnxruntime as ort
import numpy as np

# 1. Load the tokenizer from the Hub
tokenizer = Tokenizer.from_pretrained("broadfield-dev/Qwen3-0.6B-20260105-060935-onnx")

# 2. Load the exported ONNX model
session = ort.InferenceSession("model.onnx")

# 3. Run inference
text = "Hello world"
encoding = tokenizer.encode(text)
inputs = {
    "input_ids": np.array([encoding.ids], dtype=np.int64),
    "attention_mask": np.array([encoding.attention_mask], dtype=np.int64),
}
outputs = session.run(None, inputs)
print(f"Output shape: {outputs[0].shape}")
```
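For a causal language model, `outputs[0]` is a logits tensor of shape `(batch, sequence_length, vocab_size)`, and greedy decoding picks the argmax over the vocabulary at the last position. A minimal sketch of that step, using a small dummy logits array in place of the real model output (the shapes here are illustrative, not Qwen3's actual vocabulary size):

```python
import numpy as np

# Dummy logits standing in for outputs[0]: (batch=1, seq_len=3, vocab=8).
logits = np.zeros((1, 3, 8), dtype=np.float32)
logits[0, -1, 5] = 10.0  # make token id 5 the most likely next token

# Greedy decoding: argmax over the vocabulary at the last position.
next_token = int(np.argmax(logits[0, -1]))
print(next_token)  # → 5
```

In a full generation loop you would append `next_token` to `input_ids`, extend `attention_mask` by one, and call `session.run` again until an end-of-sequence token is produced.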
## Model Details
This model was exported to ONNX for efficient inference on edge devices and in production environments. The export preserves the original model's text-generation capabilities, while the INT8 (ARM64) quantization reduces model size and speeds up CPU inference.