# Qwen3-Omni ARC ASR v4 (MLX 4-bit)
MLX 4-bit quantized version of amityrobotics/qwen3-omni-arc-asr-v4.
Fine-tuned from v3 with a Korean LoRA (rank=32, alpha=64) merged into all attention projections (q/k/v/o_proj) across the thinker, talker, code_predictor, code2wav, and audio_tower modules.
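For orientation, a minimal sketch of what an adapter with that shape and its merge could look like with `peft`; the adapter path, model class, and script are illustrative assumptions, not the actual training code:

```python
from transformers import AutoModel
from peft import LoraConfig, PeftModel

# Adapter shape described above: rank 32, alpha 64, all attention projections.
# (Illustrative only; the real training pipeline is not published here.)
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Merging a trained adapter back into the v3 base to produce a standalone v4
# checkpoint. "path/to/korean-lora" is a placeholder, and the concrete
# Qwen3-Omni model class may differ by transformers version.
base = AutoModel.from_pretrained("amityrobotics/qwen3-omni-arc-asr-v3")
merged = PeftModel.from_pretrained(base, "path/to/korean-lora").merge_and_unload()
merged.save_pretrained("qwen3-omni-arc-asr-v4")
```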
## Quantization
- Format: MLX safetensors
- Bits: 4 (affine, group_size=64)
- Size: ~20GB (5 shards)
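These are MLX's standard affine group-quantization parameters. A small sketch of what they mean at the tensor level (random weight, for illustration only):

```python
import mlx.core as mx

# Affine quantization with group_size=64, bits=4: each row of a weight matrix
# is split into groups of 64 values, and every group stores its own scale and
# bias so the 4-bit codes can approximately reconstruct the original values.
w = mx.random.normal((4096, 4096))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# Round-trip to check the reconstruction error of one tensor.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_hat).max())
```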
## Benchmark Results (478-case balanced test set)
| Language | v3 Baseline | v4 (this model) | Delta (pp) |
|---|---|---|---|
| Overall | 90.6% | 93.5% | +2.9 |
| en-US | 98.5% | 95.5% | -3.0 |
| ko-KR | 83.3% | 92.3% | +9.0 |
| zh-CN | 85.5% | 92.0% | +6.5 |
## Usage
```python
from mlx_vlm import load

model, processor = load("amityrobotics/qwen3-omni-arc-asr-v4-mlx-4bit")
```
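A fuller transcription sketch, assuming the installed mlx_vlm release routes audio through `apply_chat_template` and `generate` (the `num_audios=` and `audio=` keywords, the prompt wording, and the file name are assumptions; check your version's API):

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "amityrobotics/qwen3-omni-arc-asr-v4-mlx-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Placeholder input clip; en-US, ko-KR, and zh-CN speech are covered by the benchmark above.
audio = ["sample.wav"]
prompt = apply_chat_template(
    processor, config, "Transcribe this audio.", num_audios=len(audio)
)

# The audio= keyword is an assumption about the installed mlx_vlm version.
output = generate(model, processor, prompt, audio=audio, verbose=False)
print(output)
```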