Qwen3-Omni ARC ASR v4 (MLX 4-bit)

MLX 4-bit quantized version of amityrobotics/qwen3-omni-arc-asr-v4.

Fine-tuned from v3 with a Korean LoRA adapter (rank=32, alpha=64) merged into all attention projections (q_proj, k_proj, v_proj, o_proj) across the thinker, talker, code_predictor, code2wav, and audio_tower modules.
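
For reference, the merge amounts to folding the low-rank update into each frozen projection weight. The sketch below shows this in plain NumPy; the dimensions and tensor names are hypothetical and stand in for any one of the merged q/k/v/o_proj matrices.

```python
import numpy as np

# Illustration only: folding a LoRA update into one frozen projection weight.
# d_model is hypothetical; the same merge applies to every q/k/v/o_proj in the
# thinker, talker, code_predictor, code2wav, and audio_tower modules.
d_model, rank, alpha = 1024, 32, 64

W = np.random.randn(d_model, d_model).astype(np.float32)  # frozen base weight
A = np.random.randn(rank, d_model).astype(np.float32)     # LoRA down-projection
B = np.random.randn(d_model, rank).astype(np.float32)     # LoRA up-projection

# Standard LoRA merge: W' = W + (alpha / rank) * B @ A
W_merged = W + (alpha / rank) * (B @ A)
```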

Quantization

  • Format: MLX safetensors
  • Bits: 4 (affine, group_size=64; see the sketch below this list)
  • Size: ~20GB (5 shards)
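
The sketch below shows what the affine, group_size=64 scheme means in practice using MLX's quantize/dequantize primitives: every group of 64 consecutive weights shares one scale and one bias, and each weight is stored as a 4-bit integer. This is illustrative only; the published shards already contain the quantized weights, scales, and biases.

```python
import mlx.core as mx

# Illustration only: affine 4-bit quantization of a single weight matrix.
# Each group of 64 consecutive values shares one scale and one bias, so a
# value is reconstructed as w ≈ scale * q + bias with q in [0, 15].
w = mx.random.normal((1024, 1024))

w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)

print(mx.abs(w - w_hat).max())  # worst-case per-element quantization error
```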

Benchmark Results (478-case balanced test set)

| Language | v3 Baseline | v4 (this model) | Delta |
|----------|-------------|-----------------|-------|
| Overall  | 90.6%       | 93.5%           | +2.9% |
| en-US    | 98.5%       | 95.5%           | -3.0% |
| ko-KR    | 83.3%       | 92.3%           | +9.0% |
| zh-CN    | 85.5%       | 92.0%           | +6.5% |

Usage

from mlx_vlm import load

# Load the 4-bit MLX weights and the matching processor from the Hub
model, processor = load("amityrobotics/qwen3-omni-arc-asr-v4-mlx-4bit")
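
A minimal end-to-end transcription sketch follows. It assumes a recent mlx_vlm release whose generate call accepts audio inputs directly; the audio keyword, prompt wording, and file name are assumptions, so check the mlx_vlm documentation for the exact interface.

```python
from mlx_vlm import load, generate

model, processor = load("amityrobotics/qwen3-omni-arc-asr-v4-mlx-4bit")

# Assumption: this mlx_vlm build routes audio files to the processor through an
# `audio` argument; older releases may expect a different keyword or a chat template.
text = generate(
    model,
    processor,
    prompt="Transcribe the following audio.",
    audio=["sample_ko.wav"],  # hypothetical local file
    max_tokens=256,
)
print(text)
```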