Qwen3-Omni ARC ASR v4 (MLX 4-bit)

MLX 4-bit quantized version of amityrobotics/qwen3-omni-arc-asr-v4.

Fine-tuned from v3 with a Korean LoRA adapter (rank=32, alpha=64) merged into all attention projections (q_proj, k_proj, v_proj, o_proj) across the thinker, talker, code_predictor, code2wav, and audio_tower modules.
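
For reference, the merge amounts to folding the low-rank update into each frozen projection weight. The sketch below shows this in plain NumPy; the dimensions and tensor names are hypothetical and stand in for any one of the merged q/k/v/o_proj matrices.

```python
import numpy as np

# Illustration only: folding a LoRA update into one frozen projection weight.
# d_model is hypothetical; the same merge applies to every q/k/v/o_proj in the
# thinker, talker, code_predictor, code2wav, and audio_tower modules.
d_model, rank, alpha = 1024, 32, 64

W = np.random.randn(d_model, d_model).astype(np.float32)  # frozen base weight
A = np.random.randn(rank, d_model).astype(np.float32)     # LoRA down-projection
B = np.random.randn(d_model, rank).astype(np.float32)     # LoRA up-projection

# Standard LoRA merge: W' = W + (alpha / rank) * B @ A
W_merged = W + (alpha / rank) * (B @ A)
```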

Quantization

  • Format: MLX safetensors
  • Bits: 4 (affine, group_size=64; see the sketch below this list)
  • Size: ~20GB (5 shards)
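
The sketch below shows what the affine, group_size=64 scheme means in practice using MLX's quantize/dequantize primitives: every group of 64 consecutive weights shares one scale and one bias, and each weight is stored as a 4-bit integer. This is illustrative only; the published shards already contain the quantized weights, scales, and biases.

```python
import mlx.core as mx

# Illustration only: affine 4-bit quantization of a single weight matrix.
# Each group of 64 consecutive values shares one scale and one bias, so a
# value is reconstructed as w ≈ scale * q + bias with q in [0, 15].
w = mx.random.normal((1024, 1024))

w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)

print(mx.abs(w - w_hat).max())  # worst-case per-element quantization error
```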

Benchmark Results (478-case balanced test set)

| Language | v3 Baseline | v4 (this model) | Delta |
|----------|-------------|-----------------|-------|
| Overall  | 90.6%       | 93.5%           | +2.9% |
| en-US    | 98.5%       | 95.5%           | -3.0% |
| ko-KR    | 83.3%       | 92.3%           | +9.0% |
| zh-CN    | 85.5%       | 92.0%           | +6.5% |

Usage

from mlx_vlm import load

# Load the 4-bit MLX weights and the matching processor from the Hub
model, processor = load("amityrobotics/qwen3-omni-arc-asr-v4-mlx-4bit")
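
A minimal end-to-end transcription sketch follows. It assumes a recent mlx_vlm release whose generate call accepts audio inputs directly; the audio keyword, prompt wording, and file name are assumptions, so check the mlx_vlm documentation for the exact interface.

```python
from mlx_vlm import load, generate

model, processor = load("amityrobotics/qwen3-omni-arc-asr-v4-mlx-4bit")

# Assumption: this mlx_vlm build routes audio files to the processor through an
# `audio` argument; older releases may expect a different keyword or a chat template.
text = generate(
    model,
    processor,
    prompt="Transcribe the following audio.",
    audio=["sample_ko.wav"],  # hypothetical local file
    max_tokens=256,
)
print(text)
```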