Qwen3-Coder-Next-oQ4

An oQ4 mixed-precision MLX quantization of Qwen3-Coder-Next, produced with oMLX.

  • Quantization: oQ4 (sensitivity-driven, group_size=64)
  • Format: MLX safetensors, loadable with mlx-vlm and mlx-lm

Usage

pip install mlx-vlm
python3 -m mlx_vlm generate --model bearzi/Qwen3-Coder-Next-oQ4 --prompt "Your prompt here" --max-tokens 512
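
For scripted use, here is a minimal Python sketch using the mlx-lm API, which the card lists as compatible with this format. The prompt is illustrative, and the exact generate() keywords may vary slightly between mlx-lm releases.

# Minimal sketch: loading and generating via the mlx-lm Python API.
# Assumes mlx-lm is installed (pip install mlx-lm); keyword arguments
# may differ across mlx-lm versions.
from mlx_lm import load, generate

# The first call downloads the quantized weights from the Hugging Face Hub.
model, tokenizer = load("bearzi/Qwen3-Coder-Next-oQ4")

prompt = "Write a Python function that reverses a linked list."
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))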

About oQ

oQ measures per-layer quantization sensitivity through calibration inference and allocates bits where they matter most: critical layers stay at higher precision, tolerant layers compress aggressively. See the oMLX docs.
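
For intuition, here is a toy sketch of sensitivity-driven bit allocation. It is not oMLX's implementation: the names are invented, and instead of the calibration-inference pass described above it scores each layer by a simple weight-space quantization error, then keeps the most sensitive layers at higher precision.

# Hypothetical illustration of sensitivity-driven bit allocation.
# Names and scoring are invented for exposition; this is not oMLX's API.
import numpy as np

def layer_sensitivity(weights: np.ndarray, bits: int, group_size: int = 64) -> float:
    """Score a layer by the error a group-wise uniform quantizer would introduce."""
    w = weights.reshape(-1, group_size)
    # Per-group symmetric scale for the given bit width.
    scale = np.abs(w).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1)
    scale[scale == 0] = 1.0
    q = np.round(w / scale) * scale
    return float(np.mean((w - q) ** 2))

def allocate_bits(layers: dict[str, np.ndarray], frac_high: float = 0.25) -> dict[str, int]:
    """Keep the most sensitive fraction of layers at 8 bits, the rest at 4."""
    scores = {name: layer_sensitivity(w, bits=4) for name, w in layers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_high = max(1, int(len(ranked) * frac_high))
    return {name: (8 if name in ranked[:n_high] else 4) for name in ranked}

# Toy usage with random matrices standing in for real layer weights.
rng = np.random.default_rng(0)
layers = {f"layers.{i}.mlp": rng.normal(size=(256, 256)) for i in range(8)}
print(allocate_bits(layers))

In the real pipeline the ranking would come from actual calibration runs rather than weight-space error alone, but the principle is the same: spend the bit budget where the error metric says it matters.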
