Qwen3.6-35B-A3B-RotorQuant-MLX-MXFP4

Summary

RotorQuant + MLX-MXFP4 (4-bit) variant of Qwen/Qwen3.6-35B-A3B.

Why this variant

Apple Silicon (M1/M2/M3/M4) with RotorQuant structural pre-conditioning and MLX-native MXFP4 layout (E2M1 weights, per-32-element E8M0 (OCP microscaling)). 4.253 bits/weight, ~17 GB on disk, sub-2-s load on M4 Max. Pick this over the affine MLX variants when you want MXFP4 format parity with hardware pipelines while running locally.

Hardware compatibility

Device VRAM Recommendation
Apple M4 Max 128 GB ~21 GB recommended — headroom for long context
Apple M3 Max 64 GB ~21 GB fits comfortably
Apple M2 Max 32 GB ~21 GB tight — short context only

Reproduce

# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-mxfp4\", quantize=True, q_bits=4, q_group_size=32, q_mode=\"mxfp4\", trust_remote_code=True)"

Reproduced at commit 919836a.

Evaluation

benchmarks pending — populated after the eval-harness workstream lands.

Family

Provenance

  • Source SHA: majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit
  • Calibration hash: none (mxfp4 is calibration-free; rotor/turbo conditioning inherited from source)
  • Uploaded: 2026-04-21T06:14:03.928693+00:00

Toolchain:

  • huggingface_hub: 1.11.0
  • mlx: 0.31.1
  • mlx-lm: 0.31.2

License

Released under apache-2.0. Upstream license of the base model applies.

Downloads last month
552
Safetensors
Model size
35B params
Tensor type
U8
·
U32
·
BF16
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-MXFP4

Quantized
(364)
this model