Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4

Summary

RotorQuant + MLX-NVFP4 (4-bit) variant of Qwen/Qwen3.6-35B-A3B.

Why this variant

This variant targets Apple Silicon (M1/M2/M3/M4). It combines RotorQuant structural pre-conditioning with the MLX-native NVFP4 layout: E2M1 weights with one FP8 scale per 16-element block, matching the NVIDIA Blackwell layout. It comes to 4.503 bits/weight and ~18 GB on disk, and loads in under 2 s on an M4 Max. Pick this over the affine MLX variants when you want NVFP4 format parity with hardware pipelines while running locally.
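The bits-per-weight and on-disk figures can be sanity-checked with a little arithmetic. The sketch below assumes a nominal 35e9 parameters (taken from the model name, not an exact count) and treats every weight as quantized. The small gap between the ideal 4.5 and the reported 4.503 presumably comes from tensors kept at higher precision, which is an assumption rather than something the card states.

```python
# Back-of-the-envelope check of the NVFP4 storage cost.
# Assumption: 35e9 is a nominal parameter count taken from the model name.
WEIGHT_BITS = 4   # one E2M1 code per weight
SCALE_BITS = 8    # one FP8 scale per block
BLOCK_SIZE = 16   # weights sharing one scale


def bits_per_weight(weight_bits: int, scale_bits: int, block_size: int) -> float:
    """Average storage cost per weight, including the shared block scale."""
    return weight_bits + scale_bits / block_size


def size_gib(n_params: float, bpw: float) -> float:
    """Approximate size in binary gigabytes (1 GiB = 2**30 bytes)."""
    return n_params * bpw / 8 / 2**30


print(bits_per_weight(WEIGHT_BITS, SCALE_BITS, BLOCK_SIZE))  # 4.5
print(round(size_gib(35e9, 4.503), 1))                       # 18.3
```

Under these assumptions the result lands close to the ~18 GB quoted for the on-disk size.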

Hardware compatibility

| Device | VRAM | Recommendation |
| Apple M4 Max 128 GB | ~21 GB | recommended; headroom for long context |
| Apple M3 Max 64 GB | ~21 GB | fits comfortably |
| Apple M2 Max 32 GB | ~21 GB | tight; short context only |
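The recommendations follow from how much unified memory remains once the ~21 GB footprint is resident. A minimal sketch of that subtraction, using only the figures listed above (the remainder still has to cover the KV cache, the OS, and any other running apps):

```python
# Memory left over after loading the model, per device listed above.
MODEL_FOOTPRINT_GB = 21  # approximate figure from the compatibility list

DEVICES_GB = {
    "Apple M4 Max": 128,
    "Apple M3 Max": 64,
    "Apple M2 Max": 32,
}


def headroom_gb(total_gb: int, footprint_gb: int = MODEL_FOOTPRINT_GB) -> int:
    """Unified memory remaining after the model weights are loaded."""
    return total_gb - footprint_gb


for name, total in DEVICES_GB.items():
    print(f"{name}: {headroom_gb(total)} GB left of {total} GB")
```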

Reproduce

# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)"

Reproduced at commit 919836a.
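After the second step finishes, it can be useful to confirm that the output directory records the intended settings. The sketch below assumes that mlx-lm writes a `quantization` object with `bits`, `group_size`, and `mode` keys into `config.json`; the key names are an assumption and may differ between versions. The helper names `read_quantization` and `matches_recipe` are hypothetical.

```python
import json
from pathlib import Path


def read_quantization(model_dir: str) -> dict:
    """Return the 'quantization' block from config.json, or {} if absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("quantization", {})


def matches_recipe(q: dict) -> bool:
    """Check for the settings used in the re-quantize step (assumed key names)."""
    return (
        q.get("bits") == 4
        and q.get("group_size") == 16
        and q.get("mode") == "nvfp4"
    )


# Only inspect the output directory if the conversion has already been run.
if (Path("out-nvfp4") / "config.json").exists():
    print(matches_recipe(read_quantization("out-nvfp4")))
```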

Evaluation

Benchmarks pending; results will be added after the eval-harness workstream lands.

Family

Provenance

  • Source SHA: majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit
  • Calibration hash: none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source)
  • Uploaded: 2026-04-21T06:17:30.021158+00:00
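To illustrate why a block-scaled 4-bit format can be calibration-free, here is a minimal pure-Python sketch of round-to-nearest quantization for one 16-element block. The scale is derived from the block's own largest magnitude, so no calibration data is involved. This is an illustration under simplifying assumptions, not MLX's actual kernel: the scale stays a Python float instead of being rounded to FP8, and the function names are hypothetical.

```python
# Magnitudes representable by E2M1 (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def quantize_block(block):
    """Return (scale, codes); codes are signed E2M1 values.

    Simplification: a real implementation would also round the scale to FP8.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0.0] * len(block)
    scale = amax / E2M1_MAX  # map the largest magnitude onto 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate weights from the scale and E2M1 codes."""
    return [scale * c for c in codes]


block = [6.0, -3.0, 1.4, 0.2] + [0.0] * 12  # one 16-element block
scale, codes = quantize_block(block)
print(scale, codes[:4])  # 1.0 [6.0, -3.0, 1.5, 0.0]
```

In this example 1.4 rounds up to 1.5 and 0.2 rounds down to 0.0, which shows how coarse the grid is near zero and why the per-block scale matters.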

Toolchain:

  • huggingface_hub: 1.11.0
  • mlx: 0.31.1
  • mlx-lm: 0.31.2

License

Released under apache-2.0. Upstream license of the base model applies.
