license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
library_name: mlx
tags:
  - qwen
  - qwen-3.6
  - moe
  - rotor
  - mlx
  - nvfp4
  - apple-silicon

Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4

Summary

RotorQuant + MLX-NVFP4 (4-bit) variant of Qwen/Qwen3.6-35B-A3B.

Why this variant

Targets Apple Silicon (M1/M2/M3/M4), combining RotorQuant structural pre-conditioning with the MLX-native NVFP4 layout: E2M1 weights with one FP8 scale per 16-element group, matching the NVIDIA Blackwell layout. 4.503 bits/weight, ~18 GB on disk, sub-2-s load on M4 Max. Pick this over the affine MLX variants when you want NVFP4 format parity with hardware pipelines while running locally.
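To make the layout concrete, here is a minimal sketch of what "E2M1 weights with a per-16-element scale" means numerically. It is an illustration under stated assumptions, not the MLX kernel: the helper names `quantize_block` and `bits_per_weight` are hypothetical, and the block scale is kept in full precision rather than rounded to FP8 as the real format does.

```python
import math

# The eight magnitudes representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)

def quantize_block(block):
    """Round one block of weights to scale * (signed E2M1 value).

    Assumption for illustration: the scale maps the block's largest
    magnitude onto 6.0 and is NOT rounded to FP8 here.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for x in block:
        # Pick the nearest representable magnitude, then restore the sign.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        out.append(math.copysign(mag, x) * scale)
    return out

def bits_per_weight(weight_bits=4, scale_bits=8, group_size=16):
    """Storage cost: 4-bit codes plus one 8-bit scale shared by each group."""
    return weight_bits + scale_bits / group_size

print(bits_per_weight())  # 4.5
print(quantize_block([6.0, -3.0, 0.4, 0.0] + [0.0] * 12)[:4])
```

The 4.5 bits/weight baseline sits just under the 4.503 reported above. One plausible explanation, which is an assumption and not something this card states, is that a small number of tensors are stored at higher precision.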

Hardware compatibility

Device                 VRAM     Recommendation
Apple M4 Max 128 GB    ~21 GB   recommended: headroom for long context
Apple M3 Max 64 GB     ~21 GB   fits comfortably
Apple M2 Max 32 GB     ~21 GB   tight: short context only
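The recommendations follow from simple arithmetic on the ~21 GB figure. The sketch below shows the nominal headroom on each listed device; the names `FOOTPRINT_GB`, `DEVICES`, and `headroom_gb` are hypothetical. Real headroom is smaller, because unified memory is shared with the OS and other applications, and the KV cache grows with context length.

```python
# Approximate footprint from the table above, in GB.
FOOTPRINT_GB = 21

# Unified memory per listed device, in GB.
DEVICES = {
    "Apple M4 Max": 128,
    "Apple M3 Max": 64,
    "Apple M2 Max": 32,
}

def headroom_gb(total_gb, footprint_gb=FOOTPRINT_GB):
    """Nominal memory left after loading the weights (ignores OS usage)."""
    return total_gb - footprint_gb

for name, total in DEVICES.items():
    print(f"{name}: {headroom_gb(total)} GB nominal headroom")
# Apple M4 Max: 107 GB nominal headroom
# Apple M3 Max: 43 GB nominal headroom
# Apple M2 Max: 11 GB nominal headroom
```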

Reproduce

# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)"

Reproduced at commit 919836a.
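After running the two commands, a quick sanity check is to confirm that the output directory records the intended quantization settings. The sketch below assumes the converter writes a `quantization` entry with `bits`, `group_size`, and `mode` keys into `out-nvfp4/config.json`; those key names are an assumption about the output format, and `quantization_mismatches` and `check_output_dir` are hypothetical helpers.

```python
import json
from pathlib import Path

# Settings passed to the re-quantization step above.
EXPECTED = {"bits": 4, "group_size": 16, "mode": "nvfp4"}

def quantization_mismatches(config):
    """Return {key: (found, expected)} for every setting that differs.

    Assumes the settings live under config["quantization"]; a missing
    key is reported with a found value of None.
    """
    quant = config.get("quantization", {})
    return {
        key: (quant.get(key), want)
        for key, want in EXPECTED.items()
        if quant.get(key) != want
    }

def check_output_dir(path="out-nvfp4"):
    """Load config.json from the converted model directory and compare."""
    config = json.loads((Path(path) / "config.json").read_text())
    return quantization_mismatches(config)

# Usage on the machine that ran the conversion:
#   print(check_output_dir("out-nvfp4") or "quantization settings match")
```

An empty result means the recorded settings match the command line. Any entry in the result shows the value found next to the value expected.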

Evaluation

Benchmarks are pending; results will be populated after the eval-harness workstream lands.

Family

Provenance

  • Source SHA: majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit
  • Calibration hash: none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source)
  • Uploaded: 2026-04-21T06:17:30.021158+00:00

Toolchain:

  • huggingface_hub: 1.11.0
  • mlx: 0.31.1
  • mlx-lm: 0.31.2

License

Released under apache-2.0. The upstream license of the base model also applies.