license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
library_name: mlx
tags:
  - qwen
  - qwen-3.6
  - moe
  - rotor
  - mlx
  - nvfp4
  - apple-silicon

Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4

Summary

RotorQuant + MLX-NVFP4 (4-bit) variant of Qwen/Qwen3.6-35B-A3B.

Why this variant

Targets Apple Silicon (M1/M2/M3/M4), combining RotorQuant structural pre-conditioning with the MLX-native NVFP4 layout: E2M1 weights with one FP8 scale per 16-element group, matching the NVIDIA Blackwell layout. 4.503 bits/weight, ~18 GB on disk, sub-2-s load on M4 Max. Pick this over the affine MLX variants when you want format parity with NVFP4 hardware pipelines while running locally.
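To make the layout concrete, here is a minimal sketch of how a 16-element group could be rounded to the E2M1 grid with one shared scale, and how the per-weight storage cost follows from it. It is an illustration, not the MLX kernel: the function names are hypothetical, and the scale is kept as an ordinary float rather than being rounded to FP8. The idealized cost works out to 4.5 bits/weight; the 4.503 figure quoted above is slightly higher, and this sketch does not try to account for the difference.

```python
# Illustrative sketch only; all names here are hypothetical, not MLX APIs.

# Magnitudes representable by E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
GROUP_SIZE = 16   # number of weights that share one scale
WEIGHT_BITS = 4   # one E2M1 code per weight
SCALE_BITS = 8    # one FP8 scale per group


def bits_per_weight(weight_bits=WEIGHT_BITS, scale_bits=SCALE_BITS,
                    group_size=GROUP_SIZE):
    """Average storage cost per weight, counting the shared group scale."""
    return weight_bits + scale_bits / group_size


def fake_quantize_group(values):
    """Round one group to the E2M1 grid using a single shared scale.

    Assumption: the scale stays a Python float. A real NVFP4 encoder would
    also round the scale to FP8, which this sketch skips.
    """
    if len(values) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} values, got {len(values)}")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * GROUP_SIZE
    # Map the largest magnitude in the group onto the top of the grid (6.0).
    scale = max_abs / E2M1_MAGNITUDES[-1]
    out = []
    for v in values:
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        out.append((nearest if v >= 0 else -nearest) * scale)
    return out


print(bits_per_weight())  # 4 + 8/16 = 4.5
```

Because the grid has only eight magnitudes, the per-group scale does most of the work of adapting to the local weight range, which is why the group size is kept as small as 16.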

Hardware compatibility

Device	Unified memory needed	Recommendation
Apple M4 Max 128 GB	~21 GB	recommended — headroom for long context
Apple M3 Max 64 GB	~21 GB	fits comfortably
Apple M2 Max 32 GB	~21 GB	tight — short context only
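The recommendations in the table follow from simple headroom arithmetic, sketched below. The helper name is hypothetical and the ~21 GB footprint is the table's approximate figure; whatever is left over still has to cover macOS, other applications, and the KV cache, which is why the 32 GB machine is described as tight.

```python
# Hypothetical helper; the 21 GB footprint is the table's approximate figure.
MODEL_FOOTPRINT_GB = 21

DEVICES_GB = {
    "Apple M4 Max": 128,
    "Apple M3 Max": 64,
    "Apple M2 Max": 32,
}


def headroom_gb(total_memory_gb, footprint_gb=MODEL_FOOTPRINT_GB):
    """Memory left once the model is resident, before OS and KV-cache use."""
    return total_memory_gb - footprint_gb


for name, total in DEVICES_GB.items():
    print(f"{name} {total} GB: about {headroom_gb(total)} GB left over")
```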

Reproduce

# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)"

Reproduced at commit 919836a.
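After the second step finishes, it can be worth confirming that the output directory really records the intended settings. The sketch below reads config.json from the output directory and compares it with the values passed to convert. It assumes mlx-lm stores them under a "quantization" key with "bits", "group_size", and "mode" fields; that key layout is an assumption here, so adjust it if your version writes something different. The function name is hypothetical.

```python
import json
from pathlib import Path

# Settings used in the re-quantize step above.
EXPECTED = {"bits": 4, "group_size": 16, "mode": "nvfp4"}


def check_quantization(model_dir, expected=EXPECTED):
    """Return {key: (found, expected)} for every mismatch; empty if all match.

    Assumption: the converter writes a "quantization" dict into config.json.
    """
    config = json.loads((Path(model_dir) / "config.json").read_text())
    found = config.get("quantization", {})
    return {
        key: (found.get(key), want)
        for key, want in expected.items()
        if found.get(key) != want
    }


# Usage, once the conversion has produced the directory:
#   problems = check_quantization("out-nvfp4")
#   print("ok" if not problems else f"mismatch: {problems}")
```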

Evaluation

Benchmarks pending; results will be populated after the eval-harness workstream lands.

Family

bf16 — Qwen/Qwen3.6-35B-A3B
FP8 card — majentik/Qwen3.6-35B-A3B-FP8
RotorQuant MLX-4bit (affine) — majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit
RotorQuant MLX-8bit (source for this) — majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit
plain MLX-NVFP4 (no rotor/turbo) — majentik/Qwen3.6-35B-A3B-MLX-NVFP4

Provenance

Source repo: majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit
Calibration hash: none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source)
Uploaded: 2026-04-21T06:17:30.021158+00:00

Toolchain:

huggingface_hub: 1.11.0
mlx: 0.31.1
mlx-lm: 0.31.2

License

Released under apache-2.0. The upstream license of the base model also applies.