majentik's picture
publish RotorQuant MLX-NVFP4 variant
baacf65 verified
---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
library_name: mlx
tags:
- qwen
- qwen-3.6
- moe
- rotor
- mlx
- nvfp4
- apple-silicon
---
# Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4
## Summary
RotorQuant + MLX-NVFP4 (4-bit) variant of
[Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B).
## Why this variant
Apple Silicon (M1/M2/M3/M4) with RotorQuant structural pre-conditioning and MLX-native NVFP4 layout (E2M1 weights, per-16-element FP8 (NVIDIA Blackwell layout)). 4.503 bits/weight, ~18 GB on disk, sub-2-s load on M4 Max. Pick this over the affine MLX variants when you want NVFP4 format parity with hardware pipelines while running locally.
## Hardware compatibility
| Device | VRAM | Recommendation |
| --- | --- | --- |
| Apple M4 Max 128 GB | ~21 GB | recommended β€” headroom for long context |
| Apple M3 Max 64 GB | ~21 GB | fits comfortably |
| Apple M2 Max 32 GB | ~21 GB | tight β€” short context only |
## Reproduce
```bash
# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)"
```
Reproduced at commit `919836a`.
## Evaluation
_benchmarks pending β€” populated after the eval-harness workstream lands._
## Family
- **bf16** β€” [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
- **FP8 card** β€” [majentik/Qwen3.6-35B-A3B-FP8](https://huggingface.co/majentik/Qwen3.6-35B-A3B-FP8)
- **RotorQuant MLX-4bit (affine)** β€” [majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit)
- **RotorQuant MLX-8bit (source for this)** β€” [majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit)
- **plain MLX-NVFP4 (no rotor/turbo)** β€” [majentik/Qwen3.6-35B-A3B-MLX-NVFP4](https://huggingface.co/majentik/Qwen3.6-35B-A3B-MLX-NVFP4)
## Provenance
- Source SHA: `majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit`
- Calibration hash: `none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source)`
- Uploaded: `2026-04-21T06:17:30.021158+00:00`
Toolchain:
- `huggingface_hub`: 1.11.0
- `mlx`: 0.31.1
- `mlx-lm`: 0.31.2
## License
Released under `apache-2.0`. Upstream license of the base model applies.