	---
	license: apache-2.0
	base_model: Qwen/Qwen3.6-35B-A3B
	pipeline_tag: image-text-to-text
	library_name: mlx
	tags:
	- qwen
	- qwen-3.6
	- moe
	- rotor
	- mlx
	- nvfp4
	- apple-silicon
	---

	# Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4

	## Summary

	RotorQuant + MLX-NVFP4 (4-bit) variant of
	[Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B).

	## Why this variant

	Targets Apple Silicon (M1/M2/M3/M4), combining RotorQuant structural pre-conditioning with the MLX-native NVFP4 layout: E2M1 weights with one FP8 scale per 16-element group, matching the NVIDIA Blackwell layout. The result is 4.503 bits/weight, ~18 GB on disk, and a load time under 2 s on an M4 Max. Pick this over the affine MLX variants when you want NVFP4 format parity with hardware pipelines while running locally.
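
The nominal bits-per-weight figure follows from the layout: each weight takes 4 bits and each group of 16 weights shares one 8-bit scale, so 4 + 8/16 = 4.5 bits. The sketch below simulates that scheme in plain Python. It is an illustration, not the MLX kernel: the scale is kept as an ordinary float instead of being rounded to FP8, the scale rule (group maximum mapped to 6, the largest E2M1 magnitude) is an assumed common choice, and all function names are hypothetical. The gap between 4.5 and the reported 4.503 presumably comes from tensors stored at higher precision, which is also an assumption.

```python
# Illustrative simulation of NVFP4-style group quantization (not the MLX kernel).
# Assumptions: scale = group max / 6, and the scale stays a Python float
# rather than being rounded to FP8.

# Magnitudes representable by a 4-bit E2M1 code (the sign is a separate bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 16


def quantize_group(values):
    """Return (scale, codes) for one group of GROUP_SIZE values."""
    assert len(values) == GROUP_SIZE
    amax = max(abs(v) for v in values)
    # Map the largest magnitude in the group onto 6.0; avoid dividing by zero.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Round to the nearest representable E2M1 magnitude.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-mag if v < 0 else mag)
    return scale, codes


def dequantize_group(scale, codes):
    """Reconstruct approximate values from the shared scale and E2M1 codes."""
    return [scale * c for c in codes]


def bits_per_weight(value_bits=4, scale_bits=8, group_size=GROUP_SIZE):
    """Storage cost per weight: the code bits plus the amortized scale bits."""
    return value_bits + scale_bits / group_size
```

Calling `bits_per_weight()` returns 4.5, and round-tripping a group through `quantize_group` and `dequantize_group` shows the per-group rounding error this format introduces.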

	## Hardware compatibility

	| Device | Unified memory needed | Recommendation |
	| --- | --- | --- |
	| Apple M4 Max 128 GB | ~21 GB | recommended; headroom for long context |
	| Apple M3 Max 64 GB | ~21 GB | fits comfortably |
	| Apple M2 Max 32 GB | ~21 GB | tight; short context only |
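
To make the recommendations concrete, the following minimal sketch computes how much memory each listed machine has left after the ~21 GB the model needs. The figures are copied from the table; the helper name is hypothetical. Because Apple Silicon memory is unified, whatever remains is shared by the OS, other applications, and the KV cache.

```python
# Figures copied from the table above; the helper name is hypothetical.
REQUIRED_GB = 21  # approximate memory needed by this variant

DEVICES = {
    "Apple M4 Max": 128,
    "Apple M3 Max": 64,
    "Apple M2 Max": 32,
}


def headroom(total_gb, required_gb=REQUIRED_GB):
    """Return (free GB, fraction of total memory taken by the model)."""
    return total_gb - required_gb, required_gb / total_gb


for name, total in DEVICES.items():
    free_gb, used = headroom(total)
    print(f"{name} {total} GB: {free_gb} GB free, model uses {used:.0%}")
```

On the 32 GB machine the model alone takes roughly two thirds of memory, which is why that row is limited to short contexts.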

	## Reproduce

	```bash
	# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
	python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
	python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)"
	```

	Reproduced at commit `919836a`.
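
The same two steps can also be written as a small script, which is easier to edit than nested `python -c` quoting. This is a sketch: the keyword arguments simply mirror the commands above, the directory names are the same placeholders, and `conversion_steps` and `run` are hypothetical helper names. `mlx_lm` is imported lazily so the plan can be inspected on a machine without MLX.

```python
# Sketch of the two-step pipeline above as a script.
# Keyword arguments mirror the `python -c` commands; helper names are hypothetical.

SOURCE_REPO = "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit"


def conversion_steps(bf16_dir="bf16", out_dir="out-nvfp4"):
    """Return the keyword arguments for each mlx_lm.convert call, in order."""
    dequantize_step = dict(
        hf_path=SOURCE_REPO,
        mlx_path=bf16_dir,
        dequantize=True,
        trust_remote_code=True,
    )
    quantize_step = dict(
        hf_path=bf16_dir,  # the output of step 1 is the input of step 2
        mlx_path=out_dir,
        quantize=True,
        q_bits=4,
        q_group_size=16,
        q_mode="nvfp4",
        trust_remote_code=True,
    )
    return [dequantize_step, quantize_step]


def run(steps):
    """Execute each conversion step; requires mlx-lm on Apple Silicon."""
    from mlx_lm import convert  # lazy import: only needed when actually converting

    for kwargs in steps:
        convert(**kwargs)
```

On a machine with `mlx-lm` installed, `run(conversion_steps())` performs both conversions in order.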

	## Evaluation

	_Benchmarks pending; results will be added once the eval-harness workstream lands._

	## Family

	- bf16 — [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
	- FP8 card — [majentik/Qwen3.6-35B-A3B-FP8](https://huggingface.co/majentik/Qwen3.6-35B-A3B-FP8)
	- RotorQuant MLX-4bit (affine) — [majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit)
	- RotorQuant MLX-8bit (source for this) — [majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit)
	- plain MLX-NVFP4 (no rotor/turbo) — [majentik/Qwen3.6-35B-A3B-MLX-NVFP4](https://huggingface.co/majentik/Qwen3.6-35B-A3B-MLX-NVFP4)

	## Provenance

	- Source repo: `majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit`
	- Calibration hash: `none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source)`
	- Uploaded: `2026-04-21T06:17:30.021158+00:00`

	Toolchain:
	- `huggingface_hub`: 1.11.0
	- `mlx`: 0.31.1
	- `mlx-lm`: 0.31.2

	## License

	Released under `apache-2.0`. The upstream license of the base model also applies.