Instructions to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4") config = load_config("majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi new
How to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4
Run Hermes
hermes
Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4
Summary
RotorQuant + MLX-NVFP4 (4-bit) variant of Qwen/Qwen3.6-35B-A3B.
Why this variant
Apple Silicon (M1/M2/M3/M4) with RotorQuant structural pre-conditioning and MLX-native NVFP4 layout (E2M1 weights, per-16-element FP8 (NVIDIA Blackwell layout)). 4.503 bits/weight, ~18 GB on disk, sub-2-s load on M4 Max. Pick this over the affine MLX variants when you want NVFP4 format parity with hardware pipelines while running locally.
Hardware compatibility
| Device | VRAM | Recommendation |
|---|---|---|
| Apple M4 Max 128 GB | ~21 GB | recommended — headroom for long context |
| Apple M3 Max 64 GB | ~21 GB | fits comfortably |
| Apple M2 Max 32 GB | ~21 GB | tight — short context only |
Reproduce
# dequantize from the rotor/turbo MLX-8bit source, then re-quantize
python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)"
python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)"
Reproduced at commit 919836a.
Evaluation
benchmarks pending — populated after the eval-harness workstream lands.
Family
- bf16 — Qwen/Qwen3.6-35B-A3B
- FP8 card — majentik/Qwen3.6-35B-A3B-FP8
- RotorQuant MLX-4bit (affine) — majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit
- RotorQuant MLX-8bit (source for this) — majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit
- plain MLX-NVFP4 (no rotor/turbo) — majentik/Qwen3.6-35B-A3B-MLX-NVFP4
Provenance
- Source SHA:
majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit - Calibration hash:
none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source) - Uploaded:
2026-04-21T06:17:30.021158+00:00
Toolchain:
huggingface_hub: 1.11.0mlx: 0.31.1mlx-lm: 0.31.2
License
Released under apache-2.0. Upstream license of the base model applies.
- Downloads last month
- 822
4-bit
Model tree for majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4
Base model
Qwen/Qwen3.6-35B-A3B