Instructions to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with MLX:
```python
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4")
config = load_config("majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4
```
Run Hermes
```bash
hermes
```
| license: apache-2.0 | |
| base_model: Qwen/Qwen3.6-35B-A3B | |
| pipeline_tag: image-text-to-text | |
| library_name: mlx | |
| tags: | |
| - qwen | |
| - qwen-3.6 | |
| - moe | |
| - rotor | |
| - mlx | |
| - nvfp4 | |
| - apple-silicon | |
| # Qwen3.6-35B-A3B-RotorQuant-MLX-NVFP4 | |
| ## Summary | |
| RotorQuant + MLX-NVFP4 (4-bit) variant of | |
| [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B). | |
| ## Why this variant | |
| Targets Apple Silicon (M1/M2/M3/M4). It combines RotorQuant structural pre-conditioning with the MLX-native NVFP4 layout: E2M1 (4-bit float) weights with one FP8 scale per 16-element group, matching the NVIDIA Blackwell layout. The result is 4.503 bits/weight, ~18 GB on disk, and a sub-2-second load on an M4 Max. Pick this over the affine MLX variants when you want NVFP4 format parity with hardware pipelines while running locally. | |
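
The bits/weight figure can be sanity-checked from the layout itself. The sketch below assumes one 8-bit scale shared by every 16 four-bit weights and, for the size estimate, reads the "35B" in the model name as exactly 35e9 parameters. That count is an assumption, since the card does not state the true number. The helper names are illustrative only.

```python
# Back-of-the-envelope check of the NVFP4 storage cost.
WEIGHT_BITS = 4   # one E2M1 value per weight
SCALE_BITS = 8    # one FP8 scale ...
GROUP_SIZE = 16   # ... shared by a group of 16 weights


def bits_per_weight(weight_bits, scale_bits, group_size):
    """Average storage per weight once the group scale is amortized."""
    return weight_bits + scale_bits / group_size


def size_gb(num_params, bits):
    """Size in decimal gigabytes for `num_params` weights at `bits` each."""
    return num_params * bits / 8 / 1e9


ideal_bpw = bits_per_weight(WEIGHT_BITS, SCALE_BITS, GROUP_SIZE)

ASSUMED_PARAMS = 35e9   # assumption: taken from "35B" in the model name
REPORTED_BPW = 4.503    # figure quoted in this card
estimated_gb = size_gb(ASSUMED_PARAMS, REPORTED_BPW)
estimated_gib = estimated_gb * 1e9 / 2**30

print(f"ideal bits/weight: {ideal_bpw}")
print(f"estimated size: {estimated_gb:.1f} GB ({estimated_gib:.1f} GiB)")
```

Under these assumptions the ideal cost is 4.5 bits/weight and the estimate comes to about 19.7 GB, or roughly 18.3 GiB, which is in line with the ~18 GB quoted above.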
| ## Hardware compatibility | |
| | Device | Memory use (unified) | Recommendation | | |
| | --- | --- | --- | | |
| | Apple M4 Max 128 GB | ~21 GB | recommended; headroom for long context | | |
| | Apple M3 Max 64 GB | ~21 GB | fits comfortably | | |
| | Apple M2 Max 32 GB | ~21 GB | tight; short context only | | |
| ## Reproduce | |
| ```bash | |
| # dequantize from the rotor/turbo MLX-8bit source, then re-quantize | |
| python -c "from mlx_lm import convert; convert(hf_path=\"majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit\", mlx_path=\"bf16\", dequantize=True, trust_remote_code=True)" | |
| python -c "from mlx_lm import convert; convert(hf_path=\"bf16\", mlx_path=\"out-nvfp4\", quantize=True, q_bits=4, q_group_size=16, q_mode=\"nvfp4\", trust_remote_code=True)" | |
| ``` | |
| Reproduced at commit `919836a`. | |
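
To make `q_bits=4`, `q_group_size=16`, and `q_mode="nvfp4"` more concrete, here is a pure-Python sketch of what group-wise rounding onto an E2M1 grid does to a weight vector. It is not the MLX implementation: `E2M1_GRID`, `fake_quantize_group`, and `fake_quantize_nvfp4` are hypothetical names, the per-group scale is kept as a Python float rather than being stored in FP8, and any further per-tensor scaling is ignored.

```python
import random

# Magnitudes an E2M1 (1 sign, 2 exponent, 1 mantissa bit) value can take.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize_group(group):
    """Quantize then dequantize one group that shares a single scale."""
    amax = max(abs(x) for x in group)
    if amax == 0.0:
        return list(group)
    # Map the largest magnitude in the group onto the top grid value (6).
    # Real NVFP4 stores this scale in FP8; a float is used here.
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in group:
        magnitude = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(magnitude - g))
        out.append((nearest if x >= 0 else -nearest) * scale)
    return out


def fake_quantize_nvfp4(weights, group_size=16):
    """Apply the group-wise round trip to a flat list of weights."""
    if len(weights) % group_size:
        raise ValueError("len(weights) must be a multiple of group_size")
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(fake_quantize_group(weights[start:start + group_size]))
    return out


random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(64)]
w_hat = fake_quantize_nvfp4(w)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"max round-trip error over {len(w)} weights: {max_err:.4f}")
```

Because the widest gap in the grid is between 4 and 6, the round-trip error of any weight in this sketch is bounded by one sixth of the largest magnitude in its group.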
| ## Evaluation | |
| _Benchmarks pending; to be populated after the eval-harness workstream lands._ | |
| ## Family | |
| - **bf16**: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | |
| - **FP8 card**: [majentik/Qwen3.6-35B-A3B-FP8](https://huggingface.co/majentik/Qwen3.6-35B-A3B-FP8) | |
| - **RotorQuant MLX-4bit (affine)**: [majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-4bit) | |
| - **RotorQuant MLX-8bit (source for this)**: [majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit) | |
| - **plain MLX-NVFP4 (no rotor/turbo)**: [majentik/Qwen3.6-35B-A3B-MLX-NVFP4](https://huggingface.co/majentik/Qwen3.6-35B-A3B-MLX-NVFP4) | |
| ## Provenance | |
| - Source repo: `majentik/Qwen3.6-35B-A3B-RotorQuant-MLX-8bit` | |
| - Calibration hash: `none (nvfp4 is calibration-free; rotor/turbo conditioning inherited from source)` | |
| - Uploaded: `2026-04-21T06:17:30.021158+00:00` | |
| Toolchain: | |
| - `huggingface_hub`: 1.11.0 | |
| - `mlx`: 0.31.1 | |
| - `mlx-lm`: 0.31.2 | |
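
Before reproducing the conversion, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; `EXPECTED` and `check_toolchain` are hypothetical names, and a version mismatch does not necessarily mean the conversion will fail.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Toolchain" in this card.
EXPECTED = {
    "huggingface_hub": "1.11.0",
    "mlx": "0.31.1",
    "mlx-lm": "0.31.2",
}


def check_toolchain(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed in this environment
        report[name] = (have, want, have == want)
    return report


for name, (have, want, ok) in check_toolchain(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: installed={have} expected={want} [{status}]")
```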
| ## License | |
| Released under `apache-2.0`. The upstream license of the base model also applies. | |