✨ Upgrade: Qwen 3.6 35B-A3B GGUF is the direct successor. Same 35B/3B MoE shape, new Gated-DeltaNet hybrid architecture. Measured ~1.75× faster on M4 Max (46.5 vs 26.6 t/s at IQ4) with stronger agentic coding (SWE-bench Verified 73.4 vs 70.0). → Model card
# Qwen 3.5 35B-A3B GGUF – Quantized by BatiAI
IQ4_XS quantization of Qwen/Qwen3.5-35B-A3B for on-device AI on Mac. Built and verified by BatiAI for BatiFlow.
## Quick Start
```shell
ollama pull batiai/qwen3.5-35b:iq4
```
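Once pulled, the model can also be called through Ollama's local HTTP API (`POST http://localhost:11434/api/generate`). A minimal sketch in Python; the request payload is built first so it can be inspected without a live server, and the actual network call (which requires Ollama running) is left commented out:

```python
import json

def build_generate_request(prompt: str, model: str = "batiai/qwen3.5-35b:iq4") -> dict:
    """Build a payload for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return the full response as one JSON object
    }

payload = build_generate_request("Summarize this model card in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running Ollama server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```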
## Available Quantizations
| Quant | Size | VRAM | M4 Max (128GB) | Recommended For |
|---|---|---|---|---|
| IQ4_XS | 17GB | 23GB | 26.6 t/s | 36GB+ Mac |
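The 17GB file size roughly follows from the bits-per-weight of the quant: IQ4_XS averages about 4.25 bits per weight in llama.cpp. A back-of-the-envelope check (the 4.25 bpw figure is an approximation; mixed tensor precisions in the actual file explain why it lands somewhat below this naive estimate):

```python
total_params = 35e9      # 35B total parameters (MoE)
bits_per_weight = 4.25   # approximate IQ4_XS average

size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"estimated file size: {size_gb:.1f} GB")  # ~18.6 GB, near the 17GB shipped file
```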
## Why MoE Beats Dense
35B-A3B is a Mixture-of-Experts model with 35B total parameters and only 3B active per token:
| | 35B-A3B (MoE) | 27B (Dense) |
|---|---|---|
| Total params | 35B | 27B |
| Active params | 3B | 27B |
| VRAM | 23GB | 28GB |
| Speed | 26.6 t/s | 17.0 t/s |
MoE activates 9× fewer parameters per token, giving comparable quality with much higher speed and lower memory use.
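The speed gap in the table tracks the active-parameter ratio: per token, the MoE model reads ~3B weights where the dense model reads all 27B. A quick sketch of that arithmetic; note the measured speedup is well below the 9× parameter ratio, since shared weights, attention, and KV-cache traffic are paid regardless of routing:

```python
moe_active = 3e9   # active params per token (35B-A3B)
dense = 27e9       # dense model reads every weight per token

ratio = dense / moe_active
print(f"{ratio:.0f}x fewer active parameters per token")  # 9x

moe_tps, dense_tps = 26.6, 17.0  # measured on M4 Max (128GB)
print(f"measured speedup: {moe_tps / dense_tps:.2f}x")    # 1.56x
```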
## Benchmarks – M4 Max (128GB)
| Metric | IQ4_XS |
|---|---|
| Token generation | 26.6 t/s |
| Korean | ✅ |
| Tool call JSON | ✅ |
| VRAM | 23 GB |
## Full BatiAI Qwen 3.5 Lineup
| Model | Size | VRAM | Speed | Min Mac |
|---|---|---|---|---|
| batiai/qwen3.5-9b:q4 | 5.2GB | ~8GB | 12.5 t/s | 16GB |
| batiai/qwen3.5-27b:iq4 | 14GB | 28GB | 17.0 t/s | 32GB |
| batiai/qwen3.5-35b:iq4 | 17GB | 23GB | 26.6 t/s | 36GB |
## Technical Details
- Original Model: Qwen/Qwen3.5-35B-A3B
- Architecture: MoE (35B total, 3B active, 256 experts, 8 routed + 1 shared)
- Context Window: 262K tokens
- License: Apache 2.0
- Quantized with: llama.cpp (build 400ac8e)
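The "256 experts, 8 routed + 1 shared" line describes top-k expert routing: a gating layer scores all 256 routed experts for each token, the 8 highest-scoring ones are activated, and one shared expert always runs. A toy sketch of that selection step (pure Python, with random scores standing in for the real gating network):

```python
import random

NUM_EXPERTS = 256   # routed expert pool
TOP_K = 8           # routed experts activated per token
SHARED = "shared"   # one shared expert, always active

def route(token_scores):
    """Pick the top-k routed experts plus the always-on shared expert."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: token_scores[e], reverse=True)
    return [SHARED] + ranked[:TOP_K]

random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]
active = route(scores)
print(f"{len(active)} experts active out of {NUM_EXPERTS + 1}")  # 9 experts
```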
## About BatiFlow
BatiFlow is free, on-device AI automation for Mac: a 5MB app, 100% local, unlimited.
## License
Quantized from Qwen/Qwen3.5-35B-A3B. License: Apache 2.0.