✨ Upgrade: Qwen 3.6 35B-A3B GGUF is the direct successor. Same 35B/3B MoE shape, new Gated-DeltaNet hybrid architecture. Measured ~1.75× faster on M4 Max (46.5 vs 26.6 t/s at IQ4) with stronger agentic coding (SWE-bench Verified 73.4 vs 70.0). → Model card

# Qwen 3.5 35B-A3B GGUF – Quantized by BatiAI


IQ4_XS quantization of Qwen/Qwen3.5-35B-A3B for on-device AI on Mac. Built and verified by BatiAI for BatiFlow.

## Quick Start

```shell
ollama pull batiai/qwen3.5-35b:iq4
```
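Once pulled, the model can also be reached over Ollama's local HTTP API (default port 11434). A minimal Python sketch, assuming a stock Ollama install; `build_request` is a hypothetical helper, not part of any BatiAI tooling:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "batiai/qwen3.5-35b:iq4") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With an Ollama server running:
# with urllib.request.urlopen(build_request("Explain MoE in one line.")) as resp:
#     print(json.loads(resp.read())["response"])
```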

## Available Quantizations

| Quant | Size | VRAM | M4 Max (128GB) | Recommended For |
|-------|------|------|----------------|-----------------|
| IQ4_XS | 17GB | 23GB | 26.6 t/s | 36GB+ Mac |

## Why MoE Beats Dense

35B-A3B is a Mixture-of-Experts (MoE) model: 35B total parameters, only 3B active per token.

| | 35B-A3B (MoE) | 27B (Dense) |
|---|---|---|
| Total params | 35B | 27B |
| Active params | 3B | 27B |
| VRAM | 23GB | 28GB |
| Speed | 26.6 t/s | 17.0 t/s |

The MoE model activates 9× fewer parameters per token than the dense 27B: comparable quality, much faster, less memory.
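The arithmetic behind that claim can be checked directly. A back-of-envelope sketch using only the numbers from the comparison table above (illustrative, not a benchmark):

```python
# Sanity-check of the MoE-vs-dense comparison table (illustrative arithmetic;
# real throughput depends on quantization, context length, and hardware).
moe_total, moe_active = 35e9, 3e9   # 35B-A3B: total vs active parameters
dense_params = 27e9                 # 27B dense baseline

activation_ratio = dense_params / moe_active   # params run per token, dense vs MoE
active_fraction = moe_active / moe_total       # share of MoE weights active per token
decode_speedup = 26.6 / 17.0                   # measured t/s from the table

print(f"Active-parameter ratio vs dense: {activation_ratio:.0f}x")
print(f"Share of MoE weights active per token: {active_fraction:.1%}")
print(f"Measured decode speedup: {decode_speedup:.2f}x")
```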

## Benchmarks – M4 Max (128GB)

| Metric | IQ4_XS |
|--------|--------|
| Token generation | 26.6 t/s |
| Korean | ✅ |
| Tool call JSON | ✅ |
| VRAM | 23 GB |

## Full BatiAI Qwen 3.5 Lineup

| Model | Size | VRAM | Speed | Min Mac |
|-------|------|------|-------|---------|
| batiai/qwen3.5-9b:q4 | 5.2GB | ~8GB | 12.5 t/s | 16GB |
| batiai/qwen3.5-27b:iq4 | 14GB | 28GB | 17.0 t/s | 32GB |
| batiai/qwen3.5-35b:iq4 | 17GB | 23GB | 26.6 t/s | 36GB |
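Picking a model from the lineup reduces to checking unified memory against the Min Mac column. A hedged sketch: the tags and thresholds below are copied from the lineup table, while `pick_model` itself is a hypothetical helper, not part of any BatiAI tooling:

```python
from typing import Optional

def pick_model(unified_memory_gb: int) -> Optional[str]:
    """Return the largest BatiAI Qwen 3.5 model whose Min Mac requirement fits."""
    lineup = [  # (Ollama tag, minimum unified memory in GB), largest first
        ("batiai/qwen3.5-35b:iq4", 36),
        ("batiai/qwen3.5-27b:iq4", 32),
        ("batiai/qwen3.5-9b:q4", 16),
    ]
    for tag, min_gb in lineup:
        if unified_memory_gb >= min_gb:
            return tag
    return None  # below the smallest model's requirement

# pick_model(48)  -> "batiai/qwen3.5-35b:iq4"
```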

## Technical Details

- Original Model: Qwen/Qwen3.5-35B-A3B
- Architecture: MoE (35B total, 3B active, 256 experts, 8 routed + 1 shared)
- Context Window: 262K tokens
- License: Apache 2.0
- Quantized with: llama.cpp (build 400ac8e)
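The expert counts above imply how sparse each forward pass is. A rough sketch of the routing math, assuming the shared expert is in addition to the 256 routed ones:

```python
# Routing math from the architecture bullet: 256 routed experts, top-8 routing
# per token, plus 1 always-active shared expert (assumed to be separate from
# the 256 routed experts).
num_routed, top_k, num_shared = 256, 8, 1

active_experts = top_k + num_shared
total_experts = num_routed + num_shared
expert_fraction = active_experts / total_experts

print(f"Experts active per token: {active_experts} of {total_experts}")
print(f"Fraction of expert weights active: {expert_fraction:.1%}")
# Attention, embedding, and router weights are always active, which is why the
# overall active share (3B of 35B) is larger than this expert-only fraction.
```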

## About BatiFlow

BatiFlow is free, on-device AI automation for Mac: a 5MB app, 100% local, unlimited.

## License

Quantized from Qwen/Qwen3.5-35B-A3B. License: Apache 2.0.
