✨ Upgrade: Qwen 3.6 35B-A3B GGUF is the direct successor. Same 35B/3B MoE shape, new Gated-DeltaNet hybrid architecture. Measured ~1.75× faster on M4 Max (46.5 vs 26.6 t/s at IQ4) with stronger agentic coding (SWE-bench Verified 73.4 vs 70.0). → Model card

# Qwen 3.5 35B-A3B GGUF – Quantized by BatiAI


IQ4_XS quantization of Qwen/Qwen3.5-35B-A3B for on-device AI on Mac. Built and verified by BatiAI for BatiFlow.

## Quick Start

```shell
ollama pull batiai/qwen3.5-35b:iq4
```
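Once pulled, the model can also be reached over Ollama's local HTTP API (default port 11434). A minimal Python sketch, assuming a stock Ollama install; `build_request` is a hypothetical helper, not part of any BatiAI tooling:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "batiai/qwen3.5-35b:iq4") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With an Ollama server running:
# with urllib.request.urlopen(build_request("Explain MoE in one line.")) as resp:
#     print(json.loads(resp.read())["response"])
```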

## Available Quantizations

| Quant | Size | VRAM | M4 Max (128GB) | Recommended For |
|-------|------|------|----------------|-----------------|
| IQ4_XS | 17GB | 23GB | 26.6 t/s | 36GB+ Mac |

## Why MoE Beats Dense

35B-A3B is a Mixture-of-Experts (MoE) model: 35B total parameters, only 3B active per token.

| | 35B-A3B (MoE) | 27B (Dense) |
|---|---|---|
| Total params | 35B | 27B |
| Active params | 3B | 27B |
| VRAM | 23GB | 28GB |
| Speed | 26.6 t/s | 17.0 t/s |

The MoE model activates 9× fewer parameters per token than the dense 27B: comparable quality, much faster, less memory.
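The arithmetic behind that claim can be checked directly. A back-of-envelope sketch using only the numbers from the comparison table above (illustrative, not a benchmark):

```python
# Sanity-check of the MoE-vs-dense comparison table (illustrative arithmetic;
# real throughput depends on quantization, context length, and hardware).
moe_total, moe_active = 35e9, 3e9   # 35B-A3B: total vs active parameters
dense_params = 27e9                 # 27B dense baseline

activation_ratio = dense_params / moe_active   # params run per token, dense vs MoE
active_fraction = moe_active / moe_total       # share of MoE weights active per token
decode_speedup = 26.6 / 17.0                   # measured t/s from the table

print(f"Active-parameter ratio vs dense: {activation_ratio:.0f}x")
print(f"Share of MoE weights active per token: {active_fraction:.1%}")
print(f"Measured decode speedup: {decode_speedup:.2f}x")
```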

## Benchmarks – M4 Max (128GB)

| Metric | IQ4_XS |
|--------|--------|
| Token generation | 26.6 t/s |
| Korean | ✅ |
| Tool call JSON | ✅ |
| VRAM | 23 GB |

## Full BatiAI Qwen 3.5 Lineup

| Model | Size | VRAM | Speed | Min Mac |
|-------|------|------|-------|---------|
| batiai/qwen3.5-9b:q4 | 5.2GB | ~8GB | 12.5 t/s | 16GB |
| batiai/qwen3.5-27b:iq4 | 14GB | 28GB | 17.0 t/s | 32GB |
| batiai/qwen3.5-35b:iq4 | 17GB | 23GB | 26.6 t/s | 36GB |
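Picking a model from the lineup reduces to checking unified memory against the Min Mac column. A hedged sketch: the tags and thresholds below are copied from the lineup table, while `pick_model` itself is a hypothetical helper, not part of any BatiAI tooling:

```python
from typing import Optional

def pick_model(unified_memory_gb: int) -> Optional[str]:
    """Return the largest BatiAI Qwen 3.5 model whose Min Mac requirement fits."""
    lineup = [  # (Ollama tag, minimum unified memory in GB), largest first
        ("batiai/qwen3.5-35b:iq4", 36),
        ("batiai/qwen3.5-27b:iq4", 32),
        ("batiai/qwen3.5-9b:q4", 16),
    ]
    for tag, min_gb in lineup:
        if unified_memory_gb >= min_gb:
            return tag
    return None  # below the smallest model's requirement

# pick_model(48)  -> "batiai/qwen3.5-35b:iq4"
```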

## Technical Details

- Original Model: Qwen/Qwen3.5-35B-A3B
- Architecture: MoE (35B total, 3B active, 256 experts, 8 routed + 1 shared)
- Context Window: 262K tokens
- License: Apache 2.0
- Quantized with: llama.cpp (build 400ac8e)
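The expert counts above imply how sparse each forward pass is. A rough sketch of the routing math, assuming the shared expert is in addition to the 256 routed ones:

```python
# Routing math from the architecture bullet: 256 routed experts, top-8 routing
# per token, plus 1 always-active shared expert (assumed to be separate from
# the 256 routed experts).
num_routed, top_k, num_shared = 256, 8, 1

active_experts = top_k + num_shared
total_experts = num_routed + num_shared
expert_fraction = active_experts / total_experts

print(f"Experts active per token: {active_experts} of {total_experts}")
print(f"Fraction of expert weights active: {expert_fraction:.1%}")
# Attention, embedding, and router weights are always active, which is why the
# overall active share (3B of 35B) is larger than this expert-only fraction.
```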

## About BatiFlow

BatiFlow is free, on-device AI automation for Mac: a 5MB app, 100% local, unlimited.

## License

Quantized from Qwen/Qwen3.5-35B-A3B. License: Apache 2.0.
