Also available: Qwen 3.6 35B-A3B GGUF – newer-generation MoE (only 3B active/token) with big agentic-coding gains. IQ3_XXS (13 GB) fits in a 16 GB Mac mini; IQ4_XS on 24 GB+. → Model card
Qwen 3.5 9B GGUF – Quantized by BatiAI
Optimized GGUF quantizations of Qwen/Qwen3.5-9B for on-device AI on Mac. Quantized directly from the official Alibaba weights by BatiAI for BatiFlow – free, unlimited, on-device AI automation for Mac.
Quick Start
# 16GB Mac – Best balance (recommended)
ollama pull batiai/qwen3.5-9b:q4
# 16GB Mac – Higher quality (slower on 16GB)
ollama pull batiai/qwen3.5-9b:q6
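After pulling, the model can be exercised directly from the terminal. A minimal sketch, assuming a default local Ollama install; the prompt text is only an illustration:
# Interactive chat with the recommended quant
ollama run batiai/qwen3.5-9b:q4
# One-shot prompt (illustrative)
ollama run batiai/qwen3.5-9b:q4 "Summarize what GGUF quantization is in two sentences."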
Available Quantizations
| Quant | Size | 16GB Mac mini M4 | MacBook Pro M4 Max (128GB) | Recommended For |
|---|---|---|---|---|
| Q4_K_M | 5.2GB | 12.5 t/s ✅ | 43.2 t/s | 16GB Mac (recommended) |
| Q6_K | 6.9GB | 4.2 t/s ⚠️ slower | 40.8 t/s | 16GB Mac (higher quality, slower) |
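Not sure which quant to pull? A minimal sketch for checking installed RAM on macOS (sysctl is a standard macOS tool; the threshold logic mirrors the table above):
# Print installed RAM in GB to help decide between q4 and q6
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GB RAM installed"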
Benchmarks – Real Hardware
Mac mini M4 (16GB) – Primary target
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| Token generation | 12.5 t/s | 4.2 t/s |
| Prompt eval | 21.65 t/s | 1.06 t/s |
| Load time | 0.1s | 7.5s |
| Korean output | ✅ Excellent | ✅ Good |
| Usable? | ✅ Fast enough | ⚠️ Usable but slow |
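These numbers can be reproduced on your own machine with Ollama's verbose mode, which prints load duration, prompt eval rate, and eval (generation) rate after each response. A minimal sketch; the prompt is arbitrary:
# Reports load duration, prompt eval rate, and eval rate after the reply
ollama run --verbose batiai/qwen3.5-9b:q4 "Write a haiku about the Han River."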
MacBook Pro M4 Max (128GB)
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| Token generation | 43.2 t/s | 40.8 t/s |
| Korean output | ✅ | ✅ |
vs Gemma 4 26B on 16GB Mac
| Model | Speed on 16GB Mac | Verdict |
|---|---|---|
| batiai/gemma4-26b:q3 (12GB) | 0.30 t/s | ❌ Unusable |
| batiai/qwen3.5-9b:q4 (5.2GB) | 12.5 t/s | ✅ 40x faster |
For 16GB Mac users, Qwen 3.5 9B Q4 is the clear winner: fast, smart, and small enough to fit comfortably in RAM.
Why Qwen 3.5 9B?
- Benchmark champion: Outperforms GPT-OSS-120B on MMLU-Pro despite being 13x smaller
- Best tool calling: Top-tier function-calling accuracy among open models (see the sketch after this list)
- Multilingual: 100+ languages including excellent Korean
- Apache 2.0: Fully open, no restrictions
- 5.2GB Q4: Leaves 10GB free RAM on 16GB Mac – no swap, no lag
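As referenced in the tool-calling point above, function calling is exposed through Ollama's chat API. A minimal sketch, assuming Ollama is serving on its default local port; get_weather is a hypothetical tool used purely for illustration:
# Ask a question the model can only answer by calling the (hypothetical) get_weather tool
curl http://localhost:11434/api/chat -d '{
  "model": "batiai/qwen3.5-9b:q4",
  "stream": false,
  "messages": [{ "role": "user", "content": "What is the weather in Seoul right now?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
# On success, the reply's message.tool_calls lists the chosen function and its arguments.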
What About IQ3_M?
We tested an IQ3_M (imatrix, 4.1GB) quantization. On a 16GB Mac mini it produced broken, repetitive output – similar to what we observed with Gemma 26B Q2. The 9B model architecture doesn't handle sub-4-bit quantization well; Q4_K_M is the minimum viable quantization for this model.
About BatiFlow
BatiFlow is a macOS-native AI desktop automation app – just 5MB, built with Swift.
| Feature | Details |
|---|---|
| Free & Unlimited | On-device AI via Ollama – no API costs, no usage limits, no subscriptions |
| 100% Private | All data stays on your Mac. Nothing is sent to the cloud |
| Ultra Lightweight | Native macOS app, only 5MB. No Electron, no bloat |
| 5-Minute Setup | Download, install Ollama, start automating |
Technical Details
- Original Model: Qwen/Qwen3.5-9B
- Architecture: Qwen 3.5 (9B dense, 256K context)
- License: Apache 2.0
- Quantized with: llama.cpp (build 8674)
- Quantized by: BatiAI
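For reference, the general shape of a llama.cpp quantization pipeline looks like the sketch below; the paths, file names, and exact invocation are assumptions, not the commands BatiAI actually used:
# Convert the original Hugging Face checkpoint to a full-precision GGUF (paths are illustrative)
python convert_hf_to_gguf.py ./Qwen3.5-9B --outfile qwen3.5-9b-f16.gguf --outtype f16
# Quantize to Q4_K_M with llama.cpp's quantization tool
./llama-quantize qwen3.5-9b-f16.gguf qwen3.5-9b-Q4_K_M.gguf Q4_K_M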
License
Quantized from Qwen/Qwen3.5-9B. License: Apache 2.0.