✨ Also available: Qwen 3.6 35B-A3B GGUF, a newer-generation MoE (only 3B active params/token) with big agentic-coding gains. IQ3_XXS (13 GB) fits on a 16 GB Mac mini; IQ4_XS on 24 GB+. → Model card

Qwen 3.5 9B GGUF - Quantized by BatiAI

BatiFlow Ollama

Optimized GGUF quantizations of Qwen/Qwen3.5-9B for on-device AI on Mac, quantized directly from the official Alibaba weights by BatiAI for BatiFlow: free, unlimited, on-device AI automation for Mac.

Quick Start

```bash
# 16GB Mac - best balance (recommended)
ollama pull batiai/qwen3.5-9b:q4

# 16GB Mac - higher quality (slower on 16GB)
ollama pull batiai/qwen3.5-9b:q6
```
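Once pulled, the model can be driven from any language through Ollama's local REST API. A minimal Python sketch, assuming Ollama's default `localhost:11434` endpoint and the `q4` tag above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request for Ollama's REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")


def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running `ollama serve`):
#   print(generate("batiai/qwen3.5-9b:q4", "Say hello in Korean."))
```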

Available Quantizations

| Quant | Size | 16GB Mac mini M4 | MacBook Pro M4 Max (128GB) | Recommended For |
|---|---|---|---|---|
| Q4_K_M | 5.2GB | 12.5 t/s ✅ | 43.2 t/s | 16GB Mac (recommended) |
| Q6_K | 6.9GB | 4.2 t/s ⚠️ slower | 40.8 t/s | 16GB Mac (higher quality, slower) |
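The file sizes above line up with a back-of-envelope estimate from bits per weight. A quick sketch; the ~4.85 and ~6.56 bpw figures are approximate averages for llama.cpp's k-quant mixes, and real files also carry metadata and mixed-precision tensors, so expect a few hundred MB of slack:

```python
# Rough GGUF size estimate: params * bits-per-weight / 8, converted to GiB.
PARAMS = 9e9  # Qwen 3.5 9B


def gguf_size_gib(params: float, bpw: float) -> float:
    """Approximate GGUF file size in GiB for a given average bits-per-weight."""
    return params * bpw / 8 / 2**30


q4 = gguf_size_gib(PARAMS, 4.85)  # Q4_K_M, ~4.85 bpw
q6 = gguf_size_gib(PARAMS, 6.56)  # Q6_K, ~6.56 bpw
print(f"Q4_K_M ~{q4:.1f} GiB, Q6_K ~{q6:.1f} GiB")
```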

Benchmarks on Real Hardware

Mac mini M4 (16GB) - primary target

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| Token generation | 12.5 t/s | 4.2 t/s |
| Prompt eval | 21.65 t/s | 1.06 t/s |
| Load time | 0.1s | 7.5s |
| Korean output | ✅ Excellent | ✅ Good |
| Usable? | ✅ Fast enough | ⚠️ Usable but slow |

MacBook Pro M4 Max (128GB)

| Metric | Q4_K_M | Q6_K |
|---|---|---|
| Token generation | 43.2 t/s | 40.8 t/s |
| Korean output | ✅ | ✅ |

vs Gemma 4 26B on 16GB Mac

| Model | Speed on 16GB Mac | Verdict |
|---|---|---|
| batiai/gemma4-26b:q3 (12GB) | 0.30 t/s | ❌ Unusable |
| batiai/qwen3.5-9b:q4 (5.2GB) | 12.5 t/s | ✅ 40x faster |
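The "40x" verdict is simply the ratio of the two measured generation rates, rounded down:

```python
# Measured token-generation rates on the 16GB Mac mini M4 (from the table above).
qwen_q4_tps = 12.5   # batiai/qwen3.5-9b:q4
gemma_q3_tps = 0.30  # batiai/gemma4-26b:q3

speedup = qwen_q4_tps / gemma_q3_tps
print(f"~{speedup:.1f}x faster")  # ~41.7x, quoted conservatively as 40x
```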

For 16GB Mac users, Qwen 3.5 9B Q4 is the clear winner: fast, smart, and it fits comfortably in RAM.

Why Qwen 3.5 9B?

  • Benchmark champion: Outperforms GPT-OSS-120B on MMLU-Pro despite being 13x smaller
  • Best tool calling: Top-tier function calling accuracy among open models
  • Multilingual: 100+ languages including excellent Korean
  • Apache 2.0: Fully open, no restrictions
  • 5.2GB Q4: Leaves ~10GB of RAM free on a 16GB Mac; no swap, no lag

What About IQ3_M?

We tested the IQ3_M (imatrix, 4.1GB) quantization. On a 16GB Mac mini it produced broken, repetitive output, similar to what we observed with Gemma 26B Q2. The 9B model architecture doesn't handle sub-4-bit quantization well; Q4_K_M is the minimum viable quantization for this model.

About BatiFlow

flow.bati.ai

BatiFlow is a macOS-native AI desktop automation app: just 5MB, built with Swift.

  • Free & Unlimited: on-device AI via Ollama; no API costs, no usage limits, no subscriptions
  • 100% Private: all data stays on your Mac; nothing is sent to the cloud
  • Ultra Lightweight: native macOS app, only 5MB; no Electron, no bloat
  • 5-Minute Setup: download, install Ollama, start automating

Download BatiFlow

Technical Details

  • Original Model: Qwen/Qwen3.5-9B
  • Architecture: Qwen 3.5 (9B dense, 256K context)
  • License: Apache 2.0
  • Quantized with: llama.cpp (build 8674)
  • Quantized by: BatiAI

License

Quantized from Qwen/Qwen3.5-9B. License: Apache 2.0.
