Also available: Qwen 3.6 35B-A3B GGUF – newer-generation MoE (only 3B active/token) with big agentic-coding gains. IQ3_XXS (13 GB) fits in a 16 GB Mac mini; IQ4_XS on 24 GB+. → Model card
Qwen 3.5 9B GGUF – Quantized by BatiAI
Optimized GGUF quantizations of Qwen/Qwen3.5-9B for on-device AI on Mac. Quantized directly from the official Alibaba weights by BatiAI for BatiFlow – free, unlimited, on-device AI automation for Mac.
Quick Start
# 16GB Mac – Best balance (recommended)
ollama pull batiai/qwen3.5-9b:q4
# 16GB Mac – Higher quality (slower on 16GB)
ollama pull batiai/qwen3.5-9b:q6
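After pulling, the model can be exercised directly from the terminal. A minimal sketch, assuming a default local Ollama install; the prompt text is only an illustration:
# Interactive chat with the recommended quant
ollama run batiai/qwen3.5-9b:q4
# One-shot prompt (illustrative)
ollama run batiai/qwen3.5-9b:q4 "Summarize what GGUF quantization is in two sentences."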
Available Quantizations
| Quant | Size | 16GB Mac mini M4 | MacBook Pro M4 Max (128GB) | Recommended For |
|---|---|---|---|---|
| Q4_K_M | 5.2GB | 12.5 t/s ✅ | 43.2 t/s | 16GB Mac (recommended) |
| Q6_K | 6.9GB | 4.2 t/s ⚠️ slower | 40.8 t/s | 16GB Mac (higher quality, slower) |
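Not sure which quant to pull? A minimal sketch for checking installed RAM on macOS (sysctl is a standard macOS tool; the threshold logic mirrors the table above):
# Print installed RAM in GB to help decide between q4 and q6
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GB RAM installed"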
Benchmarks – Real Hardware
Mac mini M4 (16GB) – Primary target
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| Token generation | 12.5 t/s | 4.2 t/s |
| Prompt eval | 21.65 t/s | 1.06 t/s |
| Load time | 0.1s | 7.5s |
| Korean output | ✅ Excellent | ✅ Good |
| Usable? | ✅ Fast enough | ⚠️ Usable but slow |
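These numbers can be reproduced on your own machine with Ollama's verbose mode, which prints load duration, prompt eval rate, and eval (generation) rate after each response. A minimal sketch; the prompt is arbitrary:
# Reports load duration, prompt eval rate, and eval rate after the reply
ollama run --verbose batiai/qwen3.5-9b:q4 "Write a haiku about the Han River."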
MacBook Pro M4 Max (128GB)
| Metric | Q4_K_M | Q6_K |
|---|---|---|
| Token generation | 43.2 t/s | 40.8 t/s |
| Korean output | ✅ | ✅ |
vs Gemma 4 26B on 16GB Mac
| Model | Speed on 16GB Mac | Verdict |
|---|---|---|
| batiai/gemma4-26b:q3 (12GB) | 0.30 t/s | ❌ Unusable |
| batiai/qwen3.5-9b:q4 (5.2GB) | 12.5 t/s | ✅ 40x faster |
For 16GB Mac users, Qwen 3.5 9B Q4 is the clear winner: fast, smart, and small enough to fit comfortably in RAM.
Why Qwen 3.5 9B?
- Benchmark champion: Outperforms GPT-OSS-120B on MMLU-Pro despite being 13x smaller
- Best tool calling: Top-tier function-calling accuracy among open models (see the sketch after this list)
- Multilingual: 100+ languages including excellent Korean
- Apache 2.0: Fully open, no restrictions
- 5.2GB Q4: Leaves 10GB free RAM on 16GB Mac – no swap, no lag
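As referenced in the tool-calling point above, function calling is exposed through Ollama's chat API. A minimal sketch, assuming Ollama is serving on its default local port; get_weather is a hypothetical tool used purely for illustration:
# Ask a question the model can only answer by calling the (hypothetical) get_weather tool
curl http://localhost:11434/api/chat -d '{
  "model": "batiai/qwen3.5-9b:q4",
  "stream": false,
  "messages": [{ "role": "user", "content": "What is the weather in Seoul right now?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
# On success, the reply's message.tool_calls lists the chosen function and its arguments.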
What About IQ3_M?
We tested an IQ3_M (imatrix, 4.1GB) quantization. On a 16GB Mac mini it produced broken, repetitive output – similar to what we observed with Gemma 26B Q2. The 9B model architecture doesn't handle sub-4-bit quantization well; Q4_K_M is the minimum viable quantization for this model.
About BatiFlow
BatiFlow is a macOS-native AI desktop automation app – just 5MB, built with Swift.
| Feature | Details |
|---|---|
| Free & Unlimited | On-device AI via Ollama – no API costs, no usage limits, no subscriptions |
| 100% Private | All data stays on your Mac. Nothing is sent to the cloud |
| Ultra Lightweight | Native macOS app, only 5MB. No Electron, no bloat |
| 5-Minute Setup | Download, install Ollama, start automating |
Technical Details
- Original Model: Qwen/Qwen3.5-9B
- Architecture: Qwen 3.5 (9B dense, 256K context)
- License: Apache 2.0
- Quantized with: llama.cpp (build 8674)
- Quantized by: BatiAI
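For reference, the general shape of a llama.cpp quantization pipeline looks like the sketch below; the paths, file names, and exact invocation are assumptions, not the commands BatiAI actually used:
# Convert the original Hugging Face checkpoint to a full-precision GGUF (paths are illustrative)
python convert_hf_to_gguf.py ./Qwen3.5-9B --outfile qwen3.5-9b-f16.gguf --outtype f16
# Quantize to Q4_K_M with llama.cpp's quantization tool
./llama-quantize qwen3.5-9b-f16.gguf qwen3.5-9b-Q4_K_M.gguf Q4_K_M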
License
Quantized from Qwen/Qwen3.5-9B. License: Apache 2.0.