# Qwen3.5-2B-f32-GGUF
Qwen3.5-2B, from Alibaba's Qwen team, is a compact 2B-parameter dense multimodal causal language model with a vision encoder, part of the Qwen3.5 small series (0.8B-9B). It features a hybrid Gated DeltaNet architecture (a 3:1 ratio of linear-attention to softmax-attention blocks, giving constant memory at its 262K native context, extensible to 1M tokens), multi-token prediction, and a 248K-token vocabulary spanning 201 languages.

Trained with early-fusion multimodal pre-training and post-training, it scores 84.5 on OCRBench and 75.6 on VideoMME, and thinking mode lifts MMLU-Pro from 55.3 to 66.5 alongside 78.6 on IFEval, outperforming prior 7B models while fitting in roughly 4 GB of VRAM at BF16 (about 1.5 GB at 4-bit) for edge deployment on laptops, phones, or Pi-class devices.

The model natively supports text, images, and video, with toggleable thinking to trade reasoning depth for speed, and is Apache 2.0-licensed for fine-tuning. This makes it an efficient agent foundation for constrained hardware that needs OCR, video understanding, coding, and multilingual reasoning.
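As a rough sanity check on those footprint figures, the dominant cost is the weights themselves: bytes ≈ parameters × bits-per-weight / 8. A minimal sketch (the 4.5 bits-per-weight figure for 4-bit quants is an assumption, and a real runtime adds KV-cache and compute-buffer overhead on top):

```python
# Rough GGUF weight-memory estimate: params * bits_per_weight / 8.
# Actual runtime usage is higher (KV cache, compute buffers, mmproj).

def est_gb(params: float, bits_per_weight: float) -> float:
    """Decimal gigabytes needed just for the quantized weights."""
    return params * bits_per_weight / 8 / 1e9

params = 2e9                 # ~2B parameters
print(est_gb(params, 16))    # BF16/F16 -> 4.0 GB, matching the ~4 GB VRAM claim
print(est_gb(params, 8.5))   # Q8_0 stores ~8.5 bits/weight -> ~2.1 GB
print(est_gb(params, 4.5))   # assumed ~4.5 bits/weight at 4-bit -> ~1.1 GB
```

The Q8_0 estimate (~2.1 GB) lines up with the 2.01 GB file in the table below, since the embedding and output layers are counted only approximately here.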
## Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-2B.BF16.gguf | BF16 | 3.78 GB | Download |
| Qwen3.5-2B.F16.gguf | F16 | 3.78 GB | Download |
| Qwen3.5-2B.F32.gguf | F32 | 7.54 GB | Download |
| Qwen3.5-2B.Q8_0.gguf | Q8_0 | 2.01 GB | Download |
| Qwen3.5-2B.mmproj-bf16.gguf | mmproj-bf16 | 671 MB | Download |
| Qwen3.5-2B.mmproj-f16.gguf | mmproj-f16 | 671 MB | Download |
| Qwen3.5-2B.mmproj-f32.gguf | mmproj-f32 | 1.33 GB | Download |
| Qwen3.5-2B.mmproj-q8_0.gguf | mmproj-q8_0 | 365 MB | Download |
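The `mmproj` files are the vision projector and must be paired with one of the main model files at load time. A minimal llama.cpp invocation sketch (binary names and flags follow recent llama.cpp builds; the local file paths and prompt are assumptions):

```shell
# One-off multimodal inference with llama.cpp's multimodal CLI
llama-mtmd-cli -m Qwen3.5-2B.Q8_0.gguf \
  --mmproj Qwen3.5-2B.mmproj-q8_0.gguf \
  --image invoice.png \
  -p "Extract all line items from this invoice."

# Or serve an OpenAI-compatible endpoint with vision support
llama-server -m Qwen3.5-2B.Q8_0.gguf \
  --mmproj Qwen3.5-2B.mmproj-q8_0.gguf \
  -c 32768 --port 8080
```

Matching the quantization level of the mmproj to the main model (e.g. `q8_0` with `Q8_0`) is a reasonable default; mixing precisions also works but rarely saves much, given the projector's small size.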
## Quants Usage
(sorted by size, not necessarily quality; IQ-quants are often preferable over similar-sized non-IQ quants)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
