# Qwen3.5-2B-f32-GGUF
Qwen3.5-2B, from Alibaba's Qwen team, is a compact 2B-parameter dense multimodal causal language model with a vision encoder, part of the Qwen3.5 small series (0.8B-9B). It features a hybrid Gated DeltaNet architecture (a 3:1 ratio of linear-attention to softmax-attention blocks, giving constant memory at its 262K native context, extensible to 1M tokens), multi-token prediction, and a 248K-token vocabulary spanning 201 languages.

Trained with early-fusion multimodal pre-training and post-training, it scores 84.5 on OCRBench and 75.6 on VideoMME, and thinking mode lifts MMLU-Pro from 55.3 to 66.5 alongside 78.6 on IFEval, outperforming prior 7B models while fitting in roughly 4 GB of VRAM at BF16 (about 1.5 GB at 4-bit) for edge deployment on laptops, phones, or Pi-class devices.

The model natively supports text, images, and video, with toggleable thinking to trade reasoning depth for speed, and is Apache 2.0-licensed for fine-tuning. This makes it an efficient agent foundation for constrained hardware that needs OCR, video understanding, coding, and multilingual reasoning.
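As a rough sanity check on those footprint figures, the dominant cost is the weights themselves: bytes ≈ parameters × bits-per-weight / 8. A minimal sketch (the 4.5 bits-per-weight figure for 4-bit quants is an assumption, and a real runtime adds KV-cache and compute-buffer overhead on top):

```python
# Rough GGUF weight-memory estimate: params * bits_per_weight / 8.
# Actual runtime usage is higher (KV cache, compute buffers, mmproj).

def est_gb(params: float, bits_per_weight: float) -> float:
    """Decimal gigabytes needed just for the quantized weights."""
    return params * bits_per_weight / 8 / 1e9

params = 2e9                 # ~2B parameters
print(est_gb(params, 16))    # BF16/F16 -> 4.0 GB, matching the ~4 GB VRAM claim
print(est_gb(params, 8.5))   # Q8_0 stores ~8.5 bits/weight -> ~2.1 GB
print(est_gb(params, 4.5))   # assumed ~4.5 bits/weight at 4-bit -> ~1.1 GB
```

The Q8_0 estimate (~2.1 GB) lines up with the 2.01 GB file in the table below, since the embedding and output layers are counted only approximately here.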
## Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-2B.BF16.gguf | BF16 | 3.78 GB | Download |
| Qwen3.5-2B.F16.gguf | F16 | 3.78 GB | Download |
| Qwen3.5-2B.F32.gguf | F32 | 7.54 GB | Download |
| Qwen3.5-2B.Q8_0.gguf | Q8_0 | 2.01 GB | Download |
| Qwen3.5-2B.mmproj-bf16.gguf | mmproj-bf16 | 671 MB | Download |
| Qwen3.5-2B.mmproj-f16.gguf | mmproj-f16 | 671 MB | Download |
| Qwen3.5-2B.mmproj-f32.gguf | mmproj-f32 | 1.33 GB | Download |
| Qwen3.5-2B.mmproj-q8_0.gguf | mmproj-q8_0 | 365 MB | Download |
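The `mmproj` files are the vision projector and must be paired with one of the main model files at load time. A minimal llama.cpp invocation sketch (binary names and flags follow recent llama.cpp builds; the local file paths and prompt are assumptions):

```shell
# One-off multimodal inference with llama.cpp's multimodal CLI
llama-mtmd-cli -m Qwen3.5-2B.Q8_0.gguf \
  --mmproj Qwen3.5-2B.mmproj-q8_0.gguf \
  --image invoice.png \
  -p "Extract all line items from this invoice."

# Or serve an OpenAI-compatible endpoint with vision support
llama-server -m Qwen3.5-2B.Q8_0.gguf \
  --mmproj Qwen3.5-2B.mmproj-q8_0.gguf \
  -c 32768 --port 8080
```

Matching the quantization level of the mmproj to the main model (e.g. `q8_0` with `Q8_0`) is a reasonable default; mixing precisions also works but rarely saves much, given the projector's small size.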
## Quants Usage
(sorted by size, not necessarily quality; IQ-quants are often preferable over similar-sized non-IQ quants)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
