# Qwen3.5-4B-f32-GGUF
Qwen3.5-4B, from Alibaba's Qwen team, is a 4B-parameter dense multimodal causal language model with an integrated vision encoder. It belongs to the efficient Qwen3.5 small series (0.8B–9B), which features a hybrid Gated DeltaNet architecture (an 8×[3×(Gated DeltaNet → FFN) → 1×(Gated Attention → FFN)] layout with 32 layers, hidden dimension 2560, and a 248K vocabulary covering 201 languages), multi-token prediction, and a 262K-token native context window (extensible to 1M+ tokens).

The model matches Qwen3-30B on MMLU-Pro while beating GPT-5-Nano across vision benchmarks (OCRBench, MathVista, VideoMME, RefCOCO). It runs on just 8 GB of VRAM in BF16, or roughly 3 GB at 4-bit, for edge deployment, and supports a toggleable thinking mode that trades fast inference for deeper reasoning in OCR, video understanding, coding, math, and agentic workflows. Apache-2.0-licensed, with a base model and 8 quantized GGUF variants, it excels as a lightweight native multimodal foundation for mobile and embedded systems that need long-context multilingual capabilities without separate VL components.
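The VRAM figures above follow from simple parameter-count arithmetic. A rough sketch, assuming a nominal 4×10⁹ weights and ignoring activation and KV-cache overhead:

```python
def model_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-storage estimate in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

N = 4e9  # nominal parameter count for a "4B" model (assumption)

print(model_size_gb(N, 2.0))   # BF16: 2 bytes/param -> 8.0 GB
print(model_size_gb(N, 0.5))   # 4-bit: 0.5 bytes/param -> 2.0 GB
```

The on-disk BF16 file (8.42 GB) is somewhat larger than the 8 GB estimate because the embedding table for the 248K vocabulary and other non-repeating tensors add overhead; likewise, 4-bit quants carry per-block scale metadata, which pushes the ~2 GB weight estimate toward the ~3 GB VRAM figure quoted above.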
## Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-4B.BF16.gguf | BF16 | 8.42 GB | Download |
| Qwen3.5-4B.F16.gguf | F16 | 8.42 GB | Download |
| Qwen3.5-4B.F32.gguf | F32 | 16.8 GB | Download |
| Qwen3.5-4B.Q8_0.gguf | Q8_0 | 4.48 GB | Download |
| Qwen3.5-4B.mmproj-bf16.gguf | mmproj-bf16 | 676 MB | Download |
| Qwen3.5-4B.mmproj-f16.gguf | mmproj-f16 | 676 MB | Download |
| Qwen3.5-4B.mmproj-f32.gguf | mmproj-f32 | 1.33 GB | Download |
| Qwen3.5-4B.mmproj-q8_0.gguf | mmproj-q8_0 | 367 MB | Download |
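A typical llama.cpp invocation pairing a model file with its matching mmproj for image input. This is a sketch: it assumes a recent llama.cpp build with multimodal support, files downloaded to the current directory, and `photo.jpg` as a placeholder image path.

```shell
# Multimodal: model + matching mmproj (hypothetical local paths)
llama-mtmd-cli \
  -m Qwen3.5-4B.BF16.gguf \
  --mmproj Qwen3.5-4B.mmproj-bf16.gguf \
  --image photo.jpg \
  -p "Describe this image."

# Text-only chat with the Q8_0 quant (no mmproj needed)
llama-cli -m Qwen3.5-4B.Q8_0.gguf -p "Hello" -n 128
```

Match the mmproj precision to the main model file where possible (e.g. Q8_0 with mmproj-q8_0) to keep memory use predictable.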
## Quants Usage
(Sorted by size, not necessarily by quality; IQ-quants are often preferable to similar-sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
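One way to compare the quants in the table above is effective bits per weight, computed directly from file size. A sketch using the sizes listed, again assuming a nominal 4×10⁹ parameters:

```python
# File sizes from the table above, in decimal GB; 4e9 params is an assumption.
SIZES_GB = {"F32": 16.8, "BF16": 8.42, "F16": 8.42, "Q8_0": 4.48}
N_PARAMS = 4e9

def bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Effective bits stored per parameter, from total file size."""
    return size_gb * 1e9 * 8 / n_params

for name, size in SIZES_GB.items():
    print(f"{name}: {bits_per_weight(size):.2f} bpw")
```

Q8_0 landing near 9 bpw rather than exactly 8 is expected: GGUF quants store per-block scale factors alongside the weights, and some tensors (e.g. embeddings) are typically kept at higher precision.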
