# Qwen3.5-4B-f32-GGUF
Qwen3.5-4B, from Alibaba's Qwen team, is a 4B-parameter dense multimodal causal language model with an integrated vision encoder. It belongs to the efficient Qwen3.5 small series (0.8B–9B), which features a hybrid Gated DeltaNet architecture (an 8×[3×(Gated DeltaNet → FFN) → 1×(Gated Attention → FFN)] layout with 32 layers, hidden dimension 2560, and a 248K vocabulary covering 201 languages), multi-token prediction, and a 262K-token native context window (extensible to 1M+ tokens).

The model matches Qwen3-30B on MMLU-Pro while beating GPT-5-Nano across vision benchmarks (OCRBench, MathVista, VideoMME, RefCOCO). It runs on just 8 GB of VRAM in BF16, or roughly 3 GB at 4-bit, for edge deployment, and supports a toggleable thinking mode that trades fast inference for deeper reasoning in OCR, video understanding, coding, math, and agentic workflows. Apache-2.0-licensed, with a base model and 8 quantized GGUF variants, it excels as a lightweight native multimodal foundation for mobile and embedded systems that need long-context multilingual capabilities without separate VL components.
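The VRAM figures above follow from simple parameter-count arithmetic. A rough sketch, assuming a nominal 4×10⁹ weights and ignoring activation and KV-cache overhead:

```python
def model_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-storage estimate in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

N = 4e9  # nominal parameter count for a "4B" model (assumption)

print(model_size_gb(N, 2.0))   # BF16: 2 bytes/param -> 8.0 GB
print(model_size_gb(N, 0.5))   # 4-bit: 0.5 bytes/param -> 2.0 GB
```

The on-disk BF16 file (8.42 GB) is somewhat larger than the 8 GB estimate because the embedding table for the 248K vocabulary and other non-repeating tensors add overhead; likewise, 4-bit quants carry per-block scale metadata, which pushes the ~2 GB weight estimate toward the ~3 GB VRAM figure quoted above.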
## Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-4B.BF16.gguf | BF16 | 8.42 GB | Download |
| Qwen3.5-4B.F16.gguf | F16 | 8.42 GB | Download |
| Qwen3.5-4B.F32.gguf | F32 | 16.8 GB | Download |
| Qwen3.5-4B.Q8_0.gguf | Q8_0 | 4.48 GB | Download |
| Qwen3.5-4B.mmproj-bf16.gguf | mmproj-bf16 | 676 MB | Download |
| Qwen3.5-4B.mmproj-f16.gguf | mmproj-f16 | 676 MB | Download |
| Qwen3.5-4B.mmproj-f32.gguf | mmproj-f32 | 1.33 GB | Download |
| Qwen3.5-4B.mmproj-q8_0.gguf | mmproj-q8_0 | 367 MB | Download |
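A typical llama.cpp invocation pairing a model file with its matching mmproj for image input. This is a sketch: it assumes a recent llama.cpp build with multimodal support, files downloaded to the current directory, and `photo.jpg` as a placeholder image path.

```shell
# Multimodal: model + matching mmproj (hypothetical local paths)
llama-mtmd-cli \
  -m Qwen3.5-4B.BF16.gguf \
  --mmproj Qwen3.5-4B.mmproj-bf16.gguf \
  --image photo.jpg \
  -p "Describe this image."

# Text-only chat with the Q8_0 quant (no mmproj needed)
llama-cli -m Qwen3.5-4B.Q8_0.gguf -p "Hello" -n 128
```

Match the mmproj precision to the main model file where possible (e.g. Q8_0 with mmproj-q8_0) to keep memory use predictable.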
## Quants Usage
(Sorted by size, not necessarily by quality; IQ-quants are often preferable to similar-sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
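One way to compare the quants in the table above is effective bits per weight, computed directly from file size. A sketch using the sizes listed, again assuming a nominal 4×10⁹ parameters:

```python
# File sizes from the table above, in decimal GB; 4e9 params is an assumption.
SIZES_GB = {"F32": 16.8, "BF16": 8.42, "F16": 8.42, "Q8_0": 4.48}
N_PARAMS = 4e9

def bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Effective bits stored per parameter, from total file size."""
    return size_gb * 1e9 * 8 / n_params

for name, size in SIZES_GB.items():
    print(f"{name}: {bits_per_weight(size):.2f} bpw")
```

Q8_0 landing near 9 bpw rather than exactly 8 is expected: GGUF quants store per-block scale factors alongside the weights, and some tensors (e.g. embeddings) are typically kept at higher precision.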
