# DeepSeek-V4-Flash GGUF (community quants)
Quantized GGUF variants of deepseek-ai/DeepSeek-V4-Flash, a mixture-of-experts model with 284B total parameters (13B active per token).
## Quants
| Quant | Approx size | Recommended? |
|---|---|---|
| (removed) | ~~54 GB~~ | |
| (removed) | ~~87 GB~~ | |
| (removed) | ~~109 GB~~ | |
| Q2_K | ~96 GB | |
| Q3_K_M | ~125 GB | |
| Q4_K_M | ~161 GB | |
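The sizes above can be sanity-checked from the parameter count: a GGUF file is roughly `total_params × bits_per_weight / 8` bytes. A minimal sketch, using approximate community bits-per-weight figures for llama.cpp k-quants (the exact values vary with the tensor mix):

```python
# Rough size estimate: total_params * bits_per_weight / 8.
# The bits-per-weight numbers below are approximate, not exact.
TOTAL_PARAMS = 284e9  # DeepSeek-V4-Flash total parameter count

APPROX_BPW = {
    "Q2_K": 2.7,
    "Q3_K_M": 3.5,
    "Q4_K_M": 4.5,
}

def approx_size_gb(quant: str, total_params: float = TOTAL_PARAMS) -> float:
    """Estimated GGUF size in GB for a given quant type."""
    return total_params * APPROX_BPW[quant] / 8 / 1e9

for quant in APPROX_BPW:
    print(f"{quant}: ~{approx_size_gb(quant):.0f} GB")
```

The estimates land within a few GB of the table, the remainder being metadata and tensors kept at higher precision.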
## Provenance
All variants derived from Preyazz/DeepSeek-V4-Flash-Q8_0-GGUF, which is itself a lossless conversion of the original FP8 safetensors.
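Quants like these are typically produced with llama.cpp's `llama-quantize` tool, taking the Q8_0 GGUF as input. A sketch (file names are illustrative, not the actual commands used for this repo):

```shell
# Re-quantize a Q8_0 GGUF down to Q4_K_M (illustrative file names).
./build/bin/llama-quantize \
    DeepSeek-V4-Flash-Q8_0.gguf \
    DeepSeek-V4-Flash-Q4_K_M.gguf \
    Q4_K_M
```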
## Compatibility
Requires llama.cpp built from PR #22378 (nisparks's wip/deepseek-v4-support branch) or later. The deepseek4 architecture is not yet in stable llama.cpp releases.
For Strix Halo / consumer ROCm: build with GGML_HIP_NO_VMM=ON (VMM=ON currently crashes on gfx1151 — see ROCm Issue #6146).
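Putting the two requirements together, a build might look like the following sketch (the branch name is taken from the PR above; the checkout ref and generic cmake invocation are assumptions, not verified instructions):

```shell
# Fetch the PR branch directly from GitHub's pull refs and build
# with the gfx1151 VMM workaround. Paths and flags are illustrative.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/22378/head:deepseek-v4-support
git checkout deepseek-v4-support

cmake -B build -DGGML_HIP=ON -DGGML_HIP_NO_VMM=ON
cmake --build build --config Release -j
```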