A Q2_K version of [ssweens/deepseek-ai__DeepSeek-V4-Flash-GGUF-YMMV](https://huggingface.co/ssweens/deepseek-ai__DeepSeek-V4-Flash-GGUF-YMMV).
---
## 🧪 Experimental GGUFs for DeepSeek-V4-Flash
A stopgap for experimenting with DeepSeek-V4-Flash locally on CUDA and ROCm while the tools ecosystem catches up. Expect rough edges. Validated for text and coding coherence.
GGUF files for [deepseek-ai/DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash).
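To fetch the quantized files, here is a minimal sketch using the Hugging Face CLI. The `--include` glob is an assumption; adjust it to the actual filenames in this repo.

```bash
# Download the GGUF files from this repo into ./models
# (the include pattern is illustrative; match it to the real shard names)
huggingface-cli download ssweens/deepseek-ai__DeepSeek-V4-Flash-GGUF-YMMV \
  --include "*.gguf" \
  --local-dir ./models
```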
### ⚠️ You need the custom fork
These GGUFs **require** a DeepSeek-V4-capable fork of llama.cpp. Vanilla llama.cpp doesn't support this architecture yet. A build-and-run sketch follows the credits below.
- [antirez](https://github.com/antirez) — llama.cpp fork for Metal and CUDA in [llama.cpp-deepseek-v4-flash](https://github.com/antirez/llama.cpp-deepseek-v4-flash)
- [ml-explore/mlx-lm #1192](https://github.com/ml-explore/mlx-lm/pull/1192) — MLX DSV4 attention reference that informed the architecture work
- [DeepSeek](https://github.com/deepseek-ai) — open inference code and the [technical report](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf)
- [nisparks et al.](https://github.com/ggml-org/llama.cpp/issues/22319) — early implementation efforts and discussion
- [llama.cpp](https://github.com/ggml-org/llama.cpp) — the project that makes local LLM inference possible
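
A minimal build-and-run sketch, assuming the fork keeps upstream llama.cpp's CMake options (`GGML_CUDA`, `GGML_HIP`) and binary names; the GGUF filename below is illustrative, so substitute whatever you downloaded.

```bash
# Fetch and build the DeepSeek-V4-capable fork (vanilla llama.cpp won't load these GGUFs yet)
git clone https://github.com/antirez/llama.cpp-deepseek-v4-flash
cd llama.cpp-deepseek-v4-flash

# CUDA build; for ROCm, swap -DGGML_CUDA=ON for -DGGML_HIP=ON
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Quick coherence check; the model path and filename are illustrative
./build/bin/llama-cli \
  -m ../models/DeepSeek-V4-Flash-Q2_K.gguf \
  -ngl 99 \
  -p "Write a Python function that reverses a linked list."
```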