16-24B models with FP8 quantization
Hello DeepSeek team,
Your models are great. I wish I could run them locally, but they're too big for me. I think there are many people like me who want to try your models locally.
Could you please release smaller models, say 16B or 24B, with FP8 quantization? People like me can build a PC with 32 GB of VRAM, so 16-24B FP8 models are a reasonable fit.
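As a rough sanity check on the numbers (a back-of-envelope sketch; the ~1 byte per FP8 parameter and the flat overhead allowance are my assumptions, not vendor figures):

```python
# Back-of-envelope VRAM estimate for FP8-quantized weights.
# Assumes ~1 byte per parameter (FP8) plus a flat allowance for
# KV cache, activations, and runtime overhead (illustrative only).
def vram_estimate_gb(n_params_billion: float,
                     bytes_per_param: float = 1.0,
                     overhead_gb: float = 4.0) -> float:
    return n_params_billion * bytes_per_param + overhead_gb

for size in (16, 24):
    print(f"{size}B @ FP8: ~{vram_estimate_gb(size):.0f} GB")
# 16B @ FP8: ~20 GB
# 24B @ FP8: ~28 GB; both fit in 32 GB of VRAM
```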
Many thanks.
It won't be any better than Gemma 4 31B, so why bother?
@Tikhonum Gemma 4 31B is unquestionably the best small model, easily outperforming Qwen 3.5 across most tasks. However, Google deliberately crippled Gemma, so in theory DeepSeek could make a comparably sized model that's far more usable for the general population, even if it scores slightly lower on standardized STEM-focused tests.
That is, Google's primary revenue stream is search/ads, which is why Gemini is the most generally capable AI model with the broadest knowledge (e.g. the highest SimpleQA score). Google therefore doesn't want local models to be generally capable, since that would reduce online usage, and hence its profits. Consequently, Google deliberately designed Gemma to hallucinate heavily on the topics most people in the general population actually care about.
This has nothing to do with Gemma 4's relatively small size (31B): much smaller earlier models like Llama 3.1 8B, as well as Google's own Gemma 2 9B, have significantly broader knowledge.
Anyway, consider the strategy: make a model good at coding, math, STEM, story writing, and so on, but not good enough to justify the performance drop relative to the proprietary models (you're better off paying a few bucks for a proprietary coding model than wasting hours fixing Gemma's sub-par code), and make it little more than a hallucination generator on the topics most of the general population cares about. That way Google not only protects its business model, but also pushes out potential competition from DeepSeek and others, who can't match its performance in STEM-focused domains without also crippling their general performance. The end result is a bunch of generally useless open-source models.
Still, kudos to Google for releasing the best-performing small open-source model. But its release was strategic, not altruistic.
That sounds reasonable. But fortunately, Qwen seems to be the king of small open models, especially with the recent Qwen 3.6.
I think general users will try many models and choose the best one. Gemma 4 will be forgotten soon if it isn't as good as expected.
In fact, models from China are the best in all categories: text, image, video, embedding, reranking, and more. So I hope DeepSeek will also help the community with the best models.
@Duonglv Models from China are NOT the best in all categories, and that's not just my personal opinion.
For example, across the board on https://arena.ai, Gemma 4 handily outperforms Qwen 3.5 in every tested category. Qwen 3.6 wasn't even evaluated, because it's the exact same model, just grossly overtrained for select domains, so it performs generally worse than Qwen 3.5.
And in my own testing, even though Qwen 3.5 is a notable improvement over Qwen 3, Gemma 4 easily outperforms it in virtually every category. Professional institutions using exhaustive, complex hidden tests have reached the same conclusions as LMSYS and me. For example, the Center for AI Standards and Innovation (CAISI) found that DeepSeek v4 is the most powerful Chinese model, but still well behind the latest models from OpenAI and Anthropic.
For the life of me, I can't figure out the fanboy obsession that the coding-obsessed early-adopter community has with Qwen models. Even when a clearly superior model family is released (Gemma 4), they still claim the infinite-looping, token-burning, mistake-prone Qwen 3.5 family is better.