Why is Gemma 4 2B the same size as Gemma 3 12B?

#4
by SatyaUdayB - opened

Hi everyone,

I’m trying to understand a discrepancy in model sizes and would appreciate some clarification.

I noticed that a Gemma 3 12B parameter model and a Gemma 4 2B parameter model both occupy roughly 8 GB of storage, despite the large difference in parameter count.

From my understanding, model size should scale with the number of parameters, so I’m wondering:

  • Is this due to different quantization levels (e.g., INT4 vs FP16)?
  • Are there architectural differences or additional components in Gemma 4 that increase its size?
  • Could this be related to checkpoint format or storage overhead (GGUF, safetensors, etc.)?
  • Or am I missing something fundamental about how model sizes are calculated?
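As a rough sanity check on the "size should scale with parameters" intuition, on-disk size is approximately parameter count times bytes per parameter, and quantization changes the bytes-per-parameter factor. The numbers below are illustrative round figures, not official sizes for either model:

```python
def model_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate on-disk size in GB: parameters x bytes per parameter.

    Ignores checkpoint-format overhead (GGUF/safetensors metadata, etc.).
    """
    return n_params * bytes_per_param / 1e9

# Illustrative: a 12B model quantized to INT4 (~0.5 byte/param) and a
# 5B model stored in FP16/BF16 (2 bytes/param) land in a similar range.
print(model_size_gb(12e9, 0.5))  # ~6 GB
print(model_size_gb(5e9, 2.0))   # ~10 GB
```

This is why two models with very different parameter counts can occupy comparable storage once precision differs.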

If anyone has insights or has worked with these models, I’d really appreciate a detailed explanation.

Thanks in advance!

It is not a 2B model. Note the naming: it is an E2B (E = effective) model. In total, it is roughly a 5B model.

Google org

Hi @SatyaUdayB
E2B is a dense multimodal model with 2.3B effective parameters but 5.1B total parameters including embeddings. So the "2B" figure is a compute-efficiency metric, not a storage size. The "E" in E2B/E4B stands for "effective" parameters. The E2B model incorporates Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. PLE means each transformer layer gets its own embedding table instead of sharing one across the model, which significantly increases the stored parameter count.
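A toy calculation makes the effect of per-layer embeddings concrete. The dimensions below are invented for illustration (not Gemma's real configuration), and the sketch assumes the simplest reading of PLE, one embedding table per layer:

```python
# Invented toy dimensions, not actual Gemma model hyperparameters.
vocab_size = 32_000
embed_dim = 512
n_layers = 24

# Shared embeddings: one table reused by every layer.
shared_params = vocab_size * embed_dim

# Per-Layer Embeddings (PLE): each of the n_layers layers stores its own table.
ple_params = vocab_size * embed_dim * n_layers

print(shared_params)  # embedding parameters with a single shared table
print(ple_params)     # n_layers times as many stored embedding parameters
```

The stored parameter count grows with the number of layers even though the compute per token (the "effective" parameters) need not grow the same way.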
For a detailed explanation of the Gemma 4 model architecture, please refer to https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4.
Thanks
