Updated gemma-4-E4B-it metrics

I noticed the chat template got updated and tried it on the E4B, with surprising results in stabilizing the model's behavior.
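
For anyone who wants to see what changed, the chat template is just a Jinja string stored on the tokenizer; a minimal sketch using the standard transformers API (the local path is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder path; point this at your local checkout of the model.
tok = AutoTokenizer.from_pretrained("path/to/gemma-4-E4B-it")

# The template is a Jinja string on the tokenizer; diffing it against
# an older snapshot shows exactly what changed.
print(tok.chat_template)

# Render a prompt the way the evals would see it.
messages = [{"role": "user", "content": "Hello"}]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```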

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.480  0.656  0.797  0.608  0.400  0.755  0.665
mxfp4    0.455  0.607  0.851  0.585  0.402  0.744  0.651

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    35.937 ± 0.525  14.80 GB      1153
mxfp4    36.746 ± 0.534  11.06 GB      1030


Old numbers
quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.404  0.489  0.825  0.586  0.392  0.734  0.661
mxfp4    0.414  0.508  0.854  0.562  0.378  0.717  0.645

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    34.652 ± 0.502  14.80 GB      1146
mxfp4    35.203 ± 0.506  11.06 GB      1200
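
For context on the perplexity column: I'm assuming the ± is a standard error on the mean per-token negative log-likelihood, propagated through the exponential; a minimal sketch of that computation:

```python
import numpy as np

def perplexity_with_se(token_nlls: np.ndarray) -> tuple[float, float]:
    """Perplexity and its standard error from per-token NLLs (in nats)."""
    mean_nll = token_nlls.mean()
    se_nll = token_nlls.std(ddof=1) / np.sqrt(token_nlls.size)
    ppl = float(np.exp(mean_nll))
    # First-order error propagation: d/dx exp(x) = exp(x)
    return ppl, ppl * float(se_nll)
```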


I will re-do all baselines soon based on the new template. It is completely expected that the model behavior will change as a result.
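
The columns above map onto standard lm-evaluation-harness task names, so re-doing a baseline can be scripted roughly like this (a sketch only: I'm not certain which eval stack produced these numbers, and the model path is a placeholder):

```python
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/gemma-4-E4B-it,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
)
# Accuracy per task, matching the table columns above.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```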

Here are the effects of the new template on a few known distills from DavidAU:

gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8    0.518  0.709  0.755  0.657  0.418  0.759  0.626
mxfp4    0.485  0.682  0.792  0.641  0.432  0.746  0.635
Old template
mxfp8    0.506  0.697  0.754  0.661  0.416  0.757  0.627
mxfp4    0.487  0.670  0.792  0.644  0.430  0.748  0.624


gemma-4-E4B-it-GLM-4.7-Flash-HERETIC-UNCENSORED-Thinking
quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8    0.461  0.599  0.779  0.630  0.406  0.766  0.629
Old template
mxfp8    0.456  0.580  0.786  0.629  0.410  0.764  0.633


gemma-4-E4B-it-Claude-Opus-4.5-HERETIC-UNCENSORED-Thinking
quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8    0.509  0.705  0.806  0.646  0.416  0.773  0.650
Old template
mxfp8    0.502  0.692  0.809  0.650  0.420  0.771  0.651

Definitive numbers on E4B sets

After re-doing the metrics with the new template, these are the latest numbers:

gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED-Thinking

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.518  0.713  0.745  0.656  0.416  0.762  0.636
mxfp8    0.518  0.709  0.755  0.657  0.418  0.759  0.626
mxfp4    0.485  0.682  0.792  0.641  0.432  0.746  0.635

gemma-4-E4B-it-The-DECKARD-V2-Strong-HERETIC-UNCENSORED-Thinking

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.509  0.721  0.780  0.656  0.432  0.773  0.639
mxfp8    0.515  0.712  0.785  0.656  0.426  0.767  0.639

gemma-4-E4B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.516  0.709  0.794  0.649  0.416  0.761  0.639

Note: these numbers are with the old version of the mxfp8 quants; see below.

What happened to gemma-4 mxfp8 quanting?

You might have noticed my numbers changing a few times lately. I finally found out why: the quants I published for the E4B were made on April 8.

With the updated template, the numbers as currently displayed still stand.

However, I re-quanted some of the E4B models because I had removed them from the local repo, and the ones quanted yesterday are different. Not better.
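
For reference, re-quanting in mlx-lm looks roughly like this; a sketch under the assumption that your mlx-lm version exposes a q_mode parameter for the microscaling formats (the paths and the mode string are placeholders to verify against your install):

```python
from mlx_lm import convert

# Sketch only: q_mode for microscaling formats is an assumption about
# recent mlx-lm versions; paths are placeholders.
convert(
    hf_path="path/to/gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED",
    mlx_path="quants/mxfp8",
    quantize=True,
    q_mode="mxfp8",  # verify the supported mode strings in your mlx-lm
)
```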

gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.518  0.709  0.755  0.657  0.418  0.759  0.626

Quant created yesterday:

mxfp8    0.508  0.707  0.756  0.658  0.424  0.760  0.624

This tensor is missing from the new quant:

language_model.model.per_layer_model_projection.scales

These are also missing in the recent mxfp8 of gemma-4-26B-A4B-it:

language_model.model.layers.[0-29].mlp.down_proj.biases
language_model.model.layers.[0-29].mlp.gate_proj.biases
language_model.model.layers.[0-29].mlp.up_proj.biases
language_model.model.layers.[0-29].router.proj.biases
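
A quick way to catch this kind of discrepancy is to diff the tensor key sets of two quants; a minimal sketch with safetensors (the directory paths are hypothetical):

```python
from pathlib import Path
from safetensors import safe_open

def tensor_keys(model_dir: str) -> set[str]:
    """Collect every tensor name across all safetensors shards."""
    keys: set[str] = set()
    for shard in Path(model_dir).glob("*.safetensors"):
        with safe_open(shard, framework="numpy") as f:
            keys.update(f.keys())
    return keys

old = tensor_keys("quants/mxfp8-april")
new = tensor_keys("quants/mxfp8-yesterday")
print("missing in new quant:", sorted(old - new))
print("added in new quant:  ", sorted(new - old))
```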