KL Div
Either you forgot a zero (hopefully), or I don't think you understand how it works.
A KL divergence of 0.3 (or, lol, 0.7) means the model is so damaged it has barely anything in common with the original. Any value above 0.1 should be considered pretty much worthless.
Have you encountered any issues?
So it's not a typo?
That's why I asked to begin with: is it worth the bother of downloading? If it's a typo, sure; at 0.03 it would be slightly better than what people achieve on average with this model so far (even your refusals are a bit high, but that would match what I'd expect for 0.03), making it quite interesting. At 0.3 (or 0.7) it's not Gemma 4 anymore; KL divergence has no upper bound, and values that large mean the outputs have drifted far from the original model.
Works fine for me. Use another model; I'm not forcing you to use this one.
Kooten did not disclose whether it was the mean or median KL divergence.
I'd bet it was the mean. Often the median is posted, and occasionally both.
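For context, the mean and median can differ quite a bit when a handful of tokens have large divergences, which is why it matters which one was reported. A minimal sketch of computing per-token KL divergence and both summaries (the distributions below are made-up toy values, not taken from any real model):

```python
import math
import statistics

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 3-token vocabulary:
# one row per evaluated position, baseline model vs. quantized model.
baseline = [[0.70, 0.20, 0.10], [0.50, 0.30, 0.20]]
quantized = [[0.60, 0.25, 0.15], [0.45, 0.35, 0.20]]

per_token = [kl_divergence(p, q) for p, q in zip(baseline, quantized)]
mean_kl = statistics.mean(per_token)      # sensitive to a few bad tokens
median_kl = statistics.median(per_token)  # robust to outliers
print(f"mean={mean_kl:.4f} median={median_kl:.4f}")
```

In real quant evaluations this is averaged over many thousands of token positions, and a few positions with huge divergence can pull the mean well above the median, so the two summaries can tell different stories about the same quant.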