Gemma 4 seems to work best with high temperature for coding

#21
by Reverger - opened

@danielhanchen
Just reposting these findings from reddit.
https://www.reddit.com/r/LocalLLaMA/comments/1sg8r4l/gemma_4_seems_to_work_best_with_high_temperature/

Gemma 4 seems to work best with high temperature for coding
I've been playing with Gemma 4 31B for coding tasks since it came out and have been genuinely impressed with how capable it is. With the benchmarks putting it a little behind Qwen3.5, I didn't have high expectations, but it's honestly been performing better on what I've thrown at it so far.

This has all been at the recommended parameters (temp 1.0, top-k 65 and top-p 0.95). Since the general consensus is that for coding tasks you want a lower temperature, I began repeating some of my tests with lower values (0.8, 0.6 and 0.3), but found that, if anything, each step down made it worse.
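For intuition on what that temperature sweep changes: temperature divides the logits before the softmax, so lower values sharpen the distribution toward the top token and higher values flatten it, spreading probability over more candidates. A minimal sketch with made-up logits (illustrative only, not Gemma's actual output):

```python
import math

def softmax_at_temp(logits, temp):
    """Softmax over logits scaled by 1/temp (numerically stabilized)."""
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy values for three candidate tokens
for t in (0.3, 1.0, 1.5):
    probs = [round(p, 3) for p in softmax_at_temp(logits, t)]
    print(f"temp={t}: {probs}")
```

At temp 0.3 nearly all mass sits on the top token, while at temp 1.5 the tail tokens get a real chance of being sampled, which is consistent with the observation that lower temperatures made the model's outputs more rigid.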

I just tested it on 26B A3B and 31B.
This is insane. I used temp 1.5 and they're passing the carwash test easily now.

Unsloth AI org

Interesting thanks for sharing your findings!

Not my experience with Q4_K_M:

Screenshot 2026-04-09 183428

gemma-4-26B-A4B-it-UD-IQ4_XS.gguf
temp 1, top-k 64, min_p 0

It depends on your logic:

**1. The Practical Logic:**
If you want to wash the **car**, you have to **drive the car** to the car wash. If you walk, you will arrive at the car wash, but your car will still be at home, dirty!

**2. The Efficiency Logic:**
50 meters is roughly 60 steps.
* **Driving:** You have to get in, start the engine, shift into gear, drive, brake, and park.
* **Walking:** You just start walking.

For a distance of only 50 meters, **walking is actually faster** in terms of total time spent, but since you need the vehicle to perform the task, **driving is the only option that works.**

**Verdict: Drive the car.**

The thing is, you can ask it several times in new chat sessions and it will sometimes get it right and sometimes fail. Still pretty impressive; E4B consistently says to walk.

The order in which samplers are applied matters a lot.
Everyone shares parameters, but nobody shares the correct sampler chain, trusting llama.cpp's default one.

The right way to do it:
--samplers "temperature;top_p;top_k" --temp 1 --top-p 0.95 --top-k 64
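One way to see why the chain order matters: top-p's cutoff is computed from the probabilities, and temperature reshapes those probabilities, so running temperature before top_p keeps a different candidate set than running top_p first. A toy sketch (illustrative only, not llama.cpp's actual sampler code; the logits and thresholds are made up):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_survivors(probs, p):
    """Indices kept by top-p: smallest top-ranked set whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return set(kept)

logits = [5.0, 4.0, 3.0, 2.0, 1.0]  # toy logits for five candidate tokens
T, P = 1.5, 0.95

# Chain A: temperature first, then top-p (as in --samplers "temperature;top_p;top_k")
a = top_p_survivors(softmax([l / T for l in logits]), P)

# Chain B: top-p first (on the unscaled distribution), temperature applied later
b = top_p_survivors(softmax(logits), P)

print(sorted(a), sorted(b))  # chain A keeps one more candidate token than chain B
```

At temp 1.5 the flattened distribution needs four tokens to reach 0.95 of the mass, while the unscaled one needs only three, so the two chain orders sample from different candidate pools even with identical parameter values.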

Screenshot 2026-04-09 214658

Screenshot 2026-04-09 214743

@mikelowski
Please share your parameters.

There you go:
"--n-gpu-layers", "999",
"-ot", ".ffn_.*_exps.=CPU",
"--batch-size", "512",
"--ubatch-size", "256",
"--flash-attn", "on",
"--threads", "16",
"--parallel", "1",
"--webui-mcp-proxy",
"--temp", "1.5",
"--top-p", "0.95",
"--top-k", "64",
"--ctx-size", "256000",
"-np", "1",
"--cache-type-k", "q8_0",
"--cache-type-v", "q8_0"
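Those lines look like a launcher's argument array for llama-server. Assembled into a single command line, it would look roughly like the following; the model path is a guess based on the GGUF named earlier in the thread, and flag availability (e.g. --flash-attn on, --webui-mcp-proxy) depends on your llama.cpp build:

```shell
llama-server \
  -m gemma-4-26B-A4B-it-UD-IQ4_XS.gguf \
  --n-gpu-layers 999 \
  -ot ".ffn_.*_exps.=CPU" \
  --batch-size 512 \
  --ubatch-size 256 \
  --flash-attn on \
  --threads 16 \
  --parallel 1 \
  --webui-mcp-proxy \
  --temp 1.5 \
  --top-p 0.95 \
  --top-k 64 \
  --ctx-size 256000 \
  -np 1 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Note that -np is an alias for --parallel, so those two entries are redundant; they are kept here as given.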
