Gemma 4 seems to work best with high temperature for coding

#21
by Reverger - opened

@danielhanchen
Just reposting these findings from reddit.
https://www.reddit.com/r/LocalLLaMA/comments/1sg8r4l/gemma_4_seems_to_work_best_with_high_temperature/

Gemma 4 seems to work best with high temperature for coding
I've been playing with Gemma 4 31B for coding tasks since it came out and have been genuinely impressed with how capable it is. With the benchmarks putting it a little behind Qwen3.5, I didn't have high expectations, but it's honestly been performing better on what I've thrown at it so far.

This has all been at the recommended parameters (temp 1.0, top-k 65 and top-p 0.95). Since the general consensus is that for coding tasks you want a lower temperature, I began repeating some of my tests with lower values (0.8, 0.6 and 0.3), but found that, if anything, each step down made it worse.
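For intuition on what that temperature sweep changes: temperature divides the logits before the softmax, so lower values sharpen the distribution toward the top token and higher values flatten it, spreading probability over more candidates. A minimal sketch with made-up logits (illustrative only, not Gemma's actual output):

```python
import math

def softmax_at_temp(logits, temp):
    """Softmax over logits scaled by 1/temp (numerically stabilized)."""
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy values for three candidate tokens
for t in (0.3, 1.0, 1.5):
    probs = [round(p, 3) for p in softmax_at_temp(logits, t)]
    print(f"temp={t}: {probs}")
```

At temp 0.3 nearly all mass sits on the top token, while at temp 1.5 the tail tokens get a real chance of being sampled, which is consistent with the observation that lower temperatures made the model's outputs more rigid.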

I just tested it on 26B A3B and 31B.
This is insane. I used temp 1.5 and they're passing the carwash test easily now.

Unsloth AI org

Interesting thanks for sharing your findings!

Not my experience with Q4_K_M:

Screenshot 2026-04-09 183428

gemma-4-26B-A4B-it-UD-IQ4_XS.gguf
temp 1, top-k 64, min_p 0

It depends on your logic:

**1. The Practical Logic:**
If you want to wash the **car**, you have to **drive the car** to the car wash. If you walk, you will arrive at the car wash, but your car will still be at home, dirty!

**2. The Efficiency Logic:**
50 meters is roughly 60 steps.
* **Driving:** You have to get in, start the engine, shift into gear, drive, brake, and park.
* **Walking:** You just start walking.

For a distance of only 50 meters, **walking is actually faster** in terms of total time spent, but since you need the vehicle to perform the task, **driving is the only option that works.**

**Verdict: Drive the car.**

The thing is, you can ask it several times in new chat sessions and it will sometimes get it right and sometimes fail. Still pretty impressive; E4B consistently says to walk.

The order in which samplers are applied matters a lot.
Everyone shares parameters, but nobody shares the correct sampler chain, trusting llama.cpp's default one.

The right way to do it:
--samplers "temperature;top_p;top_k" --temp 1 --top-p 0.95 --top-k 64
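One way to see why the chain order matters: top-p's cutoff is computed from the probabilities, and temperature reshapes those probabilities, so running temperature before top_p keeps a different candidate set than running top_p first. A toy sketch (illustrative only, not llama.cpp's actual sampler code; the logits and thresholds are made up):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_survivors(probs, p):
    """Indices kept by top-p: smallest top-ranked set whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return set(kept)

logits = [5.0, 4.0, 3.0, 2.0, 1.0]  # toy logits for five candidate tokens
T, P = 1.5, 0.95

# Chain A: temperature first, then top-p (as in --samplers "temperature;top_p;top_k")
a = top_p_survivors(softmax([l / T for l in logits]), P)

# Chain B: top-p first (on the unscaled distribution), temperature applied later
b = top_p_survivors(softmax(logits), P)

print(sorted(a), sorted(b))  # chain A keeps one more candidate token than chain B
```

At temp 1.5 the flattened distribution needs four tokens to reach 0.95 of the mass, while the unscaled one needs only three, so the two chain orders sample from different candidate pools even with identical parameter values.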

Screenshot 2026-04-09 214658

Screenshot 2026-04-09 214743

@mikelowski
Please share your parameters.

There you go:
"--n-gpu-layers", "999",
"-ot", ".ffn_.*_exps.=CPU",
"--batch-size", "512",
"--ubatch-size", "256",
"--flash-attn", "on",
"--threads", "16",
"--parallel", "1",
"--webui-mcp-proxy",
"--temp", "1.5",
"--top-p", "0.95",
"--top-k", "64",
"--ctx-size", "256000",
"-np", "1",
"--cache-type-k", "q8_0",
"--cache-type-v", "q8_0"
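Those lines look like a launcher's argument array for llama-server. Assembled into a single command line, it would look roughly like the following; the model path is a guess based on the GGUF named earlier in the thread, and flag availability (e.g. --flash-attn on, --webui-mcp-proxy) depends on your llama.cpp build:

```shell
llama-server \
  -m gemma-4-26B-A4B-it-UD-IQ4_XS.gguf \
  --n-gpu-layers 999 \
  -ot ".ffn_.*_exps.=CPU" \
  --batch-size 512 \
  --ubatch-size 256 \
  --flash-attn on \
  --threads 16 \
  --parallel 1 \
  --webui-mcp-proxy \
  --temp 1.5 \
  --top-p 0.95 \
  --top-k 64 \
  --ctx-size 256000 \
  -np 1 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Note that -np is an alias for --parallel, so those two entries are redundant; they are kept here as given.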
