What should Top-k be set to?

#23
by daydreamwarrior - opened

I was looking at the discussion here huggingface.co/Nanbeige/Nanbeige4.1-3B/discussions/2, and the conclusion seems to be either 0 or 50.
The discussion suggests that 50 is likely used because the official example relies on the transformers defaults. However, that example doesn't set temperature or top-p either, so it isn't conclusive.
So, what is the recommended value for Top-k? Thanks.

After testing it for a while on the same prompts, I think the best settings are --temp 0.6 --top-p 0.95 --top-k 40 --min-p 0.01
This is based only on comparing output quality...
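For anyone unsure how these four flags interact, here is a minimal sketch of the usual filtering steps on a toy set of logits. The function names, the toy logits, and the order of the steps (temperature, then top-k, top-p, min-p) are assumptions for illustration; real runtimes may apply them in a different order or renormalize between steps.

```python
import math

def softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution before any filtering.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_candidates(logits, temperature=0.6, top_k=40, top_p=0.95, min_p=0.01):
    """Hypothetical helper: returns {token_id: renormalized probability}."""
    probs = softmax(logits, temperature)
    # Token ids sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely tokens; 0 is treated as "disabled".
    if top_k > 0:
        order = order[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    # (Simplification: cumulative mass uses the pre-top-k probabilities.)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # min-p: drop tokens whose probability is below min_p * (top token probability).
    threshold = min_p * probs[order[0]]
    kept = [i for i in kept if probs[i] >= threshold]

    # Renormalize the survivors so they form a distribution to sample from.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

toy_logits = [5.0, 4.0, 3.0, 0.0, -2.0]
print(filter_candidates(toy_logits))                                # settings from this comment
print(filter_candidates(toy_logits, top_k=0, top_p=1.0, min_p=0.0))  # all filters off
```

With only five toy tokens, top-k 40 has no effect; it only matters once the vocabulary is larger than k.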

Setting top_k to 0 indeed shortens the thought chain. I might personally find 0 to be better.

I searched some guides, such as llm-sampling-parameters-guide, and found that top_k should be set to 0 by default, unless there are specific requirements. Min_p is also an issue; I've temporarily set it to 0, which is also the default value in Ollama.
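To make the min_p point concrete, here is a small sketch of what min_p = 0 versus 0.01 does to a long tail of unlikely tokens. The distribution is made up for illustration, and `min_p_survivors` is a hypothetical helper, not any library's API; it assumes the common definition where a token survives if its probability is at least min_p times the top token's probability.

```python
def min_p_survivors(probs, min_p):
    # A token survives if its probability is at least min_p * max probability.
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# Made-up distribution: three plausible tokens plus 1000 very unlikely ones.
head = [0.60, 0.25, 0.10]
tail = [0.05 / 1000] * 1000
probs = head + tail

print(len(min_p_survivors(probs, 0.0)))   # 1003: min_p = 0 filters nothing
print(len(min_p_survivors(probs, 0.01)))  # 3: the whole tail is dropped
```

With top_k also at 0, nothing else trims that tail except top_p, which may explain why a small non-zero min_p changes the output.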

My parameters are not meant to shorten the CoT; I just compared outputs. I'll try your suggestion (min_p, top_p), but after doing many comparisons, I doubt it will be better.
For example, with my parameters it created a perfectly working, good-looking snake game in HTML: neon design, score, and a "play again" button.

Could you share your test cases publicly? I assumed that tweaking these wouldn't significantly impact intelligence, at least not in terms of benchmark results like MMLU.

It looks like min_p should indeed be set to 0.01; I now completely agree with this. Thanks.
