Spaces: DontPlanToEnd/UGI-Leaderboard

Identical models but different scores?

#605

by TPH441 - opened Mar 16

TPH441

Mar 16

•

edited Mar 16

Bobi099/Qwen3.5-27B-heretic is literally just a duplicated repo of coder3101/Qwen3.5-27B-heretic, but they have different scores on the leaderboard.

DontPlanToEnd

Owner Mar 17

Yeah, I can't test models fully deterministically, so there's a bit of variance between runs, even for identical weights. vLLM batching kinda inherently isn't deterministic, thinking models need some randomness to think through unique ideas, and models don't write as well with deterministic settings.
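For anyone curious how identical weights can still produce different outputs, here is a rough Python sketch of two of the effects mentioned above. It is only an illustration, not the leaderboard's actual test harness: the `logits` values and the `pick_token` helper are made up for the example, and the floating-point part is just one commonly cited reason batched inference can differ between runs (the order in which numbers get added can change the result slightly).

```python
import math
import random


def softmax(logits, temperature):
    """Turn raw logits into probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Hypothetical helper: greedy pick at temperature 0, otherwise sample."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits for three candidate tokens; the top two are nearly tied.
logits = [2.0, 1.9, 0.5]

# Greedy decoding ignores the random generator, so every seed agrees.
greedy_picks = {pick_token(logits, 0, random.Random(seed)) for seed in range(50)}

# Sampling at temperature 1.0 depends on the seed, so runs can diverge.
sampled_picks = {pick_token(logits, 1.0, random.Random(seed)) for seed in range(50)}

# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print("greedy picks:", greedy_picks)
print("sampled picks:", sampled_picks)
print("sums equal?", left_to_right == right_to_left)
```

Once one early token differs, everything generated after it can differ too, which is why two uploads of the same model can land on slightly different scores.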

DontPlanToEnd changed discussion status to closed Mar 17
