Cannot reproduce AIME_2025 result on the FP8 checkpoint

#7
by chenjiel - opened

Hi, Qwen team,

Could you share instructions for reproducing the AIME_2025 result with the FP8 checkpoint? In our benchmarking, the FP8 checkpoint's AIME_2025 score is much lower than the published score for the BF16 checkpoint.
