Would this work with the FP8 version of the model?

#5
by pathosethoslogos - opened

Qwen provides the FP8 quant of the model.

What about 4-bit quants from other uploaders?

FP8 works fine; try it with coding, not chat.

Thinking mode or non-thinking mode?

@clavie This model should work with both thinking mode and non-thinking mode, though it was trained with thinking traces.

Extremely low accept rate, below 10%. It actually slows down generation.
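That slowdown is what the usual speculative-decoding arithmetic predicts. A minimal sketch, assuming each drafted token is accepted independently with rate `a` and the draft model costs a fixed fraction of a target step (`draft_cost` here is illustrative, not a measured value):

```python
def expected_tokens(a, k):
    # Expected tokens committed per target verification pass:
    # geometric series 1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a)
    return (1 - a ** (k + 1)) / (1 - a)

def speedup(a, k, draft_cost=0.1):
    # Time per round: k draft steps (each draft_cost of a target step)
    # plus one target verification step; speedup is tokens per unit time
    # relative to plain decoding (1 token per target step).
    return expected_tokens(a, k) / (k * draft_cost + 1)

# At a 10% accept rate with 5 drafted tokens, speedup drops below 1x,
# i.e. slower than plain decoding; a healthy accept rate recovers it.
print(round(speedup(0.10, 5), 2))  # below 1.0
print(round(speedup(0.80, 5), 2))  # well above 1.0
```

Under these assumptions, anything that pushes the accept rate that low (prompt domain mismatch, quantization drift between draft and target) makes the draft model a net cost.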
