Would this work with the FP8 version of the model?
#5 opened by pathosethoslogos
Qwen provides an FP8 quant of the model. What about 4-bit quants from other uploaders?
FP8 works fine; try it with coding rather than chat.
Thinking mode or non-thinking mode?
@clavie This model should work with both thinking mode and non-thinking mode, though it was trained with thinking traces.
Extremely low accept rate, below 10%. It actually slows generation down.
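A low accept rate really can make speculative decoding slower than plain decoding. A minimal sketch of why, under an assumed simplified cost model (the formula for expected accepted tokens follows the standard speculative-sampling analysis; the draft/target cost ratio `c` and draft length `k` are illustrative, not measured from this model):

```python
def expected_speedup(a: float, k: int = 5, c: float = 0.1) -> float:
    """Rough speculative-decoding speedup under a simplified model.

    a: per-token acceptance probability of the draft model (assumed i.i.d.)
    k: number of tokens drafted per verification step
    c: cost of one draft forward pass relative to one target forward pass

    Expected tokens produced per step is (1 - a^(k+1)) / (1 - a);
    each step costs k draft passes plus one target verification pass.
    """
    tokens_per_step = (1 - a ** (k + 1)) / (1 - a)
    cost_per_step = k * c + 1.0
    return tokens_per_step / cost_per_step

for a in (0.1, 0.5, 0.8):
    print(f"accept prob {a:.1f}: ~{expected_speedup(a):.2f}x vs. plain decoding")
```

With a per-token acceptance probability around 0.1, this toy model gives a speedup below 1.0, i.e. a net slowdown, which matches the observation above; higher acceptance rates are needed before drafting pays for itself.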