Nice idea but is this something you want to do with a thinking model?

#1
by BingoBird - opened

Seems to me this is more suitable for a base model, not an instruct or thinking one.

But then again, what do I know.

It's just not applicable to a chat scenario when the model thinks for 4000 tokens before generating the 'quick' edgy response.

However, it did respond well to disabling thinking:

```
$ llama-cli -t 4 -mli --ctx-size 8192 --jinja -ngl 0 -m /dl/Models/SmallModels/qwen3.5-0.8B-edgy-commenter-Q8_0.gguf --reasoning-budget 0 --chat-template-kwargs '{"reasoning_effort": "off"}'
```

Got 82.9 t/s prompt processing and 12.3 t/s generation on a Ryzen 3500U.
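For scale, at those measured speeds a 4000-token thinking trace alone would stall the chat for over five minutes before the actual reply even starts (a rough back-of-the-envelope sketch; the 4000-token figure is just the hypothetical from above):

```python
# Rough wait time for a long thinking trace at the measured speeds
# (12.3 t/s token generation on a Ryzen 3500U, per the numbers above).
tg_speed = 12.3          # tokens/s, generation
thinking_tokens = 4000   # hypothetical thinking-trace length

wait_seconds = thinking_tokens / tg_speed
print(f"~{wait_seconds:.0f} s (~{wait_seconds / 60:.1f} min) before the reply starts")
```

Which is why `--reasoning-budget 0` makes such a difference for interactive use.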

About as smart as any 1-2B model I've used, but I couldn't elicit much edginess. Even when prompted to write trash talk, it still responded as a helpful assistant.

But, thanks for uploading and cheers!
