Nice idea but is this something you want to do with a thinking model?
by BingoBird - opened
Seems to me this is more suited to a base model than an instruct or thinking one.
But then again, what do I know.
It's just not a great fit for a chat scenario when the model thinks for 4000 tokens before generating the 'quick' edgy response.
However, it did respond well with thinking disabled:
$ llama-cli -t 4 -mli --ctx-size 8192 --jinja -ngl 0 -m /dl/Models/SmallModels/qwen3.5-0.8B-edgy-commenter-Q8_0.gguf --reasoning-budget 0 --chat-template-kwargs '{"reasoning_effort": "off"}'
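If you'd rather serve it over an API than use the interactive CLI, a rough equivalent with llama-server should work too, since it shares most of llama-cli's flags. This is a sketch, not tested by me; the model path is the same one from the command above, and I'm assuming `--reasoning-budget` behaves the same here:

```shell
# Hedged sketch: same thinking-disable setup, but serving an OpenAI-style
# endpoint on localhost:8080 instead of running interactively.
llama-server -t 4 --ctx-size 8192 --jinja -ngl 0 \
  -m /dl/Models/SmallModels/qwen3.5-0.8B-edgy-commenter-Q8_0.gguf \
  --reasoning-budget 0 \
  --port 8080
```

Then point any OpenAI-compatible client at `http://localhost:8080/v1`.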
Got 82.9 t/s prompt processing (pp) and 12.3 t/s token generation (tg) on a Ryzen 3500U.
About as smart as any 1-2B model I've used, but I couldn't elicit much edginess. Even when I prompted it to trash-talk, it still responded as a helpful assistant.
But thanks for uploading, and cheers!