How to skip the thinking?
#2
by Olafangensan - opened
I'm running the Q4 at not-exactly-high t/s, how do I stop the model from using the thinking?
Running in koboldcpp(can run llama.cpp directly if necessary)
Prefill the AI response with an empty think like or append /nothink in your user input.