How to skip the thinking?

#2
by Olafangensan - opened

I'm running the Q4 quant at a not-exactly-high t/s. How do I stop the model from thinking?

Running in koboldcpp (can run llama.cpp directly if necessary).

Prefill the AI response with an empty think block, or append /nothink to your user input.
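If you are building the prompt yourself (for example, sending raw text to a backend), here is a rough sketch of both options. The `<think>` / `</think>` tag strings, the helper names, and the `USER:` / `ASSISTANT:` turn markers are all assumptions for illustration; check your model's actual chat template and swap in its real tags and markers.

```python
# Assumed tag strings; verify against the model's chat template.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def empty_think_prefill() -> str:
    """Text to place at the very start of the assistant turn so the
    model sees an already-closed, empty think block."""
    return f"{THINK_OPEN}\n\n{THINK_CLOSE}\n\n"


def append_nothink(user_message: str, switch: str = "/nothink") -> str:
    """Append the no-thinking switch to the user message, once."""
    msg = user_message.rstrip()
    if msg.endswith(switch):
        return msg
    return f"{msg} {switch}"


def build_raw_prompt(user_message: str, user_prefix: str, assistant_prefix: str) -> str:
    """Assemble a raw prompt with the empty think block prefilled.
    The prefixes are hypothetical placeholders for the real turn markers."""
    return f"{user_prefix}{user_message}{assistant_prefix}{empty_think_prefill()}"


if __name__ == "__main__":
    # Option 1: prefill the assistant turn.
    print(build_raw_prompt("Summarize this text.", "USER: ", "\nASSISTANT: "))
    # Option 2: soft switch in the user message.
    print(append_nothink("Summarize this text."))
```

Either option alone should be enough. The prefill goes at the start of the AI response, and the /nothink switch goes at the end of your own message.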
