How do I combine the `thinking on/off` prompt with an existing system prompt?
What's a good system prompt format, i.e. which format have you trained the model with?
```python
thinking = "on"  # or "off"
# option 1: one system message holding both the directive and the persona
option1 = [{"role": "system", "content": f"detailed thinking {thinking}. You are an expert in math."},
           {"role": "user", "content": "Solve x*(sin(x)+2)=0"}]
# option 2: directive and persona as two separate system messages
option2 = [{"role": "system", "content": f"detailed thinking {thinking}"},
           {"role": "system", "content": "You are an expert in math."},
           {"role": "user", "content": "Solve x*(sin(x)+2)=0"}]
```
Best
michaelfeil
Option 1 is better than option 2, but the model was not trained with any system prompt other than `detailed thinking on/off`, so it may be better to move the persona into the user message:
```python
thinking = "on"  # or "off"
messages = [{"role": "system", "content": f"detailed thinking {thinking}."}, {"role": "user", "content": "You are an expert in math. Solve x*(sin(x)+2)=0"}]
```
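A minimal sketch of that recommendation as a reusable helper. The name `build_messages` and its parameters are hypothetical (not part of any Nemotron or NIM API); it just keeps the system message to the trained `detailed thinking on/off` directive, in the exact form suggested above, and prepends any persona to the user turn.

```python
from typing import Dict, List, Optional


def build_messages(question: str, thinking: bool, persona: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical helper: keep the system prompt to the trained directive
    and prepend any persona to the user turn instead."""
    mode = "on" if thinking else "off"
    # System message carries only the "detailed thinking on/off" directive
    system = {"role": "system", "content": f"detailed thinking {mode}."}
    # Persona (if any) goes in front of the actual question in the user message
    user_content = f"{persona} {question}" if persona else question
    return [system, {"role": "user", "content": user_content}]


if __name__ == "__main__":
    for message in build_messages("Solve x*(sin(x)+2)=0", thinking=True,
                                  persona="You are an expert in math."):
        print(message)
```

Taking a boolean rather than a raw `"on"`/`"off"` string keeps callers from passing directive values the model was never trained on.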
Hey @michaelfeil, I had the same question and ran some tests against the NIM API. @einsteiner1983 is right: custom personas get ignored when thinking=ON.
Quick findings:
| System Prompt | Thinking | Persona Works? |
|---|---|---|
| "detailed thinking on" | ON | No (generic output) |
| "detailed thinking off. You are an expert..." | OFF | Yes (persona followed) |
| "detailed thinking on. You are an expert..." | ON | No (still generic) |
Whether the persona is prepended or appended to the directive doesn't matter. Putting the persona in the user message doesn't help either.
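To reproduce this comparison, the three placements can be generated side by side from one persona and one question. This is only a sketch: the variant names are made up here, and the exact strings used in the linked test scripts may differ.

```python
from typing import Dict, List

Message = Dict[str, str]

# Example persona and question taken from the thread
PERSONA = "You are an expert in math."
QUESTION = "Solve x*(sin(x)+2)=0"


def placement_variants(thinking: str) -> Dict[str, List[Message]]:
    """Build one message list per persona placement (variant names are illustrative)."""
    directive = f"detailed thinking {thinking}"
    return {
        # persona after the directive, inside the system message
        "system_append": [
            {"role": "system", "content": f"{directive}. {PERSONA}"},
            {"role": "user", "content": QUESTION},
        ],
        # persona before the directive, inside the system message
        "system_prepend": [
            {"role": "system", "content": f"{PERSONA} {directive}"},
            {"role": "user", "content": QUESTION},
        ],
        # directive alone in the system message, persona in the user message
        "user_message": [
            {"role": "system", "content": directive},
            {"role": "user", "content": f"{PERSONA} {QUESTION}"},
        ],
    }


if __name__ == "__main__":
    for name, messages in placement_variants("on").items():
        print(name, messages[0]["content"], "|", messages[1]["content"])
```

Each variant can then be sent once with `"on"` and once with `"off"` to fill in a table like the one above.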
Also tested:
- `/think` and `/no_think` tags: completely ignored (the system prompt wins).
- Budget hints ("think briefly"): don't work; "minimal reasoning" actually produced 21% MORE thinking.
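One way to quantify a claim like "21% MORE thinking" is to compare the length of the reasoning span across runs. The sketch below assumes the reasoning comes back inline between `<think>` and `</think>` tags and uses whitespace-separated words as a rough proxy for tokens; the thread does not say how the 21% was measured, so treat both choices as assumptions.

```python
import re

# Assumption: the reasoning is returned inline, wrapped in <think>...</think>
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def thinking_words(response_text: str) -> int:
    """Count whitespace-separated words inside the first <think> block (0 if none)."""
    match = THINK_RE.search(response_text)
    return len(match.group(1).split()) if match else 0


def percent_change(baseline: int, variant: int) -> float:
    """Relative change of `variant` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (variant - baseline) / baseline * 100.0


if __name__ == "__main__":
    base = "<think>" + "step " * 100 + "</think>x = 0"
    hinted = "<think>" + "step " * 121 + "</think>x = 0"
    print(round(percent_change(thinking_words(base), thinking_words(hinted)), 1))
```

With a 100-word baseline, a hinted run with 121 reasoning words works out to (121 - 100) / 100 = +21%.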
Workaround: a two-phase approach. The first call uses thinking=ON for the reasoning; the second call uses thinking=OFF plus the persona for formatting.
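The two-phase workaround can be sketched as a function that takes the chat client as a parameter. `chat` is a hypothetical callable (message list in, reply text out) standing in for whatever NIM or OpenAI-compatible client you use; the phase-2 instruction wording and the stripping of `<think>` blocks are assumptions, not something tested in the thread.

```python
import re
from typing import Callable, Dict, List

Message = Dict[str, str]
# Hypothetical client signature: send a message list, get the reply text back
ChatFn = Callable[[List[Message]], str]

# Assumption: phase-1 reasoning is returned inline in <think>...</think>
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def two_phase(chat: ChatFn, question: str, persona: str) -> str:
    # Phase 1: thinking ON, no persona, to get the reasoning and answer
    draft = chat([
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": question},
    ])
    # Drop the reasoning trace so phase 2 only restyles the final answer
    answer = THINK_RE.sub("", draft).strip()
    # Phase 2: thinking OFF plus persona, the combination reported to follow the persona
    return chat([
        {"role": "system", "content": f"detailed thinking off. {persona}"},
        {"role": "user", "content": f"Question: {question}\n\nDraft answer:\n{answer}\n\n"
                                    "Rewrite the draft answer in your own voice."},
    ])


if __name__ == "__main__":
    def fake_chat(messages: List[Message]) -> str:
        # Stand-in for a real endpoint: report which system prompt it received
        return f"<think>...</think>reply to: {messages[0]['content']}"

    print(two_phase(fake_chat, "Solve x*(sin(x)+2)=0", "You are an expert in math."))
```

Passing the client in as a function keeps the sketch runnable offline with a stub, and a real client can be swapped in later without touching the two-phase logic. The cost is two requests per question.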
For budget control: vLLM ≥0.15 has `max_think_tokens` via PR #20859 if you're self-hosting.
Test scripts: https://github.com/chjkh8113/nemotron-community-testing