How to combine `thinking on/off` prompt with existing system prompt.

#8
by michaelfeil - opened

What's a good system prompt format, ideally the one you trained the model with?

thinking = "on"  # or "off"
# option 1: thinking toggle and persona in a single system message
[{"role": "system", "content": f"detailed thinking {thinking}. You are an expert in math."}, {"role": "user", "content": "Solve x*(sin(x)+2)=0"}]
# option 2: thinking toggle and persona in two separate system messages
[{"role": "system", "content": f"detailed thinking {thinking}"}, {"role": "system", "content": "You are an expert in math."}, {"role": "user", "content": "Solve x*(sin(x)+2)=0"}]

Best
michaelfeil

Option 1 is better than option 2. However, the model was not trained with system prompts other than "detailed thinking on/off", so it might be better to move the persona into the user message:

[{"role": "system", "content": f"detailed thinking {thinking}."}, {"role": "user", "content": "You are an expert in math. Solve x*(sin(x)+2)=0"}]
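
The suggestion above could be wrapped in a small helper so the system message only ever carries the thinking toggle. This is a minimal sketch, not an official format: `build_messages` and its parameters are hypothetical names, and joining the persona and the question with a single space is an assumption.

```python
from typing import Dict, List


def build_messages(thinking: str, persona: str, question: str) -> List[Dict[str, str]]:
    """Build a chat payload whose system message contains only the thinking toggle.

    The persona is moved into the user message, as suggested in the reply above.
    """
    if thinking not in ("on", "off"):
        raise ValueError(f'thinking must be "on" or "off", got {thinking!r}')
    return [
        # System prompt: only the string the model was reportedly trained with.
        {"role": "system", "content": f"detailed thinking {thinking}."},
        # Persona is prepended to the actual question (assumed separator: one space).
        {"role": "user", "content": f"{persona} {question}"},
    ]


messages = build_messages("on", "You are an expert in math.", "Solve x*(sin(x)+2)=0")
```

The resulting `messages` list can be passed to whichever chat-completions client you already use.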

Hey @michaelfeil, I had the same question and ran some tests against the NIM API. @einsteiner1983 is right: custom personas get ignored when thinking=ON.

Quick findings:

| System Prompt | Thinking | Persona Works? |
|---|---|---|
| "detailed thinking on" | ON | ❌ generic output |
| "detailed thinking off. You are an expert..." | OFF | ✅ persona followed |
| "detailed thinking on. You are an expert..." | ON | ❌ still generic |

Whether the persona is prepended or appended to the system prompt doesn't matter. Putting the persona in the user message doesn't help either.

Also tested:

  • /think and /no_think tags: completely ignored (system prompt wins)
  • Budget hints ("think briefly"): don't work. "minimal reasoning" actually produced 21% MORE thinking 🙃

Workaround: a two-phase approach. Make a first call with thinking=ON for the reasoning, then a second call with thinking=OFF + persona for formatting.
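
Roughly, the two-phase workaround could look like the sketch below. It is only an illustration under assumptions: `chat` is a hypothetical callable that sends a message list to your endpoint and returns the reply text, `two_phase_answer` is a made-up name, and the wording of the second user message is my own, not something taken from the test scripts.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
# Hypothetical transport: takes a message list, returns the assistant's reply text.
ChatFn = Callable[[List[Message]], str]


def two_phase_answer(chat: ChatFn, question: str, persona: str) -> str:
    """Reason with thinking on (no persona), then restyle with thinking off + persona."""
    # Phase 1: thinking ON, no persona, so the model focuses on the reasoning.
    draft = chat([
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": question},
    ])

    # Phase 2: thinking OFF plus persona, used only to rewrite the draft.
    rewrite_request = (
        f"Question: {question}\n\n"
        f"Draft answer:\n{draft}\n\n"
        "Rewrite the draft answer in your own voice without changing its conclusions."
    )
    return chat([
        {"role": "system", "content": f"detailed thinking off. {persona}"},
        {"role": "user", "content": rewrite_request},
    ])
```

This costs two requests per question, and the first reply is passed along as-is; you may want to trim the reasoning trace from it before the second call.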

For budget control: vLLM ≥0.15 has max_think_tokens via PR #20859 if you're self-hosting.

Test scripts: https://github.com/chjkh8113/nemotron-community-testing
