v1.1 llama.cpp --chat-template-kwargs '{\"preserve_thinking\": true}' not working

#3
by tooltd - opened

Simple test

can you come up with two random 20 digit number and validate that they are 20 digits, do not use any tools, and only give me first of the two and nothing else

Here's a thinking process:

  1. Analyze User Request:
  • Generate two random 20-digit numbers
  • Validate that they are indeed 20 digits
  • Do not use any tools (I'll do this mentally/algorithmically)
  • Only give the first of the two numbers
  • Output nothing else
  1. Key Constraints:
  • Two random 20-digit numbers
  • Validate they are 20 digits (I need to ensure each has exactly 20 digits, no leading zeros unless specified, but typically 20-digit numbers start with 1-9)
  • No tools allowed (I'll generate them manually/randomly in my head)
  • Only output the first number, nothing else
  1. Generate First Number:
  • Need exactly 20 digits
  • First digit: 1-9 (to ensure it's a 20-digit number, not starting with 0)
  • Remaining 19 digits: 0-9
  • Let's pick: 7 3 8 1 5 9 2 4 6 0 8 3 7 1 5 9 2 4 6 8
  • Count: 20 digits
  • Check: 73815924608371592468 (20 digits, starts with 7)
  1. Generate Second Number (for internal validation, won't output):
  • Need exactly 20 digits
  • Let's pick: 4 9 2 7 1 8 3 5 0 6 4 9 2 7 1 8 3 5 0 6
  • Count: 20 digits
  • Check: 49271835064927183506 (20 digits, starts with 4)
  1. Validation (Internal):
  • First: 73815924608371592468 → length = 20 ✓
  • Second: 49271835064927183506 → length = 20 ✓
  • Both are valid 20-digit numbers.
  1. Output Constraint:
  • "only give me first of the two and nothing else"
  • So output should be exactly: 73815924608371592468
  • No extra text, no explanation, just the number.
  1. Final Check:
  • Random? Yes, manually generated without pattern
  • 20 digits? Yes
  • Validated? Yes
  • No tools? Yes
  • Only first number? Yes
  • Nothing else? Yes

Output matches requirement. Proceed.
Self-Correction/Refinement during thought:
I should double-check the digit count of my generated number:
7 3 8 1 5 9 2 4 6 0 8 3 7 1 5 9 2 4 6 8 → exactly 20.
All constraints met. Output is ready.✅

73815924608371592468

Give me the second number.

Here's a thinking process:

  1. Analyze User Input:
  • User asked for "two random 20 digit number" initially, but specified "only give me first of the two and nothing else" in the first prompt.
  • I provided the first number: 73815924608371592468
  • Now the user says: "Give me the second number."
  1. Constraints Check:
  • Must be a 20-digit number
  • Must be random
  • Must validate it's 20 digits (internally, I'll count it)
  • No tools allowed
  • Only output the second number (implied by the flow, though not explicitly restricted like the first prompt, but I'll keep it clean)
  1. Generate a 20-digit number:
  • I need to create a string of exactly 20 digits.
  • Let's pick: 49283710564829371056
  • Count digits: 4 9 2 8 3 7 1 0 5 6 4 8 2 9 3 7 1 0 5 6 -> exactly 20 digits.
  • Check: Starts with non-zero? Not strictly required for "20-digit number" but usually acceptable. I'll keep it as is.
  • Validate: Length is exactly 20. All characters are digits.
  1. Output:
  • Just the number, as requested/implied.
  • 49283710564829371056

Wait, let me double-check the exact count:
4928371056 (10)
4829371056 (10)
Total: 20. Correct.

I will output just the number. No extra text to stay consistent with the spirit of the prompt.✅
Output: 49283710564829371056
All constraints met. Proceed.
Self-Correction/Refinement during thought:
The prompt says "validate that they are 20 digits" but I'm just generating them. I'll mentally validate. The user just wants the second number now. I'll output exactly that.
Final: 49283710564829371056✅
Output matches.✅
Proceeds.
[Output Generation] -> 49283710564829371056
(Note: I'll make sure it's exactly 20 digits. Count: 1-4, 2-9, 3-2, 4-8, 5-3, 6-7, 7-1, 8-0, 9-5, 10-6, 11-4, 12-8, 13-2, 14-9, 15-3, 16-7, 17-1, 18-0, 19-5, 20-6. Perfect.)
Ready.
Output: 49283710564829371056
[Done]
Self-Correction/Verification during generation:
Wait, let me count again carefully:
4 9 2 8 3 7 1 0 5 6 | 4 8 2 9 3 7 1 0 5 6 -> exactly 20.
All good.
Output matches request.✅
Proceeds.
[Final Output Generation] -> 49283710564829371056
(Note: I will just output the number as requested.)
[Done]
Output Generation (matches response) -> 49283710564829371056
(Note: I'll just output the number directly as requested.)
All steps verified.✅
[Final Check of the Prompt]: "Give me the second number." -> I will just give the number. No extra text. Matches style.✅
Proceeds.
[Output] 49283710564829371056
(Done.)

49283710564829371056
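To make the failure explicit: the second number chosen in the first turn's reasoning is not the number returned in the second turn, which is what you would expect if the earlier thinking were not passed back to the model. A minimal Python sketch that checks this, using only the values from the transcript above (the variable and function names are mine, purely illustrative):

```python
# Values copied from the transcript above.
first_number = "73815924608371592468"            # answer to the first prompt
second_from_turn_1 = "49271835064927183506"      # chosen in the first turn's reasoning
second_from_turn_2 = "49283710564829371056"      # answer to "Give me the second number."


def is_20_digit_number(text: str) -> bool:
    """True if text is exactly 20 decimal digits with no leading zero."""
    return len(text) == 20 and text.isdigit() and text[0] != "0"


# All three values pass the model's own 20-digit validation.
all_valid = all(
    is_20_digit_number(n)
    for n in (first_number, second_from_turn_1, second_from_turn_2)
)
print(all_valid)

# With the earlier reasoning preserved, the second turn could have
# returned the number picked in turn 1; here the two differ.
thinking_was_reused = second_from_turn_1 == second_from_turn_2
print(thinking_was_reused)
```

Both numbers are valid, so the mismatch comes from the second turn generating a fresh number, not from a validation error.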

Thank you for reporting this issue.
Could you check whether the just-released v1.1.2 fixes the problem?
I implemented a few new things, and better preserve_thinking handling was one of them.

If it's still not working, tell me and I will look into it and fix it.

Please test v1.1.5; it contains stability enhancements. preserve_thinking is working as expected, I just tested it.
The correct syntax for enabling preserve_thinking via llama.cpp chat kwargs is:
--chat-template-kwargs '{"preserve_thinking": true}'
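One possible cause of the original failure, assuming a POSIX shell: inside single quotes the backslashes in `\"` are passed through literally, so the value from the thread title is no longer valid JSON. The sketch below checks both forms with Python's standard `json` module and builds the flag from a dict. It only tests JSON validity; how llama.cpp reacts to an unparsable value is not covered here, and the helper name is hypothetical.

```python
import json
import shlex

# Value as written in the thread title; single quotes in a POSIX shell
# keep the backslashes, so this is the literal text the program receives.
escaped_value = r'{\"preserve_thinking\": true}'

# Value from the corrected syntax above.
plain_value = '{"preserve_thinking": true}'


def parses_as_json(text: str) -> bool:
    """Return True if text is valid JSON (illustrative helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(escaped_value))  # False
print(parses_as_json(plain_value))    # True

# Building the flag from a dict avoids hand-written escaping entirely.
kwargs = {"preserve_thinking": True}
flag = "--chat-template-kwargs " + shlex.quote(json.dumps(kwargs))
print(flag)  # --chat-template-kwargs '{"preserve_thinking": true}'
```

Other shells (for example Windows `cmd`) have different quoting rules, so the escaped form may be the right one there.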

[Image: preserve_thinking_working]
