Enabling or disabling reasoning (default is disabled)
Qwen3.5 Small models have reasoning disabled by default. To enable it (or disable it), see https://unsloth.ai/docs/models/qwen3.5#how-to-enable-or-disable-reasoning-and-thinking
I know this is probably a stupid question, but I'll ask it here. Are there any differences when launching on Windows? After adding --chat-template-kwargs '{"enable_thinking":true}' I get this error:
error while handling argument "--chat-template-kwargs": [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '''
usage:
--chat-template-kwargs STRING sets additional params for the json template parser, must be a valid
json object string, e.g. '{"key1":"value1","key2":"value2"}'
(env: LLAMA_CHAT_TEMPLATE_KWARGS)
to show complete usage, run with -h
This is the latest release of llama.cpp.
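A note on reading that error: `last read: '''` means the first character the JSON parser saw was a literal single quote, i.e. the Windows shell passed your quotes through instead of stripping them. A quick illustration in Python (llama.cpp uses a different JSON library, but any strict JSON parser rejects this the same way):

```python
import json

# What llama.cpp should receive once the shell strips its quoting:
good = '{"enable_thinking":true}'
assert json.loads(good) == {"enable_thinking": True}

# What it actually received: the single quotes were passed through
# verbatim, so the first byte the JSON parser sees is a literal '.
bad = "'" + good + "'"
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # Fails at line 1, column 1 -- matching the llama.cpp error,
    # whose `last read: '''` shows the stray quote character.
    print(f"parse error at line {e.lineno}, column {e.colno}")
```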
Or for LMStudio, you can change Jinja to this:
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- macro render_content(content, do_vision_count, is_system_content=false) %}
{%- if content is string %}
{{- content }}
{%- elif content is iterable and content is not mapping %}
{%- for item in content %}
{%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
{%- if is_system_content %}
{{- raise_exception('System message cannot contain images.') }}
{%- endif %}
{%- if do_vision_count %}
{%- set image_count.value = image_count.value + 1 %}
{%- endif %}
{%- if add_vision_id %}
{{- 'Picture ' ~ image_count.value ~ ': ' }}
{%- endif %}
{{- '<|vision_start|><|image_pad|><|vision_end|>' }}
{%- elif 'video' in item or item.type == 'video' %}
{%- if is_system_content %}
{{- raise_exception('System message cannot contain videos.') }}
{%- endif %}
{%- if do_vision_count %}
{%- set video_count.value = video_count.value + 1 %}
{%- endif %}
{%- if add_vision_id %}
{{- 'Video ' ~ video_count.value ~ ': ' }}
{%- endif %}
{{- '<|vision_start|><|video_pad|><|vision_end|>' }}
{%- elif 'text' in item %}
{{- item.text }}
{%- else %}
{{- raise_exception('Unexpected item type in content.') }}
{%- endif %}
{%- endfor %}
{%- elif content is none or content is undefined %}
{{- '' }}
{%- else %}
{{- raise_exception('Unexpected content type.') }}
{%- endif %}
{%- endmacro %}
{%- if not messages %}
{{- raise_exception('No messages provided.') }}
{%- endif %}
{%- if tools and tools is iterable and tools is not mapping %}
{{- '<|im_start|>system\n' }}
{{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>" }}
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
{%- if messages[0].role == 'system' %}
{%- set content = render_content(messages[0].content, false, true)|trim %}
{%- if content %}
{{- '\n\n' + content }}
{%- endif %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- else %}
{%- if messages[0].role == 'system' %}
{%- set content = render_content(messages[0].content, false, true)|trim %}
{{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" %}
{%- set content = render_content(message.content, false)|trim %}
{%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if ns.multi_step_tool %}
{{- raise_exception('No user query found in messages.') }}
{%- endif %}
{%- for message in messages %}
{%- set content = render_content(message.content, true)|trim %}
{%- if message.role == "system" %}
{%- if not loop.first %}
{{- raise_exception('System message must be at the beginning.') }}
{%- endif %}
{%- elif message.role == "user" %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- set reasoning_content = reasoning_content|trim %}
{%- if loop.index0 > ns.last_query_index %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{%- if loop.first %}
{%- if content|trim %}
{{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- else %}
{{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- endif %}
{%- else %}
{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- endif %}
{%- if tool_call.arguments is mapping %}
{%- for args_name in tool_call.arguments %}
{%- set args_value = tool_call.arguments[args_name] %}
{{- '<parameter=' + args_name + '>\n' }}
{%- set args_value = args_value | tojson if args_value is mapping or (args_value is iterable and args_value is not string) else args_value | string %}
{{- args_value }}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>\n' }}
{%- elif loop.last %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- else %}
{{- raise_exception('Unexpected message role.') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{{- '<think>\n' }}
{%- endif %}
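As an aside, the trickiest part of the template above is the assistant-message handling that splits inline reasoning out of `content` (the `content.split('</think>')...` lines). The same logic can be mirrored in plain Python, which may help when debugging what the template emits (this is a sketch of my reading of the template, not official tooling):

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Mirror the template's assistant-content handling: text between
    <think> and </think> becomes reasoning_content; everything after
    </think> remains the visible answer."""
    if '</think>' not in content:
        return '', content
    # Same expressions the Jinja template uses, followed by |trim:
    reasoning = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n')
    answer = content.split('</think>')[-1].lstrip('\n')
    return reasoning.strip(), answer

msg = "<think>\nUser wants a greeting.\n</think>\n\nHello!"
print(split_reasoning(msg))  # ('User wants a greeting.', 'Hello!')
```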
Qwen3.5 Small models have reasoning disabled by default. To enable it (or disable it), see https://unsloth.ai/docs/models/qwen3.5#how-to-enable-or-disable-reasoning-and-thinking
This doesn't work in llama-cli or llama-server; it fails with a syntax error.
Okay, I figured out the problem: on Windows, the double quotes inside the JSON have to be escaped.
This works for me:
--chat-template-kwargs '{\"enable_thinking\":true}'
The advice above works for launching from PowerShell, but to launch from a bat file, you should specify it like this:
--chat-template-kwargs {\"enable_thinking\":true}
--chat-template-kwargs {\"enable_thinking\":true}
but this way thinking still won't work?
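To collect the quoting variants in one place: what matters is the final string the program receives, which must be bare JSON with intact double quotes. A stdlib Python sketch (`shlex` models POSIX shells only; the Windows rules are stated as comments per the posts above and have edge cases):

```python
import json
import shlex

# The exact argument value llama.cpp must end up receiving:
target = '{"enable_thinking":true}'
assert json.loads(target) == {"enable_thinking": True}

# bash/zsh strip the single quotes themselves, as shlex demonstrates:
argv = shlex.split("--chat-template-kwargs '" + target + "'")
assert argv == ["--chat-template-kwargs", target]

# On Windows (per this thread -- treat as a sketch, not gospel):
#   PowerShell:      --chat-template-kwargs '{\"enable_thinking\":true}'
#   cmd.exe / .bat:  --chat-template-kwargs {\"enable_thinking\":true}
# In both cases \" survives as a plain " in the delivered argument.
```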
Hi! I noticed that the parameters here differ from the ones in the unsloth.ai docs; which should we use?
Also, I noticed that 9B Q4_K_M has problems using tools, while 4B Q8_0 works flawlessly. Has anyone seen similar issues? I tried both in Roo Code with llama.cpp.
I'm using the Q6_K_XL quant, and no matter what I set the sampling parameters to I can't seem to prevent thinking loops on 9B (LM studio). Has anyone tested for better settings?
The recommended sampling parameters from Qwen helped me:
We recommend using the following sets of sampling parameters for generation:
- Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
- Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
- Instruct (or non-thinking) mode for general tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
- Instruct (or non-thinking) mode for reasoning tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Please note that support for these sampling parameters varies across inference frameworks.
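These four presets are easy to mistype, so here they are transcribed as a Python dict (a convenience sketch; the parameter names follow the OpenAI-compatible request fields, and as the quote says, support varies by framework):

```python
# Qwen's recommended sampling presets, transcribed from the post above.
QWEN_SAMPLING = {
    "thinking_general":   dict(temperature=1.0, top_p=0.95, top_k=20,
                               min_p=0.0, presence_penalty=1.5,
                               repetition_penalty=1.0),
    "thinking_coding":    dict(temperature=0.6, top_p=0.95, top_k=20,
                               min_p=0.0, presence_penalty=0.0,
                               repetition_penalty=1.0),
    "instruct_general":   dict(temperature=0.7, top_p=0.8, top_k=20,
                               min_p=0.0, presence_penalty=1.5,
                               repetition_penalty=1.0),
    "instruct_reasoning": dict(temperature=1.0, top_p=0.95, top_k=20,
                               min_p=0.0, presence_penalty=1.5,
                               repetition_penalty=1.0),
}

# e.g. splice a preset into an OpenAI-style request body:
body = {"model": "qwen3.5-9b", "messages": [], **QWEN_SAMPLING["thinking_coding"]}
```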
How have you set presence penalty in LM studio? I saw a commit working on it however I don't believe it has made it to the UI yet.
How to enable thinking in Jan.ai?
How have you set presence penalty in LM studio? I saw a commit working on it however I don't believe it has made it to the UI yet.
You can't yet. Sadly.
I've been able to solve the issue with:
Temperature: 0.6
Top K Sampling: 20
Repeat Penalty: 1.05
Top P Sampling: 0.95
Min P Sampling: 0.05
And a system prompt:
You are Qwen, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
Answer directly and concisely, stay coherent, avoid repetitive thinking loops, and finish with a complete answer.
If context is missing, identify the gap briefly and continue with the best reasonable assumption.
This produces thinking blocks of reasonable size with minimal repeats, however the model may be more handicapped than if I just turned thinking off. It does not reliably pass the car wash test at 9B.
How have you set presence penalty in LM studio? I saw a commit working on it however I don't believe it has made it to the UI yet.
You can't yet. Sadly.
This is no longer true FYI, it's been added in LMStudio 0.4.7 beta.
Hello. I'm getting "stop reason: generation failed" errors in LM Studio after the thinking phase. I enabled thinking by adding {%- set enable_thinking = true %} at the start of the Jinja template. What's wrong?
And sometimes I get generation errors even without thinking.
Sure is nice that it has to be this complicated, unlike every other model that has a simple true/false in the Jinja template. Anyway, the fix is easy enough once you figure it out: just change "is" to "is not" in this line of the Jinja template: "{%- if enable_thinking is defined and enable_thinking is true %}"
Is it possible to enable thinking in llama_cpp?
It should be enabled by default; if not, try '--reasoning on', or follow https://unsloth.ai/docs/models/qwen3.5#qwen3.5-small-0.8b-2b-4b-9b: "For Qwen3.5 0.8B, 2B, 4B and 9B, reasoning is disabled by default. To enable it, use: --chat-template-kwargs '{"enable_thinking":true}'
On Windows use: --chat-template-kwargs "{\"enable_thinking\":true}""
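Another way to dodge the shell-quoting problem entirely is to launch the server from a small script and build argv as a list, so no shell ever parses the JSON (a sketch; the binary and model paths are placeholders):

```python
import json
import subprocess  # used only if you uncomment the launch line

# argv built as a list is handed to the process verbatim, element by
# element, so no cmd.exe/PowerShell/bash quoting rules apply.
kwargs = json.dumps({"enable_thinking": True}, separators=(",", ":"))
cmd = [
    "llama-server",                    # placeholder: path to your binary
    "-m", "path/to/model.gguf",        # placeholder: path to your GGUF
    "--chat-template-kwargs", kwargs,  # exactly {"enable_thinking":true}
]
print(cmd[-1])  # -> {"enable_thinking":true}
# subprocess.run(cmd)  # uncomment to actually launch
```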