Instructions to use StableQuant/Qwen-Templates-Rebuild-Project with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use StableQuant/Qwen-Templates-Rebuild-Project with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="StableQuant/Qwen-Templates-Rebuild-Project")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("StableQuant/Qwen-Templates-Rebuild-Project", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use StableQuant/Qwen-Templates-Rebuild-Project with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "StableQuant/Qwen-Templates-Rebuild-Project"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/StableQuant/Qwen-Templates-Rebuild-Project

SGLang

How to use StableQuant/Qwen-Templates-Rebuild-Project with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "StableQuant/Qwen-Templates-Rebuild-Project" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "StableQuant/Qwen-Templates-Rebuild-Project" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "StableQuant/Qwen-Templates-Rebuild-Project",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use StableQuant/Qwen-Templates-Rebuild-Project with Docker Model Runner:
```
docker model run hf.co/StableQuant/Qwen-Templates-Rebuild-Project
```

StableQuant commited on 2 days ago

Commit

093dae2

verified ·

1 Parent(s): 32d33fa

Upload qwen3-5_6-template_v1.1.jinja

Browse files

Files changed (1) hide show

qwen3-5_6-template_v1.1.jinja +290 -0

qwen3-5_6-template_v1.1.jinja ADDED Viewed

	@@ -0,0 +1,290 @@

+{#- ===== SECTION 1: MACRO render_content =====
+     Handles string, list (image/video/text items), or None/undefined.
+     count_vision=true: increments ns.image_count / ns.video_count.
+-#}
+{%- macro render_content(content, count_vision=false) -%}
+  {%- if content is string -%}
+    {{- content -}}
+  {%- elif content is iterable and content is not mapping -%}
+    {%- for item in content -%}
+      {%- if item.type == 'image' or 'image' in item or 'image_url' in item -%}
+        {%- if count_vision -%}{%- set ns.image_count = ns.image_count + 1 -%}{%- endif -%}
+        {%- if add_vision_id is defined and add_vision_id -%}
+          {{- 'Picture ' ~ ns.image_count ~ ': ' -}}
+        {%- endif -%}
+        {{- '<|vision_start|><|image_pad|><|vision_end|>' -}}
+      {%- elif item.type == 'video' or 'video' in item -%}
+        {%- if count_vision -%}{%- set ns.video_count = ns.video_count + 1 -%}{%- endif -%}
+        {%- if add_vision_id is defined and add_vision_id -%}
+          {{- 'Video ' ~ ns.video_count ~ ': ' -}}
+        {%- endif -%}
+        {{- '<|vision_start|><|video_pad|><|vision_end|>' -}}
+      {%- elif item.type == 'text' or 'text' in item -%}
+        {{- item.text -}}
+      {%- endif -%}
+    {%- endfor -%}
+  {%- endif -%}
+{%- endmacro -%}
+{#- ===== SECTION 2: NAMESPACE INITIALISATION =====
+     Single ns object for all mutable state.
+     enable_thinking  default=true  (BUG-003 fix)
+     preserve_thinking default=true: when false, suppresses think-block output in
+                       generation prompt and overrides enable_thinking to false.
+                       Passed via --chat-template-kwargs {"preserve_thinking":false}.
+-#}
+{%- set ns = namespace(
+    enable_thinking=true,
+    image_count=0,
+    video_count=0
+) -%}
+{#- Resolve enable_thinking kwarg -#}
+{%- if enable_thinking is defined -%}
+  {%- if enable_thinking -%}
+    {%- set ns.enable_thinking = true -%}
+  {%- else -%}
+    {%- set ns.enable_thinking = false -%}
+  {%- endif -%}
+{%- endif -%}
+{#- Resolve preserve_thinking kwarg.
+    preserve_thinking=false  => force non-thinking mode (same as enable_thinking=false).
+    preserve_thinking=true   => default, no override (thinking controlled by enable_thinking).
+    When not defined         => default, no override.
+-#}
+{%- if preserve_thinking is defined and not preserve_thinking -%}
+  {%- set ns.enable_thinking = false -%}
+{%- endif -%}
+{#- ===== SECTION 3: PRE-SCAN =====
+     Track last /no_think or /think flag in user messages.
+     Also scan system messages for <|think_off|> / <|think_on|> markers
+     (allows apps to control thinking mode via system prompt injection).
+     The model follows the last flag encountered in multi-turn conversations.
+-#}
+{%- for i in range(messages | length) -%}
+  {%- set _msg = messages[i] -%}
+  {%- if _msg.role == 'user' -%}
+    {%- set _u = _msg.content if _msg.content is string else '' -%}
+    {%- if _u.rstrip().endswith('/no_think') -%}
+      {%- set ns.enable_thinking = false -%}
+    {%- elif _u.rstrip().endswith('/think') -%}
+      {%- set ns.enable_thinking = true -%}
+    {%- endif -%}
+  {%- elif _msg.role == 'system' or _msg.role == 'developer' -%}
+    {%- set _s = _msg.content if _msg.content is string else '' -%}
+    {%- if '<|think_off|>' in _s -%}
+      {%- set ns.enable_thinking = false -%}
+    {%- elif '<|think_on|>' in _s -%}
+      {%- set ns.enable_thinking = true -%}
+    {%- endif -%}
+  {%- endif -%}
+{%- endfor -%}
+{#- ===== SECTION 4: COLLECT SYSTEM CONTENT =====
+     Merge all system/developer messages with \n\n separator (BUG-004 fix).
+     <|think_off|> / <|think_on|> markers are stripped from output.
+-#}
+{%- set ns_sys = namespace(content='') -%}
+{%- for msg in messages -%}
+  {%- if msg.role == 'system' or msg.role == 'developer' -%}
+    {%- set _c = render_content(msg.content | default('')) | trim -%}
+    {%- set _c = _c | replace('<|think_off|>', '') | replace('<|think_on|>', '') | trim -%}
+    {%- if _c -%}
+      {%- if ns_sys.content == '' -%}
+        {%- set ns_sys.content = _c -%}
+      {%- else -%}
+        {%- set ns_sys.content = ns_sys.content + '\n\n' + _c -%}
+      {%- endif -%}
+    {%- endif -%}
+  {%- endif -%}
+{%- endfor -%}
+{#- ===== SECTION 5: BUILD TOOLS LIST =====
+     Normalise each tool to {"type":"function","function":{...}} format.
+     Serialisation happens later at output time (avoids Markup + str escaping bugs).
+-#}
+{%- set _has_tools = tools is defined and tools -%}
+{%- if _has_tools -%}
+  {%- set ns_tb = namespace(list=[]) -%}
+  {%- for tool in tools -%}
+    {%- if tool.function is defined -%}
+      {%- set ns_tb.list = ns_tb.list + [tool] -%}
+    {%- else -%}
+      {%- set ns_tb.list = ns_tb.list + [{"type": "function", "function": tool}] -%}
+    {%- endif -%}
+  {%- endfor -%}
+{%- endif -%}
+{#- ===== SECTION 6: OUTPUT SYSTEM TURN =====
+     Each fragment output via its own {{ }} block so tojson Markup objects are
+     never Python-concatenated with plain strings (would trigger HTML-escaping).
+     User system content appears BEFORE the tools block (correct ordering).
+     No default system prompt injected.
+-#}
+{%- if ns_sys.content or _has_tools -%}
+  {{- '<|im_start|>system\n' -}}
+  {%- if ns_sys.content -%}
+    {{- ns_sys.content -}}
+    {%- if _has_tools -%}{{- '\n\n' -}}{%- endif -%}
+  {%- endif -%}
+  {%- if _has_tools -%}
+    {{- '# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n' -}}
+    {%- for tool in ns_tb.list -%}
+      {{- tool | tojson -}}
+      {%- if not loop.last -%}{{- '\n' -}}{%- endif -%}
+    {%- endfor -%}
+    {{- '\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call>' -}}
+  {%- endif -%}
+  {{- '<|im_end|>\n' -}}
+{%- endif -%}
+{#- ===== SECTION 7: MAIN MESSAGE LOOP ===== -#}
+{%- for message in messages -%}
+  {#- 7a: System / Developer — already rendered above, skip -#}
+  {%- if message.role == 'system' or message.role == 'developer' -%}
+  {#- 7b: User messages -#}
+  {%- elif message.role == 'user' -%}
+    {%- set _uc = render_content(message.content | default(''), true) -%}
+    {{- '<|im_start|>user\n' + _uc + '<|im_end|>\n' -}}
+  {#- 7c: Assistant messages -#}
+  {%- elif message.role == 'assistant' -%}
+    {#- Safely extract content as string — guard against absent key (BUG-002 fix).
+        Also support message.reasoning_content as an explicit think-block source
+        (used by some frameworks that store thinking separately from content). -#}
+    {%- if message.content is defined and message.content is string -%}
+      {%- set _ac = message.content -%}
+    {%- elif message.content is defined and message.content is iterable and message.content is not mapping -%}
+      {%- set _ac = render_content(message.content) -%}
+    {%- else -%}
+      {%- set _ac = '' -%}
+    {%- endif -%}
+    {#- Reconstruct content from reasoning_content + content when the framework
+        stores thinking separately (e.g. OpenAI-style reasoning_content field).
+        Only apply when no think-block already present in _ac. -#}
+    {%- if message.reasoning_content is defined and message.reasoning_content is string
+        and message.reasoning_content | trim
+        and '<think>' not in _ac -%}
+      {%- set _ac = '<think>\n' + message.reasoning_content | trim + '\n</think>\n\n' + _ac -%}
+    {%- endif -%}
+    {#- Collect tool_calls if present -#}
+    {%- set _tc = message.tool_calls if message.tool_calls is defined and message.tool_calls else [] -%}
+    {#- Strip <tool_call> prefix from content when tool_calls also present
+        (some frameworks duplicate the data in both fields) -#}
+    {%- if _tc and '<tool_call>' in _ac -%}
+      {%- set _ac = _ac.split('<tool_call>')[0] | trim -%}
+    {%- endif -%}
+    {#- Determine if this is the last-in-history assistant turn.
+        When add_generation_prompt=False and this is the last message, think blocks
+        must be preserved (and non-thinking prefill applied if needed).
+        All other turns have their think blocks stripped. -#}
+    {%- set _is_last_hist = loop.last and not (add_generation_prompt | default(false)) -%}
+    {#- Think-block handling (BUG-001 fix + last-turn preservation):
+        - Tool-call turns   : never strip (think block is part of the tool-call format)
+        - Last-history turn : preserve; inject non-thinking prefill when absent
+        - Historical turns  : strip using fuzzy end-tag matching to handle
+                              </think>, </thinking>, </ think>, </think > variants -#}
+    {%- if not _tc -%}
+      {%- if _is_last_hist -%}
+        {%- if '<think>' not in _ac and not ns.enable_thinking -%}
+          {%- set _ac = '<think>\n\n</think>\n\n' + _ac -%}
+        {%- endif -%}
+      {%- else -%}
+        {#- Fuzzy end-tag detection for historical turn stripping -#}
+        {%- set _think_end = '' -%}
+        {%- if '</think>' in _ac -%}
+          {%- set _think_end = '</think>' -%}
+        {%- elif '</thinking>' in _ac -%}
+          {%- set _think_end = '</thinking>' -%}
+        {%- elif '</ think>' in _ac -%}
+          {%- set _think_end = '</ think>' -%}
+        {%- elif '</think >' in _ac -%}
+          {%- set _think_end = '</think >' -%}
+        {%- endif -%}
+        {%- if _think_end -%}
+          {%- set _ac = _ac.split(_think_end)[-1].lstrip('\n') -%}
+        {%- endif -%}
+      {%- endif -%}
+    {%- endif -%}
+    {#- Emit the assistant turn -#}
+    {{- '<|im_start|>assistant\n' -}}
+    {%- if _ac -%}
+      {{- _ac -}}
+      {%- if _tc -%}{{- '\n' -}}{%- endif -%}
+    {%- endif -%}
+    {#- Render tool calls in Hermes format (BUG-006 fix: arguments as-is or tojson).
+        Each value output via its own {{ }} block — never concatenated with plain strings
+        in Python, which would trigger Markup HTML-escaping (BUG-003/markup fix). -#}
+    {%- if _tc -%}
+      {%- for tc in _tc -%}
+        {{- '<tool_call>\n' -}}
+        {{- '{"name": ' -}}{{- tc.function.name | tojson -}}
+        {%- if tc.function.arguments is string -%}
+          {{- ', "arguments": ' + tc.function.arguments -}}
+        {%- else -%}
+          {{- ', "arguments": ' -}}{{- tc.function.arguments | tojson -}}
+        {%- endif -%}
+        {{- '}' -}}
+        {%- if not loop.last -%}
+          {{- '\n</tool_call>\n' -}}
+        {%- else -%}
+          {{- '\n</tool_call>' -}}
+        {%- endif -%}
+      {%- endfor -%}
+    {%- endif -%}
+    {{- '<|im_end|>\n' -}}
+  {#- 7d: Tool results — group consecutive tool messages into one user turn -#}
+  {%- elif message.role == 'tool' -%}
+    {%- set _prev_role = messages[loop.index0 - 1].role if loop.index0 > 0 else '' -%}
+    {%- set _next_role = messages[loop.index0 + 1].role if not loop.last else '' -%}
+    {%- if _prev_role != 'tool' -%}
+      {{- '<|im_start|>user\n' -}}
+    {%- endif -%}
+    {{- '<tool_response>\n' -}}
+    {{- message.content | default('') -}}
+    {%- if _next_role == 'tool' -%}
+      {{- '\n</tool_response>\n' -}}
+    {%- else -%}
+      {{- '\n</tool_response>' -}}
+      {{- '<|im_end|>\n' -}}
+    {%- endif -%}
+  {#- 7e: Unknown role -#}
+  {%- else -%}
+    {{- raise_exception('Unexpected message role: ' + message.role) -}}
+  {%- endif -%}
+{%- endfor -%}
+{#- ===== SECTION 8: GENERATION PROMPT =====
+     enable_thinking=True  → open <think>\n prefill so llama.cpp reasoning-budget
+                             and other inference engines can hook into the think-stream.
+                             The model continues generating inside the open block.
+     enable_thinking=False → exact non-thinking prefill: <think>\n\n</think>\n\n
+                             (19-char closed block, BUG-005 fix)
+     NOTE: The <think>\n opener is EPHEMERAL — it lives only in the generation
+     prompt, never in chat history. Historical think-block stripping (BUG-001)
+     is handled in Section 7c and is entirely unaffected by this change.
+     No context poisoning risk.
+-#}
+{%- if add_generation_prompt -%}
+  {{- '<|im_start|>assistant\n' -}}
+  {%- if ns.enable_thinking -%}
+    {{- '<think>\n' -}}
+  {%- else -%}
+    {{- '<think>\n\n</think>\n\n' -}}
+  {%- endif -%}
+{%- endif -%}