joerowell committed on
Commit 62a5860 · verified
1 Parent(s): 92f8b44

Enable thinking by default in non-Hopper FP8-KV serve command

Adds --default-chat-template-kwargs '{"enable_thinking": true}' to the workaround command so it matches the main card's recommended invocation.
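
The flag's value is a JSON object, so it has to reach the server as a single shell argument. A minimal sketch of how that argument could be built and quoted, assuming a POSIX shell; the variable names are illustrative and not part of the commit, and only the flag name and the JSON payload come from the diff:

```python
import json
import shlex

# The kwargs this commit passes to the chat template (taken from the diff).
chat_template_kwargs = {"enable_thinking": True}

# Serialize to JSON: Python's True becomes JSON's lowercase true.
flag_value = json.dumps(chat_template_kwargs)

# Quote for a POSIX shell so the braces, double quotes, and space
# survive as one argument.
flag = "--default-chat-template-kwargs " + shlex.quote(flag_value)
print(flag)
```

The single quotes in the README command do the same job as `shlex.quote` here: they stop the shell from splitting the JSON at the space or stripping its double quotes.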

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -106,6 +106,7 @@ The full vLLM recipe is on the main [Laguna XS.2 model card](https://huggingface
  > --reasoning-parser poolside_v1 \
  > --enable-auto-tool-choice \
  > --served-model-name laguna \
+ > --default-chat-template-kwargs '{"enable_thinking": true}' \
  > --kv-cache-dtype-skip-layers $(seq 0 39) \
  > --moe_backend marlin
  > ```