Enable thinking by default in non-Hopper FP8-KV serve command
Adds --default-chat-template-kwargs '{"enable_thinking": true}' to the workaround command so it matches the main card's recommended invocation.
README.md
````diff
@@ -106,6 +106,7 @@ The full vLLM recipe is on the main [Laguna XS.2 model card](https://huggingface
 --reasoning-parser poolside_v1 \
 --enable-auto-tool-choice \
 --served-model-name laguna \
+--default-chat-template-kwargs '{"enable_thinking": true}' \
 --kv-cache-dtype-skip-layers $(seq 0 39) \
 --moe_backend marlin
 ```
````
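The value passed to `--default-chat-template-kwargs` is a JSON object, so the boolean must be the lowercase JSON `true`, and the whole object has to be single-quoted for the shell. The sketch below is not part of the README. It assumes you want to assemble the flags shown in this hunk from a script, and it lets `json.dumps` and `shlex.join` produce the quoting. Only the flags visible in the diff are included. The earlier part of the README's command (model path and any other options) is not shown here and is deliberately left out. `$(seq 0 39)` is expanded to the 40 separate arguments `0` through `39` that the shell would produce.

```python
import json
import shlex

# Hypothetical scripting helper: these are only the flags visible in this hunk;
# the start of the README's serve command is not reproduced here.
chat_template_kwargs = {"enable_thinking": True}

flags = [
    "--reasoning-parser", "poolside_v1",
    "--enable-auto-tool-choice",
    "--served-model-name", "laguna",
    # json.dumps emits JSON `true` (lowercase), not Python's `True`
    "--default-chat-template-kwargs", json.dumps(chat_template_kwargs),
    # Equivalent to the shell expansion of $(seq 0 39): one argument per layer index
    "--kv-cache-dtype-skip-layers", *[str(i) for i in range(40)],
    "--moe_backend", "marlin",
]

# shlex.join single-quotes the JSON object so the shell passes it as one argument
rendered = shlex.join(flags)
print(rendered)
```

Round-tripping `rendered` through `shlex.split` recovers the original argument list, which is a quick way to check that the JSON survives shell quoting intact.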