Update README.md
README.md CHANGED

````diff
@@ -174,10 +174,10 @@ curl http://localhost:8000/v1/chat/completions \
 }'
 ```
 
-### With
+### With Hugging Face Transformers
 
 > [!WARNING]
-> Due to the long generations produced by reasoning models, the lower latency provided by vLLM is preferred over
+> Due to the long generations produced by reasoning models, the lower latency provided by vLLM is preferred over Hugging Face for evaluations and in production settings. We recommend Hugging Face generation primarily for quick debugging or testing.
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
```
````
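The hunk's `@@` context line shows a `curl` call against vLLM's OpenAI-compatible `/v1/chat/completions` endpoint, though the request body itself is cut off. A minimal sketch of what such a body typically looks like, built in Python — the model name and prompt are placeholders, not taken from the diff:

```python
import json

# Hypothetical request body mirroring the truncated curl example;
# "my-reasoning-model" stands in for whatever checkpoint vLLM is serving.
payload = {
    "model": "my-reasoning-model",
    "messages": [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    # Reasoning models produce long generations, so leave generous headroom.
    "max_tokens": 4096,
}

body = json.dumps(payload)
print(body)
```

Posted with `curl -d "$BODY"` or `requests.post`, this is the general shape a chat-completions endpoint expects; the warning in the hunk exists because decoding these long reasoning traces is exactly where vLLM's serving-side optimizations pay off over plain Transformers generation.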