Update README.md
README.md CHANGED

````diff
@@ -174,10 +174,10 @@ curl http://localhost:8000/v1/chat/completions \
 }'
 ```
 
-### With
+### With Hugging Face Transformers
 
 > [!WARNING]
-> Due to the long generations produced by reasoning models, the lower latency provided by vLLM is preferred over
+> Due to the long generations produced by reasoning models, the lower latency provided by vLLM is preferred over Hugging Face for evaluations and in production settings. We recommend Hugging Face generation primarily for quick debugging or testing.
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
```
````
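The hunk's `@@` context line shows a `curl` call against vLLM's OpenAI-compatible `/v1/chat/completions` endpoint, though the request body itself is cut off. A minimal sketch of what such a body typically looks like, built in Python — the model name and prompt are placeholders, not taken from the diff:

```python
import json

# Hypothetical request body mirroring the truncated curl example;
# "my-reasoning-model" stands in for whatever checkpoint vLLM is serving.
payload = {
    "model": "my-reasoning-model",
    "messages": [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    # Reasoning models produce long generations, so leave generous headroom.
    "max_tokens": 4096,
}

body = json.dumps(payload)
print(body)
```

Posted with `curl -d "$BODY"` or `requests.post`, this is the general shape a chat-completions endpoint expects; the warning in the hunk exists because decoding these long reasoning traces is exactly where vLLM's serving-side optimizations pay off over plain Transformers generation.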