fix: use --runner pooling instead of --task embed for vllm serve
README.md (changed):

````diff
@@ -86,13 +86,7 @@ Start the embedding server once, then route from any process without reloading t
 ```bash
 # Terminal 1: Start vLLM embedding server (runs once, stays alive)
 uv pip install vllm
-
-# vLLM >= 0.8
-vllm serve Qwen/Qwen3-0.6B --task embed --port 8000
-
-# vLLM < 0.8 (use this if the above fails)
-python -m vllm.entrypoints.openai.api_server \
-    --model Qwen/Qwen3-0.6B --task embed --port 8000
+vllm serve Qwen/Qwen3-0.6B --runner pooling --port 8000
 ```
 
 ```python
````
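For context, the server started by the command in this diff exposes the OpenAI-compatible `/v1/embeddings` endpoint. A minimal client sketch, assuming the model name and port from the command above (the helper names `build_embedding_request` and `embed` are illustrative, not part of vLLM):

```python
import json
import urllib.request

# Endpoint exposed by `vllm serve ... --runner pooling` (OpenAI-compatible).
VLLM_URL = "http://localhost:8000/v1/embeddings"


def build_embedding_request(texts, model="Qwen/Qwen3-0.6B"):
    """Build the JSON payload for the OpenAI-style /v1/embeddings endpoint."""
    return {"model": model, "input": texts}


def embed(texts):
    """POST the payload to the local vLLM server and return one vector per input."""
    payload = json.dumps(build_embedding_request(texts)).encode()
    req = urllib.request.Request(
        VLLM_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Response follows the OpenAI schema: body["data"][i]["embedding"] is the
    # embedding for the i-th input string.
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    # Requires the Terminal 1 server from the diff above to be running.
    vectors = embed(["hello world"])
    print(len(vectors[0]))  # embedding dimension
```

Because the endpoint follows the OpenAI schema, any OpenAI-compatible client (e.g. the `openai` Python package pointed at `http://localhost:8000/v1`) works equally well here.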