JiaqiXue committed on
Commit 82903a0 · verified · 1 Parent(s): 8c269dd

fix: use --runner pooling instead of --task embed for vllm serve

Files changed (1)
  1. README.md +1 -7
README.md CHANGED
@@ -86,13 +86,7 @@ Start the embedding server once, then route from any process without reloading t
 ```bash
 # Terminal 1: Start vLLM embedding server (runs once, stays alive)
 uv pip install vllm
-
-# vLLM >= 0.8
-vllm serve Qwen/Qwen3-0.6B --task embed --port 8000
-
-# vLLM < 0.8 (use this if the above fails)
-python -m vllm.entrypoints.openai.api_server \
-  --model Qwen/Qwen3-0.6B --task embed --port 8000
+vllm serve Qwen/Qwen3-0.6B --runner pooling --port 8000
 ```
 
 ```python
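The Python snippet that follows in the README is truncated in this diff view. As a minimal sketch of how a client might route requests to the server started above (assuming vLLM's OpenAI-compatible `/v1/embeddings` endpoint on port 8000; the helper names `build_embed_request` and `embed` are hypothetical, not from the README):

```python
import json
import urllib.request

def build_embed_request(texts, model="Qwen/Qwen3-0.6B"):
    # Build the JSON body for the OpenAI-compatible /v1/embeddings endpoint.
    return json.dumps({"model": model, "input": texts}).encode("utf-8")

def embed(texts, base_url="http://localhost:8000/v1"):
    # POST the request and return one embedding vector per input text.
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=build_embed_request(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [item["embedding"] for item in data["data"]]
```

Because the server stays alive in its own terminal, any number of client processes can call `embed()` without reloading the model.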