Update README.md
README.md CHANGED
@@ -34,7 +34,7 @@ It can also be deployed with a quantized verifier for even better speedups:
 
 ```bash
 vllm serve RedHatAI/gemma-4-31B-it-FP8-block --tensor-parallel-size 2 --attention-backend FLASH_ATTN --speculative-config '{
-"model": "
+"model": "RedHatAI/gemma-4-31B-it-speculator.dflash",
 "num_speculative_tokens": 8,
 "method": "dflash"
 }'