MeganEFlynn committed on
Commit 6aa2d27 · verified · 1 Parent(s): a7fdd60

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED

    @@ -34,7 +34,7 @@ It can also be deployed with a quantized verifier for even better speedups:

      ```bash
      vllm serve RedHatAI/gemma-4-31B-it-FP8-block --tensor-parallel-size 2 --attention-backend FLASH_ATTN --speculative-config '{
    - "model": "inference-optimization/Dflash-gemma4-spec",
    + "model": "RedHatAI/gemma-4-31B-it-speculator.dflash",
      "num_speculative_tokens": 8,
      "method": "dflash"
      }'
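
Because the speculative config is passed as inline JSON, a typo in the edited line only surfaces when the server starts. The sketch below is one possible way to check the string first. The model IDs and flags are taken from the changed README lines; the `SPEC_CONFIG` variable name and the `python3 -m json.tool` check are assumptions added for illustration and are not part of the commit.

```bash
# Speculative-decoding settings as they appear after this commit.
# SPEC_CONFIG is a hypothetical variable name used only in this sketch.
SPEC_CONFIG='{
  "model": "RedHatAI/gemma-4-31B-it-speculator.dflash",
  "num_speculative_tokens": 8,
  "method": "dflash"
}'

# Optional sanity check (assumes python3 is on PATH): stop early if the
# JSON is malformed, for example because of a stray trailing comma.
if command -v python3 > /dev/null 2>&1; then
  if ! printf '%s' "$SPEC_CONFIG" | python3 -m json.tool > /dev/null; then
    echo "speculative config is not valid JSON" >&2
    exit 1
  fi
fi

# Print the command for review; remove the leading 'echo' to launch the server.
echo vllm serve RedHatAI/gemma-4-31B-it-FP8-block \
  --tensor-parallel-size 2 \
  --attention-backend FLASH_ATTN \
  --speculative-config "$SPEC_CONFIG"
```

Keeping the JSON in a single-quoted variable leaves the inner double quotes untouched by the shell, which is the same reason the README wraps the inline config in single quotes.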