Update README.md
### Model Sources [optional]

- **Repository:** [Dogacel/SpecDrift](https://github.com/Dogacel/SpecDrift)
- **Paper:** https://arxiv.org/abs/2605.09992

## Uses

We recommend using SGLang to run the model:

```shell
export SGLANG_ENABLE_SPEC_V2=1

python -m sglang.launch_server \
    --model-path openai/gpt-oss-20b \
    --speculative-algorithm EAGLE3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --speculative-draft-sliding-window 2048 \
    --port 30000 \
    --dp-size 1 --tp-size 1 \
    --max-running-requests 64 \
    --cuda-graph-max-bs 64 \
    --attention-backend fa3 \
    --trust-remote-code \
    --mem-fraction-static 0.9 --dtype bfloat16
```
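To see how the speculative flags fit together: with `--speculative-eagle-topk 1` the drafter proposes a single chain of `--speculative-num-steps 3` tokens, and the target verifies up to `--speculative-num-draft-tokens 4` positions per round (3 draft tokens plus one "bonus" token from the target itself). A minimal Python sketch of that loop, using toy stand-in functions rather than the real drafter and target models:

```python
# Toy chain speculative decoding round matching the flags above:
# num-steps 3, top-k 1, num-draft-tokens 4 = 3 drafts + 1 bonus token.
# drafter() and target() are hypothetical stand-ins, not real networks.

NUM_STEPS = 3  # draft tokens proposed per round

def drafter(ctx):
    # Cheap draft model: continues a simple counting pattern.
    return (ctx[-1] + 1) % 10

def target(ctx):
    # Target model: agrees with the drafter except it never emits 5.
    nxt = (ctx[-1] + 1) % 10
    return 0 if nxt == 5 else nxt

def speculative_round(ctx):
    """One draft/verify round; returns the tokens actually emitted."""
    draft = []
    for _ in range(NUM_STEPS):          # drafter runs autoregressively
        draft.append(drafter(ctx + draft))
    accepted = []
    for tok in draft:                   # target verifies each position
        expect = target(ctx + accepted)
        if tok != expect:
            accepted.append(expect)     # target's correction replaces the miss
            return accepted
        accepted.append(tok)
    accepted.append(target(ctx + accepted))  # all accepted: bonus token
    return accepted

print(speculative_round([0]))  # all 3 drafts accepted + bonus -> 4 tokens
print(speculative_round([3]))  # drafter proposes 5, target rejects it
```

Each round therefore emits between 1 and 4 tokens for a single target forward pass, which is where the speedup comes from.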

## Training Details

**BibTeX:**

```bibtex
@misc{eldenk2026attentiondrift,
  title={Attention Drift: What Autoregressive Speculative Decoding Models Learn},
  author={Doğaç Eldenk and Payal Mohapatra and Yigitcan Comlek and Kaan Oktay and Hongyang Zhang and Stephen Xia},
  year={2026},
  eprint={2605.09992},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.09992},
}
```

## Acknowledgements