# Model Card for Dogacel/specdrift-gpt-oss-20b-eagle3
EAGLE-3 drafter model for `openai/gpt-oss-20b`. This model is released as part of the paper *Attention Drift: What Autoregressive Speculative Decoding Models Learn*. It has two minor architectural differences from the original EAGLE: the drafter hidden state is captured after the norm, and an additional norm is injected before the FC layer.
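The two changes are easiest to see in code. Below is a minimal, self-contained sketch of an EAGLE-style drafter step with both modifications marked; every module name here (`pre_fc_norm`, `fc`, `out_norm`, the single transformer layer) is illustrative, not the actual implementation:

```python
import torch
import torch.nn as nn

class DrafterStepSketch(nn.Module):
    """Illustrative EAGLE-style drafter step; module names are hypothetical."""

    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        # Difference 2: an extra norm on the concatenated input, *before* FC.
        self.pre_fc_norm = nn.LayerNorm(2 * hidden_size)
        # FC fuses the token embedding with the target model's hidden state.
        self.fc = nn.Linear(2 * hidden_size, hidden_size)
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.out_norm = nn.LayerNorm(hidden_size)

    def forward(self, token_embeds: torch.Tensor, target_hidden: torch.Tensor):
        x = torch.cat([token_embeds, target_hidden], dim=-1)
        x = self.pre_fc_norm(x)  # extra norm injected before the FC projection
        x = self.fc(x)
        x = self.layer(x)
        # Difference 1: the drafter hidden state fed to the next draft step is
        # captured *after* the final norm, not before it.
        return self.out_norm(x)
```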
## Model Details
### Model Sources
- Repository: Dogacel/SpecDrift
- Paper: https://arxiv.org/abs/2605.09992
## Uses
We recommend running the model with SGLang:

```bash
export SGLANG_ENABLE_SPEC_V2=1
python -m sglang.launch_server \
  --model-path openai/gpt-oss-20b \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-20b-eagle3" \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --speculative-draft-sliding-window 2048 \
  --port 30000 \
  --dp-size 1 --tp-size 1 \
  --max-running-requests 64 \
  --cuda-graph-max-bs 64 \
  --attention-backend fa3 \
  --trust-remote-code \
  --mem-fraction-static 0.9 --dtype bfloat16
```
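Once the server is up, it exposes an OpenAI-compatible API on the chosen port, so any standard client works. A minimal example (prompt and sampling settings are arbitrary):

```python
from openai import OpenAI

# Point a standard OpenAI client at the local SGLang server launched above.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user",
               "content": "Explain speculative decoding in two sentences."}],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```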
## Training Details

### Training Data
This model was trained on the Nemotron Post-Training V2 dataset, with answers regenerated using gpt-oss-20b.
The dataset is publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
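As a quick sanity check, the dataset can be pulled with the `datasets` library (split and column names are whatever the dataset card defines, so inspect before use):

```python
from datasets import load_dataset

# Regenerated Nemotron post-training data used to train this drafter.
ds = load_dataset("Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen")
print(ds)  # inspect available splits and columns
```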
### Training Procedure
We trained the model with SpecForge on 8×H200 GPUs in under 8 hours.
- LR: 1e-4 (warmup ratio 0.2, cosine decay; see the schedule sketch below)
- Epochs: 2
- Batch Size: 4 per device (effective 4×8 = 32)
- Max Length: 4096
- TTT (training-time test) length: 4
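For concreteness, here is a minimal sketch of the learning-rate schedule implied by the settings above, assuming "warmup 0.2" means a linear warmup over the first 20% of steps followed by cosine decay; this is a reading of the settings, not an excerpt from the training code:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 1e-4,
          warmup_ratio: float = 0.2) -> float:
    """Linear warmup for the first `warmup_ratio` of steps, then cosine decay."""
    warmup_steps = max(int(total_steps * warmup_ratio), 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size quoted above: per-device batch of 4 across 8 GPUs.
effective_batch_size = 4 * 8  # = 32
```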
## Evaluation
Evaluation was run on MT-Bench: 80 prompts, max tokens 2048, temperature 0.7.
Evaluation scripts are available in SpecForge.
H100 @ BS=1: Baseline vs. Ours (1-3-1-4)
| Metric | Baseline | Ours (1-3-1-4) | Δ |
|---|---|---|---|
| Latency (s) | 444.05 | 373.11 | −16.0% |
| Throughput (tok/s) | 304.93 | 371.90 | +22.0% |
| Accept Length | 1.000 | 2.347 | +134.7% |
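An accept length of 2.347 means each forward pass of the target model commits about 2.35 tokens on average instead of 1. The realized speedup is smaller than that because drafting and verification are not free; a back-of-envelope check using the numbers from the table above:

```python
# Numbers taken directly from the table above.
baseline_tps = 304.93
ours_tps = 371.90
accept_len = 2.347

speedup = ours_tps / baseline_tps    # ~1.22x realized
ideal = accept_len / 1.0             # ~2.35x if drafting were free
overhead_factor = ideal / speedup    # ~1.9x lost to draft/verify cost
print(f"realized {speedup:.2f}x vs ideal {ideal:.2f}x "
      f"(overhead ~{overhead_factor:.1f}x)")
```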
Per-Category Throughput (H100, BS=1)
| Category | Throughput (tok/s), Baseline → Ours | Δ | Accept Length |
|---|---|---|---|
| Writing | 207.83 → 268.62 | +29.2% | 2.225 |
| Roleplay | 301.01 → 380.61 | +26.4% | 2.210 |
| Reasoning | 260.19 → 265.83 | +2.2% | 2.334 |
| Math | 170.41 → 190.53 | +11.8% | 2.894 |
| Coding | 427.36 → 487.45 | +14.1% | 2.672 |
| Extraction | 164.69 → 233.76 | +41.9% | 2.634 |
| STEM | 436.35 → 545.97 | +25.1% | 2.287 |
| Humanities | 471.61 → 602.40 | +27.7% | 2.112 |
In our evaluations at higher batch sizes, performance matched or slightly exceeded the baseline.
## Citation
BibTeX:

```bibtex
@misc{eldenk2026attentiondrift,
  title={Attention Drift: What Autoregressive Speculative Decoding Models Learn},
  author={Doğaç Eldenk and Payal Mohapatra and Yigitcan Comlek and Kaan Oktay and Hongyang Zhang and Stephen Xia},
  year={2026},
  eprint={2605.09992},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.09992},
}
```
## Acknowledgements
We would like to thank fal and Lambda for their support.