Model Card for Dogacel/specdrift-gpt-oss-20b-eagle3

EAGLE-3 drafter model for gpt-oss-20b, released as part of the paper Attention Drift: What Autoregressive Speculative Decoding Models Learn. It has two minor architectural differences from the original EAGLE: the drafter hidden state is captured after the norm, and an additional norm is injected before the FC layer.
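
For intuition, the sketch below shows where both deviations sit in a simplified EAGLE-style input projection. It is a minimal illustration, not the released implementation: the module and parameter names are made up here, and the choice of RMSNorm is an assumption.

import torch
import torch.nn as nn

class DrafterInputProjection(nn.Module):
    """Illustrative only: the two deviations from vanilla EAGLE."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Deviation 2: an additional norm injected before the FC layer that
        # fuses target-model features with the draft token embedding.
        # (RMSNorm is an assumption; the card only says "norm".)
        self.pre_fc_norm = nn.RMSNorm(2 * hidden_size)
        self.fc = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, target_hidden: torch.Tensor, embed: torch.Tensor) -> torch.Tensor:
        # Deviation 1: target_hidden is captured after the target model's
        # norm, rather than from the pre-norm residual stream.
        fused = torch.cat([target_hidden, embed], dim=-1)
        return self.fc(self.pre_fc_norm(fused))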

Model Details

  • Architecture: llama-style EAGLE-3 drafter
  • Parameters: ~0.4B
  • Precision: bfloat16 (safetensors)

Uses

We recommend using SGLang to run the model:

# Enable SGLang's v2 speculative-decoding path.
export SGLANG_ENABLE_SPEC_V2=1

# Serve gpt-oss-20b with this repository as the EAGLE-3 draft model.
python -m sglang.launch_server \
    --model-path openai/gpt-oss-20b \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-20b-eagle3" \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --speculative-draft-sliding-window 2048 \
    --port 30000 \
    --dp-size 1 --tp-size 1 \
    --max-running-requests 64 \
    --cuda-graph-max-bs 64 \
    --attention-backend fa3 \
    --trust-remote-code \
    --mem-fraction-static 0.9 --dtype bfloat16
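
Once the server is up, you can sanity-check it through SGLang's OpenAI-compatible endpoint. A minimal sketch, assuming the port and model path from the launch command above (the API key is a placeholder for a local server):

from openai import OpenAI

# Point the OpenAI client at the local SGLang server.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain speculative decoding in one sentence."}],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)

Speculative decoding is transparent to the client: accepted draft tokens change latency, not the returned text distribution.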

Training Details

Training Data

This model is trained on the Nemotron Post-Training V2 dataset, with answers regenerated using gpt-oss-20b.

Dataset publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
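
The regenerated data can be pulled directly from the Hub; a minimal sketch, where the split name is an assumption:

from datasets import load_dataset

# Dataset ID from the link above; the "train" split name is an assumption.
ds = load_dataset("Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen", split="train")
print(ds[0])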

Training Procedure

We trained the model using SpecForge on 8×H200 GPUs in under 8 hours.

  • LR: 1e-4 (warmup 0.2, cosine)
  • Epochs: 2
  • Batch size: 4 per GPU (effective 4×8 = 32)
  • Max length: 4096
  • TTT (training-time test) length: 4

Evaluation

Evaluation was run on MT-Bench: 80 prompts, max tokens 2048, temperature 0.7.

Evaluation scripts are available in SpecForge.

H100 @ BS=1: Baseline vs Ours (1-3-1-4)

Metric               Baseline   Ours (1-3-1-4)   Δ
Latency (s)          444.05     373.11           −16.0%
Throughput (tok/s)   304.93     371.90           +22.0%
Accept Length        1.000      2.347            +134.7%
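
As a rough consistency check, the accept length upper-bounds the achievable speedup: each verification pass of the target model yields roughly accept-length tokens, so with free drafting the speedup would approach the accept length. A small sketch using the table's numbers (the zero-overhead assumption is the simplification):

# Relate observed speedup to the zero-overhead bound from accept length.
baseline_tps = 304.93
ours_tps = 371.90
accept_length = 2.347

observed = ours_tps / baseline_tps  # ~1.22x actually measured
bound = accept_length               # ~2.35x if drafting cost nothing
print(f"observed {observed:.2f}x vs zero-overhead bound {bound:.2f}x")

The gap between the two reflects the overhead of running the drafter itself.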

Per-Category Throughput (H100, BS=1)

Category     Baseline → Ours (tok/s)   Δ        Accept Length
Writing      207.83 → 268.62           +29.2%   2.225
Roleplay     301.01 → 380.61           +26.4%   2.210
Reasoning    260.19 → 265.83           +2.2%    2.334
Math         170.41 → 190.53           +11.8%   2.894
Coding       427.36 → 487.45           +14.1%   2.672
Extraction   164.69 → 233.76           +41.9%   2.634
STEM         436.35 → 545.97           +25.1%   2.287
Humanities   471.61 → 602.40           +27.7%   2.112

Our evaluation at higher batch sizes shows that performance matches or slightly exceeds the baseline.

Citation

BibTeX:

@misc{eldenk2026attentiondrift,
      title={Attention Drift: What Autoregressive Speculative Decoding Models Learn}, 
      author={Doğaç Eldenk and Payal Mohapatra and Yigitcan Comlek and Kaan Oktay and Hongyang Zhang and Stephen Xia},
      year={2026},
      eprint={2605.09992},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.09992}, 
}

Acknowledgements

We would like to thank fal and Lambda for their support.
