# Model Card for Dogacel/specdrift-gpt-oss-120b-eagle3

An EAGLE-3 drafter model for gpt-oss-120b, released as part of the paper *Attention Drift: What Autoregressive Speculative Decoding Models Learn*. It has two minor architectural differences from the original EAGLE: the drafter hidden state is captured after the norm, and an additional norm is injected before the FC layer.
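As a rough illustration only (this is not the released implementation; the class names, the number of captured layers, and the fused-feature layout are assumptions), the "extra norm before FC" change can be sketched in PyTorch as:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm of the kind used throughout the gpt-oss / EAGLE stacks."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class DrafterFeatureProjector(nn.Module):
    """Hypothetical sketch of the drafter input path: target-model hidden
    states (captured post-norm) are normed once more before the FC layer
    that fuses them down to the drafter's hidden size."""

    def __init__(self, hidden: int, n_captured_layers: int = 3):
        super().__init__()
        # The "additional norm injected before FC" described above.
        self.pre_fc_norm = RMSNorm(hidden * n_captured_layers)
        self.fc = nn.Linear(hidden * n_captured_layers, hidden, bias=False)

    def forward(self, captured):
        # captured: [batch, seq, hidden * n_captured_layers]
        return self.fc(self.pre_fc_norm(captured))


proj = DrafterFeatureProjector(hidden=64)
out = proj(torch.randn(2, 5, 64 * 3))
print(out.shape)
```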
## Model Details
### Model Sources
- Repository: Dogacel/SpecDrift
- Paper: https://arxiv.org/abs/2605.09992
## Uses

We recommend serving the model with SGLang:
```shell
export SGLANG_ENABLE_SPEC_V2=1
python -m sglang.launch_server \
  --model-path openai/gpt-oss-120b \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-120b-eagle3" \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --speculative-draft-sliding-window 2048 \
  --port 30000 \
  --dp-size 1 --tp-size 1 \
  --max-running-requests 64 \
  --cuda-graph-max-bs 64 \
  --attention-backend fa3 \
  --trust-remote-code \
  --mem-fraction-static 0.95 --dtype bfloat16
```
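Speculative decoding is transparent to clients: once the server is up, requests go to SGLang's OpenAI-compatible endpoint as usual. A minimal sketch using only the standard library (assuming the server above is running on localhost:30000; the prompt is illustrative):

```python
import json
import urllib.request

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except OSError:
    # No server reachable (e.g. when running this snippet standalone).
    print("server not running")
```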
## Training Details

### Training Data
This model is trained on the Nemotron Post-Training V2 dataset, with answers regenerated using gpt-oss-120b.
Dataset publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-120b-regen
### Training Procedure

We trained our model using SpecForge on 8×H200 GPUs in about 10 hours.
- Learning rate: 1e-4 (warmup ratio 0.2, cosine schedule)
- Epochs: 2
- Batch size: 2 per GPU (effective 2 × 8 = 16)
- Max sequence length: 4096
- TTT length: 4
## Evaluation

Evaluation was run on MT-Bench (80 prompts, max tokens 2048, temperature 0.7). Scripts are available in SpecForge.
### gpt-oss-120b · EAGLE3 (1-3-1-4) on H100

#### Throughput (AL) and Δ vs. Baseline
| BS | Baseline | D-Flash (AL) | Δ | Ours 1-3-1-4 (AL) | Δ |
|---|---|---|---|---|---|
| 1 | 212 | 231 (2.47) | +8.5% | 285 (2.41) | +34.1% |
| 8 | 795 | 905 (2.47) | +13.9% | 1044 (2.41) | +31.4% |
| 64 | 1620 | 1730 (2.42) | +6.8% | 2339 (2.43) | +44.4% |
Throughput is reported in tok/s; AL denotes average acceptance length.
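The Δ columns are plain relative throughput gains over the non-speculative baseline. For example, for the BS=64 row:

```python
def pct_delta(spec_tput: float, base_tput: float) -> float:
    """Percent throughput gain of speculative decoding over the baseline."""
    return 100.0 * (spec_tput - base_tput) / base_tput


# BS=64 row: baseline 1620 tok/s, D-Flash 1730 tok/s, Ours 2339 tok/s.
print(f"{pct_delta(1730, 1620):+.1f}%")  # D-Flash → +6.8%
print(f"{pct_delta(2339, 1620):+.1f}%")  # Ours    → +44.4%
```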
#### Per-Category Throughput

**BS=1**
| Category | Baseline | Ours | AL | Δ |
|---|---|---|---|---|
| Writing | 178.55 | 214.42 | 2.340 | +20.1% |
| Roleplay | 192.61 | 296.90 | 2.270 | +54.1% |
| Reasoning | 130.13 | 168.49 | 2.393 | +29.5% |
| Math | 84.26 | 115.32 | 3.101 | +36.9% |
| Coding | 329.93 | 463.48 | 2.781 | +40.5% |
| Extraction | 98.13 | 123.12 | 2.611 | +25.5% |
| STEM | 335.91 | 445.40 | 2.339 | +32.6% |
| Humanities | 349.63 | 451.13 | 2.131 | +29.0% |
**BS=8**
| Category | Baseline | Ours | AL | Δ |
|---|---|---|---|---|
| Writing | 513.57 | 734.62 | 2.272 | +43.0% |
| Roleplay | 856.44 | 1149.83 | 2.250 | +34.3% |
| Reasoning | 481.93 | 679.52 | 2.370 | +41.0% |
| Math | 362.71 | 443.77 | 3.160 | +22.4% |
| Coding | 1278.03 | 1646.52 | 2.765 | +28.8% |
| Extraction | 388.07 | 486.76 | 2.658 | +25.4% |
| STEM | 1240.15 | 1613.25 | 2.325 | +30.1% |
| Humanities | 1238.40 | 1600.98 | 2.172 | +29.3% |
**BS=64**
| Category | Baseline | Ours | AL | Δ |
|---|---|---|---|---|
| Writing | 1251.72 | 1855.41 | 2.401 | +48.2% |
| Roleplay | 1477.10 | 2389.35 | 2.238 | +61.8% |
| Reasoning | 947.08 | 1488.27 | 2.380 | +57.1% |
| Math | 641.84 | 997.58 | 3.095 | +55.4% |
| Coding | 2591.83 | 3607.85 | 2.803 | +39.2% |
| Extraction | 784.31 | 1139.90 | 2.733 | +45.3% |
| STEM | 2634.67 | 3753.09 | 2.315 | +42.5% |
| Humanities | 2630.41 | 3479.54 | 2.193 | +32.3% |
## Citation

BibTeX:

```bibtex
@misc{eldenk2026attentiondrift,
      title={Attention Drift: What Autoregressive Speculative Decoding Models Learn},
      author={Doğaç Eldenk and Payal Mohapatra and Yigitcan Comlek and Kaan Oktay and Hongyang Zhang and Stephen Xia},
      year={2026},
      eprint={2605.09992},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.09992},
}
```
## Acknowledgements
We would like to thank fal and Lambda for their support.