---
license: cc-by-4.0
datasets:
- Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
language:
- en
base_model:
- openai/gpt-oss-20b
---

# Model Card for specdrift-gpt-oss-20b-eagle3

EAGLE-3 drafter model for GPT-oss-20b, released as part of the paper _Attention Drift: What Autoregressive Speculative Decoding Models Learn_.
It has two minor architectural differences from the original EAGLE: the drafter hidden state is captured *after* the norm, and an additional norm is injected before the FC layer.
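To make the two changes concrete, here is a minimal sketch of where they sit in an EAGLE-style fusion layer. This is our illustration under stated assumptions (the class and variable names, the shapes, and our reading of "captured after the norm"), not the actual SpecDrift implementation:

```python
import torch
import torch.nn as nn

class FusionStub(nn.Module):
    """Illustrative sketch only; names and shapes are assumptions,
    not the actual SpecDrift code (nn.RMSNorm needs PyTorch >= 2.4)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Difference 2: an additional norm injected before the FC layer.
        self.pre_fc_norm = nn.RMSNorm(2 * hidden_size)
        self.fc = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, token_embeds: torch.Tensor, target_hidden: torch.Tensor) -> torch.Tensor:
        # Difference 1 (our reading): target_hidden is the hidden state
        # captured *after* the norm, rather than before it as in vanilla EAGLE.
        x = torch.cat([token_embeds, target_hidden], dim=-1)
        return self.fc(self.pre_fc_norm(x))
```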

## Model Details

### Model Sources

- **Repository:** [Dogacel/SpecDrift](https://github.com/Dogacel/SpecDrift)
- **Paper:** https://arxiv.org/abs/2605.09992

## Uses

We recommend using SGLang to run the model:

```bash
export SGLANG_ENABLE_SPEC_V2=1

python -m sglang.launch_server \
    --model-path openai/gpt-oss-20b \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-20b-eagle3" \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --speculative-draft-sliding-window 2048 \
    --port 30000 \
    --dp-size 1 --tp-size 1 \
    --max-running-requests 64 \
    --cuda-graph-max-bs 64 \
    --attention-backend fa3 \
    --trust-remote-code \
    --mem-fraction-static 0.9 --dtype bfloat16
```
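Once the server is up, any OpenAI-compatible client can query it; speculative decoding happens entirely server-side, so no client changes are needed. A minimal client sketch (our example, assuming the default `/v1` endpoint on port 30000 from the command above):

```python
from openai import OpenAI

# The SGLang server above exposes an OpenAI-compatible API on port 30000.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    max_tokens=256,
    temperature=0.7,
)
print(resp.choices[0].message.content)
```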

## Training Details

### Training Data

This model was trained on the Nemotron Post Training V2 dataset, with answers regenerated using gpt-oss-20b.

The dataset is publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
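To inspect the data, it can be pulled with the Hugging Face `datasets` library. A quick sketch (the `train` split name is an assumption; check the dataset card for the exact splits):

```python
from datasets import load_dataset

# Load the regenerated Nemotron post-training data from the Hub.
ds = load_dataset("Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen", split="train")
print(ds)     # features and row count
print(ds[0])  # one regenerated example
```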

### Training Procedure

We trained the model using [SpecForge](https://github.com/sgl-project/SpecForge) on 8×H200 GPUs in under 8 hours.

- **LR:** 1e-4 (warmup 0.2, cosine; see the schedule sketch after this list)
- **Epochs:** 2
- **Batch size:** 4 per GPU (effective 4×8 = 32)
- **Max length:** 4096
- **TTT:** 4
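For reference, a minimal sketch of the warmup-plus-cosine schedule above, assuming "warmup 0.2" means 20% of total steps spent in linear warmup (our interpretation, not confirmed against the training code):

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 1e-4,
          warmup_ratio: float = 0.2) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = max(1, int(warmup_ratio * total_steps))
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
```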


## Evaluation

Evaluation was run on MT-Bench with 80 prompts, max tokens 2048, and temperature 0.7.

Scripts are available in [SpecForge PR #552](https://github.com/sgl-project/SpecForge/pull/552).

### H100 @ BS=1: Baseline vs Ours (1-3-1-4)

| Metric | Baseline | Ours (1-3-1-4) | Δ |
|---|---:|---:|---:|
| **Latency (s)** | 444.05 | **373.11** | −16.0% |
| **Throughput (tok/s)** | 304.93 | **371.90** | +22.0% |
| **Accept Length** | 1.000 | **2.347** | +134.7% |
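As a back-of-the-envelope check on how these numbers relate (our arithmetic, not from the paper): the accept length is the ceiling on tokens gained per target forward pass, and drafting plus verification overhead keeps the end-to-end speedup below it.

```python
baseline_tps, ours_tps, accept_len = 304.93, 371.90, 2.347

print(f"end-to-end speedup: {ours_tps / baseline_tps:.2f}x")  # ~1.22x
# Each round yields ~2.35 accepted tokens per target forward pass, but the
# 3 drafter steps per round plus verification cost keep the realized
# speedup well below that 2.35x ceiling.
```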

### Per-Category Throughput (H100, BS=1)

| Category | Throughput (tok/s), Baseline → Ours | Δ | Accept Length |
|---|---:|---:|---:|
| Writing | 207.83 → 268.62 | +29.2% | 2.225 |
| Roleplay | 301.01 → 380.61 | +26.4% | 2.210 |
| Reasoning | 260.19 → 265.83 | +2.2% | 2.334 |
| Math | 170.41 → 190.53 | +11.8% | **2.894** |
| Coding | 427.36 → 487.45 | +14.1% | 2.672 |
| Extraction | 164.69 → 233.76 | **+41.9%** | 2.634 |
| STEM | 436.35 → 545.97 | +25.1% | 2.287 |
| Humanities | 471.61 → 602.40 | +27.7% | 2.112 |


Our evaluation at higher batch sizes shows that performance matches or slightly exceeds the baseline.

## Citation

**BibTeX:**

```bibtex
@misc{eldenk2026attentiondrift,
      title={Attention Drift: What Autoregressive Speculative Decoding Models Learn}, 
      author={Doğaç Eldenk and Payal Mohapatra and Yigitcan Comlek and Kaan Oktay and Hongyang Zhang and Stephen Xia},
      year={2026},
      eprint={2605.09992},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.09992}, 
}
```

## Acknowledgements

We would like to thank fal and Lambda for their support.