<!-- Provide a quick summary of what the model is/does. -->

EAGLE-3 drafter model for GPT-oss-20b, released as part of the paper _Attention Drift: What Speculative Decoding Models Learn_.

It has several minor architectural differences from the original EAGLE: the drafter's input hidden state is captured *after* the norm, and an additional norm is injected before the FC layer.
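
For illustration, here is a minimal sketch of where these two changes sit in an EAGLE-style drafter input path (module and tensor names are hypothetical, not the actual SpecDrift code):

```python
import torch
import torch.nn as nn

class DrafterInputFusion(nn.Module):
    """Sketch of the two tweaks described above:
    (1) the target-model hidden state fed to the drafter is taken *after*
        the target's norm, rather than from the pre-norm residual stream;
    (2) an extra norm is applied before the FC that fuses the hidden state
        with the token embedding."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.pre_fc_norm = nn.RMSNorm(hidden_size)         # (2) additional norm before FC
        self.fc = nn.Linear(2 * hidden_size, hidden_size)  # usual EAGLE fusion layer

    def forward(self, post_norm_hidden: torch.Tensor, token_emb: torch.Tensor) -> torch.Tensor:
        # (1) `post_norm_hidden` is captured after the target model's norm.
        h = self.pre_fc_norm(post_norm_hidden)
        return self.fc(torch.cat([h, token_emb], dim=-1))
```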

## Model Details

### Model Sources

- **Repository:** [Dogacel/SpecDrift](https://github.com/Dogacel/SpecDrift)
- **Paper:** TODO

## Uses

We recommend serving the model with SGLang:

```bash
python -m sglang.launch_server \
  --model-path openai/gpt-oss-20b \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-20b-eagle3" \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --port 30000 \
  --dp-size 1 --tp-size 1 \
  --max-running-requests 64 \
  --cuda-graph-max-bs 64 \
  --attention-backend fa3 \
  --trust-remote-code \
  --mem-fraction-static 0.5 --dtype bfloat16
```
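
Once the server is up, SGLang exposes an OpenAI-compatible API on the configured port. A minimal client example (the prompt and sampling settings are just illustrative):

```python
# Query the SGLang server through its OpenAI-compatible endpoint.
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Speculative decoding is transparent to the client: outputs follow the target model's distribution, only generated faster.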

## Training Details

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

This model is trained on the Nemotron Post-Training V2 dataset, with the answers regenerated using gpt-oss-20b.

The dataset is publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
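
The regenerated data can be pulled directly with the `datasets` library (split names may vary; check the dataset card):

```python
from datasets import load_dataset

# Loads the regenerated Nemotron Post-Training V2 data used for drafter training.
ds = load_dataset("Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen")
print(ds)
```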

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We trained the model using [SpecForge](https://github.com/sgl-project/SpecForge) on 8xH200 GPUs in under 8 hours, with the following hyperparameters (the LR schedule is sketched below):

- **LR:** 1e-4 (warmup 0.2, cosine)
- **Epochs:** 2
- **Batch size:** 4 per GPU (effective 4x8 = 32)
- **Max length:** 4096
- **TTT:** 4
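
For reference, the schedule above corresponds roughly to the following (a minimal sketch in plain PyTorch, assuming "warmup 0.2" means a 20% warmup ratio; this is not the SpecForge training code, and `total_steps` is illustrative):

```python
import math
import torch

total_steps = 1_000                      # illustrative; depends on dataset size
warmup_steps = int(0.2 * total_steps)    # 20% linear warmup

opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # ramp linearly up to the peak LR (1e-4)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
```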

TODO: Fill remaining training parameters

## Evaluation

Evaluation was run on MT-Bench (80 prompts) with max tokens 2048 and temperature 0.7.

Evaluation scripts are available in [SpecForge PR #552](https://github.com/sgl-project/SpecForge/pull/552).

### H100 @ BS=1 — Baseline vs Ours (1-3-1-4)

| Metric | Baseline | Ours (1-3-1-4) | Δ |
|---|---:|---:|---:|
| **Latency (s)** | 444.05 | **373.11** | −16.0% |
| **Throughput (tok/s)** | 304.93 | **371.90** | +22.0% |
| **Accept Length** | 1.000 | **2.347** | +134.7% |

### Per-Category Throughput (H100, BS=1)

| Category | Baseline → Ours (tok/s) | Δ | Accept Length |
|---|---:|---:|---:|
| Writing | 207.83 → 268.62 | +29.2% | 2.225 |
| Roleplay | 301.01 → 380.61 | +26.4% | 2.210 |
| Reasoning | 260.19 → 265.83 | +2.2% | 2.334 |
| Math | 170.41 → 190.53 | +11.8% | **2.894** |
| Coding | 427.36 → 487.45 | +14.1% | 2.672 |
| Extraction | 164.69 → 233.76 | **+41.9%** | 2.634 |
| STEM | 436.35 → 545.97 | +25.1% | 2.287 |
| Humanities | 471.61 → 602.40 | +27.7% | 2.112 |
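
The Δ column follows directly from the raw throughput numbers; a quick sanity check:

```python
# Recompute the per-category throughput deltas shown in the table above.
baseline = {"Writing": 207.83, "Roleplay": 301.01, "Reasoning": 260.19,
            "Math": 170.41, "Coding": 427.36, "Extraction": 164.69,
            "STEM": 436.35, "Humanities": 471.61}
ours = {"Writing": 268.62, "Roleplay": 380.61, "Reasoning": 265.83,
        "Math": 190.53, "Coding": 487.45, "Extraction": 233.76,
        "STEM": 545.97, "Humanities": 602.40}

for category, base in baseline.items():
    print(f"{category}: {(ours[category] / base - 1) * 100:+.1f}%")  # e.g. Writing: +29.2%
```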

At higher batch sizes, our evaluation shows that performance matches or slightly exceeds the baseline.

## Citation