Safetensors · English · llama
Dogacel committed on commit 4ddcb73 · verified · 1 Parent(s): 623072e

Update README.md

Files changed (1): README.md (+57 −7)
README.md CHANGED
@@ -12,19 +12,36 @@ base_model:
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-EAGLE-3 drafter model for GPT-oss-20b.
+EAGLE-3 drafter model for GPT-oss-20b. This model is released as part of the paper _Attention Drift: What Speculative Decoding Models Learn_.
+It has several minor architectural differences from the original EAGLE: the drafter hidden state is captured *after* the norm, and an additional norm is injected before the FC layer.
 
 ## Model Details
 
 ### Model Sources [optional]
 
-- **Repository:** TODO
+- **Repository:** [Dogacel/SpecDrift](https://github.com/Dogacel/SpecDrift)
 - **Paper [optional]:** TODO
-- **Demo [optional]:** TODO
 
 ## Uses
 
-TODO: Quick SGLang starter
+We recommend using SGLang to run the model:
+
+```
+python -m sglang.launch_server \
+  --model-path openai/gpt-oss-20b \
+  --speculative-algorithm EAGLE3 \
+  --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-20b-eagle3" \
+  --speculative-num-steps 3 \
+  --speculative-eagle-topk 1 \
+  --speculative-num-draft-tokens 4 \
+  --port 30000 \
+  --dp-size 1 --tp-size 1 \
+  --max-running-requests 64 \
+  --cuda-graph-max-bs 64 \
+  --attention-backend fa3 \
+  --trust-remote-code \
+  --mem-fraction-static 0.5 --dtype bfloat16
+```
 
 ## Training Details
 
@@ -32,20 +49,53 @@ TODO: Quick SGLang starter
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
+This model is trained on the Nemotron Post Training V2 dataset, with answers regenerated using gpt-oss-20b.
+
+The dataset is publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
-SpecForge,
+We trained our model using [SpecForge](https://github.com/sgl-project/SpecForge) on 8×H200 GPUs within 8 hours.
+
+- **LR:** 1e-4 (warmup 0.2, cosine)
+- **Epochs:** 2
+- **Batch Size:** 4 per GPU (effective 4 × 8 = 32)
+- **Max Length:** 4096
+- **TTT:** 4
 
 TODO: Fill training parameters
 
 ## Evaluation
 
-TODO: Results
+Evaluation was run on MT-Bench (80 prompts, max tokens 2048, temperature 0.7).
+
+Scripts are available in [SpecForge](https://github.com/sgl-project/SpecForge/pull/552).
+
+### H100 @ BS=1: Baseline vs. Ours (1-3-1-4)
+
+| Metric | Baseline | Ours (1-3-1-4) | Δ |
+|---|---:|---:|---:|
+| **Latency (s)** | 444.05 | **373.11** | −16.0% |
+| **Throughput (tok/s)** | 304.93 | **371.90** | +22.0% |
+| **Accept Length** | 1.000 | **2.347** | +134.7% |
+
+### Per-Category Throughput (H100, BS=1)
+
+| Category | Baseline → Ours | Δ | Accept Length |
+|---|---:|---:|---:|
+| Writing | 207.83 → 268.62 | +29.2% | 2.225 |
+| Roleplay | 301.01 → 380.61 | +26.4% | 2.210 |
+| Reasoning | 260.19 → 265.83 | +2.2% | 2.334 |
+| Math | 170.41 → 190.53 | +11.8% | **2.894** |
+| Coding | 427.36 → 487.45 | +14.1% | 2.672 |
+| Extraction | 164.69 → 233.76 | **+41.9%** | 2.634 |
+| STEM | 436.35 → 545.97 | +25.1% | 2.287 |
+| Humanities | 471.61 → 602.40 | +27.7% | 2.112 |
+
+Our evaluation at higher batch sizes shows that model performance matches or slightly exceeds the baseline.
 
 ## Citation
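As a usage sketch for the SGLang launch command added in this commit: SGLang serves an OpenAI-compatible API, so the server can be queried over plain HTTP. The `build_request` helper and the prompt below are illustrative, and assume the server is running locally on port 30000 as configured above.

```python
import json
import urllib.request

def build_request(prompt: str,
                  url: str = "http://localhost:30000/v1/chat/completions"):
    # Build a request against SGLang's OpenAI-compatible chat endpoint.
    # Model name and sampling parameters here are illustrative choices.
    payload = {
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Once the server is up, sending the request looks like:
# with urllib.request.urlopen(build_request("Explain speculative decoding.")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Speculative decoding is transparent to the client: the drafter only changes server-side latency, not the API surface.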
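The two architectural tweaks mentioned in the model summary (drafter hidden state captured *after* the norm, an additional norm injected before the FC) can be pictured with a minimal sketch. This is not the actual drafter code: the unscaled RMSNorm and the tensor shapes are assumptions for illustration only.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Llama-style RMSNorm without a learned scale (illustrative only).
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

# Hypothetical (batch, hidden) activations from the target model.
hidden = np.random.default_rng(0).normal(size=(4, 64))

# Original EAGLE-style drafter: the raw hidden state feeds the FC projection.
fc_input_eagle = hidden

# This variant, per the summary above: capture the hidden state *after* the
# target's norm, then inject one more norm right before the FC projection.
captured = rms_norm(hidden)
fc_input_ours = rms_norm(captured)
```

The practical effect of the extra norm is that the FC input always arrives with (approximately) unit RMS per token, regardless of the scale of the captured activations.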
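The Δ columns in the evaluation tables are plain relative changes; a quick check against the headline H100 @ BS=1 numbers:

```python
def pct_delta(baseline: float, ours: float) -> float:
    # Relative change in percent, as reported in the tables' Δ columns.
    return (ours - baseline) / baseline * 100.0

# Headline numbers from the baseline-vs-ours table.
print(round(pct_delta(444.05, 373.11), 1))  # latency:       -16.0
print(round(pct_delta(304.93, 371.90), 1))  # throughput:    +22.0
print(round(pct_delta(1.000, 2.347), 1))    # accept length: +134.7
```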