Safetensors · English · llama
Dogacel committed on commit 4ddcb73 · verified · 1 Parent(s): 623072e

Update README.md

Files changed (1): README.md (+57 −7)
README.md CHANGED
@@ -12,19 +12,36 @@ base_model:
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-EAGLE-3 drafter model for GPT-oss-20b.
+EAGLE-3 drafter model for GPT-oss-20b. This model is released as part of the paper _Attention Drift: What Speculative Decoding Models Learn_.
+It has several minor architectural differences from the original EAGLE: the drafter hidden state is captured *after* the norm, and an additional norm is injected before the FC layer.
 
 ## Model Details
 
 ### Model Sources [optional]
 
-- **Repository:** TODO
+- **Repository:** [Dogacel/SpecDrift](https://github.com/Dogacel/SpecDrift)
 - **Paper [optional]:** TODO
-- **Demo [optional]:** TODO
 
 ## Uses
 
-TODO: Quick SGLang starter
+We recommend using SGLang to run the model:
+
+```
+python -m sglang.launch_server \
+  --model-path openai/gpt-oss-20b \
+  --speculative-algorithm EAGLE3 \
+  --speculative-draft-model-path "Dogacel/specdrift-gpt-oss-20b-eagle3" \
+  --speculative-num-steps 3 \
+  --speculative-eagle-topk 1 \
+  --speculative-num-draft-tokens 4 \
+  --port 30000 \
+  --dp-size 1 --tp-size 1 \
+  --max-running-requests 64 \
+  --cuda-graph-max-bs 64 \
+  --attention-backend fa3 \
+  --trust-remote-code \
+  --mem-fraction-static 0.5 --dtype bfloat16
+```
 
 ## Training Details
 
@@ -32,20 +49,53 @@ TODO: Quick SGLang starter
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
+This model is trained on the Nemotron Post Training V2 dataset, with answers regenerated using gpt-oss-20b.
+
+The dataset is publicly available at: https://huggingface.co/datasets/Dogacel/nemotron-post-training-v2-gpt-oss-20b-regen
 
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
-SpecForge,
+We trained our model using [SpecForge](https://github.com/sgl-project/SpecForge) on 8×H200 GPUs within 8 hours.
+
+- **LR:** 1e-4 (warmup 0.2, cosine)
+- **Epochs:** 2
+- **Batch Size:** 4 per GPU (effective 4 × 8 = 32)
+- **Max Length:** 4096
+- **TTT:** 4
 
 TODO: Fill training parameters
 
 ## Evaluation
 
-TODO: Results
+Evaluation was run on MT-Bench (80 prompts, max tokens 2048, temperature 0.7).
+
+Scripts are available in [SpecForge](https://github.com/sgl-project/SpecForge/pull/552).
+
+### H100 @ BS=1: Baseline vs. Ours (1-3-1-4)
+
+| Metric | Baseline | Ours (1-3-1-4) | Δ |
+|---|---:|---:|---:|
+| **Latency (s)** | 444.05 | **373.11** | −16.0% |
+| **Throughput (tok/s)** | 304.93 | **371.90** | +22.0% |
+| **Accept Length** | 1.000 | **2.347** | +134.7% |
+
+### Per-Category Throughput (H100, BS=1)
+
+| Category | Baseline → Ours | Δ | Accept Length |
+|---|---:|---:|---:|
+| Writing | 207.83 → 268.62 | +29.2% | 2.225 |
+| Roleplay | 301.01 → 380.61 | +26.4% | 2.210 |
+| Reasoning | 260.19 → 265.83 | +2.2% | 2.334 |
+| Math | 170.41 → 190.53 | +11.8% | **2.894** |
+| Coding | 427.36 → 487.45 | +14.1% | 2.672 |
+| Extraction | 164.69 → 233.76 | **+41.9%** | 2.634 |
+| STEM | 436.35 → 545.97 | +25.1% | 2.287 |
+| Humanities | 471.61 → 602.40 | +27.7% | 2.112 |
+
+Our evaluation at higher batch sizes shows that model performance matches or slightly exceeds the baseline.
 
 ## Citation
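As a usage sketch for the SGLang launch command added in this commit: SGLang serves an OpenAI-compatible API, so the server can be queried over plain HTTP. The `build_request` helper and the prompt below are illustrative, and assume the server is running locally on port 30000 as configured above.

```python
import json
import urllib.request

def build_request(prompt: str,
                  url: str = "http://localhost:30000/v1/chat/completions"):
    # Build a request against SGLang's OpenAI-compatible chat endpoint.
    # Model name and sampling parameters here are illustrative choices.
    payload = {
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Once the server is up, sending the request looks like:
# with urllib.request.urlopen(build_request("Explain speculative decoding.")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Speculative decoding is transparent to the client: the drafter only changes server-side latency, not the API surface.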
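The two architectural tweaks mentioned in the model summary (drafter hidden state captured *after* the norm, an additional norm injected before the FC) can be pictured with a minimal sketch. This is not the actual drafter code: the unscaled RMSNorm and the tensor shapes are assumptions for illustration only.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Llama-style RMSNorm without a learned scale (illustrative only).
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

# Hypothetical (batch, hidden) activations from the target model.
hidden = np.random.default_rng(0).normal(size=(4, 64))

# Original EAGLE-style drafter: the raw hidden state feeds the FC projection.
fc_input_eagle = hidden

# This variant, per the summary above: capture the hidden state *after* the
# target's norm, then inject one more norm right before the FC projection.
captured = rms_norm(hidden)
fc_input_ours = rms_norm(captured)
```

The practical effect of the extra norm is that the FC input always arrives with (approximately) unit RMS per token, regardless of the scale of the captured activations.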
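The Δ columns in the evaluation tables are plain relative changes; a quick check against the headline H100 @ BS=1 numbers:

```python
def pct_delta(baseline: float, ours: float) -> float:
    # Relative change in percent, as reported in the tables' Δ columns.
    return (ours - baseline) / baseline * 100.0

# Headline numbers from the baseline-vs-ours table.
print(round(pct_delta(444.05, 373.11), 1))  # latency:       -16.0
print(round(pct_delta(304.93, 371.90), 1))  # throughput:    +22.0
print(round(pct_delta(1.000, 2.347), 1))    # accept length: +134.7
```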