Shikhar1 committed on
Commit
70e8fd6
·
verified ·
1 Parent(s): 8eda2cf

Update README.md

  1. README.md +96 -3
README.md CHANGED
@@ -1,3 +1,96 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - speculative-decoding
+ - inference-optimization
+ - llm
+ - adaptive-inference
+ language:
+ - en
+ pipeline_tag: other
+ ---
+
+ # SpecKV-MLP16: Adaptive Gamma Selector for Speculative Decoding
+
+ This is the trained acceptance-rate predictor from the SpecKV paper. At each speculation step it uses draft-model signals to choose the speculation length (gamma) with the highest expected token yield, producing 56.0% more tokens per speculation step than the fixed gamma=4 default.
+
+ ## Quick Start
+
+ ```python
+ import pickle
+ import numpy as np
+
+ # Load the scikit-learn model (unpickling requires scikit-learn to be installed).
+ with open("speckv_mlp16.pkl", "rb") as f:
+     model = pickle.load(f)
+
+ # At each speculation step, extract these from the draft token distributions:
+ draft_entropy = 1.5      # mean entropy across draft tokens
+ draft_confidence = 0.72  # mean top-1 confidence
+ max_entropy = 2.3        # max entropy in the step
+ min_confidence = 0.45    # min confidence in the step
+ comp_enc = 0             # compression encoding: 0=fp16, 1=int8, 2=nf4
+
+ # Pick the gamma with the highest expected number of tokens per step.
+ best_gamma, best_expected = 2, 0.0
+ for gamma in [2, 4, 6, 8]:
+     features = np.array([[draft_entropy, draft_confidence, max_entropy, min_confidence, comp_enc, gamma]])
+     pred_ar = np.clip(model.predict(features)[0], 0, 1)  # predicted acceptance rate, clipped to [0, 1]
+     expected_tokens = pred_ar * gamma + 1  # expected tokens produced this step
+     if expected_tokens > best_expected:
+         best_expected = expected_tokens
+         best_gamma = gamma
+
+ print(f"Use gamma={best_gamma} (expected {best_expected:.1f} tokens)")
+ ```
+
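The four draft-side features used in the Quick Start can be derived from the draft model's per-token probability distributions. The following is a minimal sketch, not taken from the SpecKV code: the helper name `draft_features` is hypothetical, `probs` is assumed to be an already-softmaxed array of shape `(num_draft_tokens, vocab_size)`, and entropy is assumed to be measured in nats (the README does not state the unit).

```python
import numpy as np

def draft_features(probs, eps=1e-12):
    """Summarize draft token distributions into the four draft-side features.

    probs: array of shape (num_draft_tokens, vocab_size); each row is assumed
    to be a probability distribution over the vocabulary.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Shannon entropy per draft token, in nats (assumption: natural log).
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Top-1 confidence per draft token: the largest probability in each row.
    confidence = probs.max(axis=1)
    return {
        "draft_entropy": float(entropy.mean()),
        "draft_confidence": float(confidence.mean()),
        "max_entropy": float(entropy.max()),
        "min_confidence": float(confidence.min()),
    }

# Toy example: two draft tokens over a 4-word vocabulary.
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],  # confident token
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain token
])
feats = draft_features(probs)
print(feats)
```

The returned values map directly onto the `draft_entropy`, `draft_confidence`, `max_entropy`, and `min_confidence` variables above; `comp_enc` and `gamma` come from the serving configuration, not from the distributions.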
+ ## Framework-Agnostic Loading
+
+ If you do not want a scikit-learn dependency, load the raw weights:
+
+ ```python
+ import numpy as np
+
+ weights = np.load("speckv_mlp16_weights.npz")
+ W1, b1 = weights["W1"], weights["b1"]  # (6, 16), (16,)
+ W2, b2 = weights["W2"], weights["b2"]  # (16, 1), (1,)
+
+ def predict(x):
+     """Predict the acceptance rate for one feature vector of shape (6,) or (1, 6)."""
+     h = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU
+     return (h @ W2 + b2).item()     # single-element result as a Python float
+ ```
+
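Because the raw-weight predictor is just two matrix products, the four gamma candidates can be scored in a single batched forward pass. The sketch below is illustrative only: `predict_batch` and `select_gamma` are hypothetical helper names, and the weights are random placeholders with the documented shapes so that the example runs on its own. In real use, `W1`, `b1`, `W2`, and `b2` would be loaded from `speckv_mlp16_weights.npz`.

```python
import numpy as np

# Placeholder weights with the documented shapes so the sketch runs standalone.
# In real use, load W1, b1, W2, b2 from speckv_mlp16_weights.npz instead.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def predict_batch(X):
    """Forward pass for a batch of feature rows: shape (n, 6) -> (n,)."""
    h = np.maximum(0, X @ W1 + b1)  # hidden layer with ReLU
    return (h @ W2 + b2).ravel()

def select_gamma(draft_entropy, draft_confidence, max_entropy,
                 min_confidence, comp_enc, gammas=(2, 4, 6, 8)):
    """Return (best_gamma, expected_tokens) over the candidate gammas."""
    gammas = np.asarray(gammas, dtype=np.float64)
    base = np.array([draft_entropy, draft_confidence,
                     max_entropy, min_confidence, comp_enc], dtype=np.float64)
    # One row per candidate: the five shared features followed by gamma.
    X = np.column_stack([np.tile(base, (len(gammas), 1)), gammas])
    pred_ar = np.clip(predict_batch(X), 0.0, 1.0)  # acceptance rate in [0, 1]
    expected = pred_ar * gammas + 1.0              # same objective as the Quick Start loop
    best = int(np.argmax(expected))                # first maximum wins, like the strict ">" loop
    return int(gammas[best]), float(expected[best])

best_gamma, best_expected = select_gamma(1.5, 0.72, 2.3, 0.45, 0)
print(f"Use gamma={best_gamma} (expected {best_expected:.1f} tokens)")
```

With placeholder weights the chosen gamma is meaningless; the point is only the batching pattern, which replaces four separate predictor calls with one.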
+ ## Model Details
+
+ | Property | Value |
+ |:---|:---|
+ | Architecture | MLP, 1 hidden layer, 16 units, ReLU |
+ | Input | 6 features (entropy, confidence, max/min variants, compression, gamma) |
+ | Output | Acceptance rate prediction (0-1) |
+ | Training data | 5,112 step-level records |
+ | Test MSE | 0.090 |
+ | Test correlation | 0.685 |
+ | Decision overhead | 0.34 ms (4 predictions per decision) |
+ | Improvement over fixed gamma=4 | 56.0% |
+ | Statistical significance | p < 0.001 |
+
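To compare the predictor against your own speculation traces, the two test metrics in the table can be computed as sketched below. This assumes "Test correlation" means the Pearson correlation between predicted and observed acceptance rates, which the README does not specify. The helper name `evaluate` is hypothetical and the arrays are made-up illustration values, not SpecKV data.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (MSE, Pearson r) between observed and predicted acceptance rates."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    mse = float(np.mean((y_true - y_pred) ** 2))
    # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
    corr = float(np.corrcoef(y_true, y_pred)[0, 1])
    return mse, corr

# Made-up values for illustration only.
observed = np.array([0.0, 0.5, 1.0, 0.75])
predicted = np.array([0.1, 0.4, 0.9, 0.85])
mse, corr = evaluate(observed, predicted)
print(f"MSE={mse:.3f}, Pearson r={corr:.3f}")
```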
+ ## Files
+
+ - `speckv_mlp16.pkl` - Full scikit-learn model (pickle)
+ - `speckv_mlp16_weights.npz` - Raw numpy weights (W1, b1, W2, b2)
+ - `config.json` - Model configuration and metadata
+ - `requirements.txt` - Python dependencies
+
+ ## Citation
+
+ ```bibtex
+ @article{shukla2026speckv,
+   title={SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection},
+   author={Shukla, Shikhar},
+   journal={arXiv preprint},
+   year={2026}
+ }
+ ```
+
+ ## Links
+
+ - [Paper (arXiv)](https://arxiv.org/abs/2605.02888)
+ - [Code and Data (GitHub)](https://github.com/Amorfati123/SpecKV)