Update README.md
README.md
CHANGED
|
@@ -1,3 +1,96 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
tags:
|
| 4 |
+
- speculative-decoding
|
| 5 |
+
- inference-optimization
|
| 6 |
+
- llm
|
| 7 |
+
- adaptive-inference
language:
- en
pipeline_tag: other
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# SpecKV-MLP16: Adaptive Gamma Selector for Speculative Decoding
|
| 12 |
+
|
| 13 |
+
This is the trained acceptance-rate predictor from the SpecKV paper. At each speculation step it uses draft model signals to select the speculation length (gamma) with the highest expected token yield, achieving 56.0% more tokens per speculation step than the fixed gamma=4 default.
|
| 14 |
+
|
| 15 |
+
## Quick Start
|
| 16 |
+
|
| 17 |
+
```python
import pickle
import numpy as np

# Load the model. Only unpickle files from a source you trust.
with open("speckv_mlp16.pkl", "rb") as f:
    model = pickle.load(f)

# At each speculation step, extract these from the draft token distributions:
draft_entropy = 1.5      # mean entropy across draft tokens
draft_confidence = 0.72  # mean top-1 confidence
max_entropy = 2.3        # max entropy in the step
min_confidence = 0.45    # min confidence in the step
comp_enc = 0             # 0=fp16, 1=int8, 2=nf4

# Pick the gamma with the highest expected tokens per step
best_gamma, best_expected = 2, 0
for gamma in [2, 4, 6, 8]:
    features = np.array([[draft_entropy, draft_confidence, max_entropy, min_confidence, comp_enc, gamma]])
    pred_ar = np.clip(model.predict(features)[0], 0, 1)  # predicted acceptance rate
    expected_tokens = pred_ar * gamma + 1
    if expected_tokens > best_expected:
        best_expected = expected_tokens
        best_gamma = gamma

print(f"Use gamma={best_gamma} (expected {best_expected:.1f} tokens)")
```
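The four step-level signals used in the Quick Start can be derived from the draft model's per-token probability distributions. The sketch below is one plausible way to compute them, not the paper's reference implementation: the helper name `step_features`, the `(num_draft_tokens, vocab_size)` input layout, and the use of natural-log entropy (nats) are all assumptions, so check the unit and aggregation against the SpecKV code before relying on it.

```python
import numpy as np

def step_features(draft_probs, eps=1e-12):
    """Summarize one speculation step (hypothetical helper).

    draft_probs: array of shape (num_draft_tokens, vocab_size); each row is
    the draft model's probability distribution for one drafted token.
    """
    p = np.asarray(draft_probs, dtype=np.float64)
    # Shannon entropy per drafted token, in nats (assumed unit).
    entropy = -np.sum(p * np.log(p + eps), axis=-1)
    # Top-1 confidence: probability assigned to the most likely token.
    confidence = p.max(axis=-1)
    return {
        "draft_entropy": float(entropy.mean()),
        "draft_confidence": float(confidence.mean()),
        "max_entropy": float(entropy.max()),
        "min_confidence": float(confidence.min()),
    }

# Toy example: two drafted tokens over a 4-token vocabulary.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])
feats = step_features(probs)
print(feats)
```

The returned values map one-to-one onto the `draft_entropy`, `draft_confidence`, `max_entropy`, and `min_confidence` variables in the Quick Start.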
|
| 44 |
+
|
| 45 |
+
## Framework-Agnostic Loading
|
| 46 |
+
|
| 47 |
+
If you do not want a scikit-learn dependency, load the raw weights:
|
| 48 |
+
|
| 49 |
+
```python
import numpy as np

weights = np.load("speckv_mlp16_weights.npz")
W1, b1 = weights["W1"], weights["b1"]  # (6, 16), (16,)
W2, b2 = weights["W2"], weights["b2"]  # (16, 1), (1,)

def predict(x):
    """Predict the acceptance rate for one 6-feature input."""
    h = np.maximum(0, x @ W1 + b1)  # ReLU
    # .item() extracts the single output value as a Python float
    return (h @ W2 + b2).item()
```
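Because the network is a single hidden layer, all candidate gammas can be scored in one batched forward pass instead of four separate calls. The following is a minimal sketch under stated assumptions: `predict_batch` and `select_gamma` are hypothetical helpers, and the weights are random placeholders with the documented shapes so the example runs without the `.npz` file. In practice, substitute the arrays loaded from `speckv_mlp16_weights.npz`.

```python
import numpy as np

# Placeholder weights with the documented shapes (NOT the trained values).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def predict_batch(X):
    """Forward pass for a batch of feature rows: (n, 6) -> (n,)."""
    h = np.maximum(0, X @ W1 + b1)  # ReLU hidden layer
    return (h @ W2 + b2).ravel()

def select_gamma(step_feats, gammas=(2, 4, 6, 8)):
    """Return (gamma, expected_tokens) maximizing pred_ar * gamma + 1."""
    g = np.asarray(gammas, dtype=np.float64)
    # One row per candidate gamma: the five step features, then gamma.
    X = np.column_stack([np.tile(step_feats, (len(g), 1)), g])
    pred_ar = np.clip(predict_batch(X), 0.0, 1.0)
    expected = pred_ar * g + 1.0
    best = int(np.argmax(expected))  # ties resolve to the smallest gamma
    return int(g[best]), float(expected[best])

# Features: entropy, confidence, max entropy, min confidence, compression code.
gamma, expected = select_gamma([1.5, 0.72, 2.3, 0.45, 0])
print(f"Use gamma={gamma} (expected {expected:.1f} tokens)")
```

The selection rule is the same as in the Quick Start loop; only the evaluation is vectorized.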
|
| 60 |
+
|
| 61 |
+
## Model Details
|
| 62 |
+
|
| 63 |
+
| Property | Value |
|:---|:---|
| Architecture | MLP, 1 hidden layer, 16 units, ReLU |
| Input | 6 features (entropy, confidence, max/min variants, compression, gamma) |
| Output | Acceptance rate prediction (0-1) |
| Training data | 5,112 step-level records |
| Test MSE | 0.090 |
| Test correlation | 0.685 |
| Decision overhead | 0.34 ms (4 predictions per decision) |
| Improvement over fixed gamma=4 | 56.0% |
| Statistical significance | p < 0.001 |
|
| 74 |
+
|
| 75 |
+
## Files
|
| 76 |
+
|
| 77 |
+
- `speckv_mlp16.pkl` - Full scikit-learn model (pickle)
- `speckv_mlp16_weights.npz` - Raw numpy weights (W1, b1, W2, b2)
- `config.json` - Model configuration and metadata
- `requirements.txt` - Python dependencies
|
| 81 |
+
|
| 82 |
+
## Citation
|
| 83 |
+
|
| 84 |
+
```bibtex
@article{shukla2026speckv,
  title={SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection},
  author={Shukla, Shikhar},
  journal={arXiv preprint},
  year={2026}
}
```
|
| 92 |
+
|
| 93 |
+
## Links
|
| 94 |
+
|
| 95 |
+
- [Paper (arXiv)](https://arxiv.org/abs/2605.02888)
|
| 96 |
+
- [Code and Data (GitHub)](https://github.com/Amorfati123/SpecKV)
|