+---
+license: mit
+tags:
+- speculative-decoding
+- inference-optimization
+- llm
+- adaptive-inference
+language:
+- en
+pipeline_tag: other
+---
+# SpecKV-MLP16: Adaptive Gamma Selector for Speculative Decoding
+This is the trained acceptance rate predictor from the SpecKV paper. At each speculation step it predicts the acceptance rate for every candidate speculation length (gamma) from draft model signals, and the gamma with the highest expected token count is used. This yields 56.0% more tokens per speculation step than the fixed gamma=4 default.
+## Quick Start
+```python
+import pickle
+import numpy as np
+# load the pickled scikit-learn model (scikit-learn must be installed to unpickle it)
+with open("speckv_mlp16.pkl", "rb") as f:
+    model = pickle.load(f)
+# at each speculation step, extract these from draft token distributions:
+draft_entropy = 1.5       # mean entropy across draft tokens
+draft_confidence = 0.72   # mean top-1 confidence
+max_entropy = 2.3         # max entropy in the step
+min_confidence = 0.45     # min confidence in the step
+comp_enc = 0              # 0=fp16, 1=int8, 2=nf4
+# pick best gamma
+best_gamma, best_expected = 2, 0
+for gamma in [2, 4, 6, 8]:
+    features = np.array([[draft_entropy, draft_confidence, max_entropy, min_confidence, comp_enc, gamma]])
+    pred_ar = np.clip(model.predict(features)[0], 0, 1)
+    expected_tokens = pred_ar * gamma + 1
+    if expected_tokens > best_expected:
+        best_expected = expected_tokens
+        best_gamma = gamma
+print(f"Use gamma={best_gamma} (expected {best_expected:.1f} tokens)")
+```
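The four distribution features above can be derived from the draft model's per-token probability vectors. The following is a minimal sketch, not the authors' extraction code: the function name `draft_step_features`, the `(num_draft_tokens, vocab_size)` input layout, and the use of natural-log entropy (nats) are all assumptions. The entropy unit in particular should be checked against the paper or repository before feeding values to the model.

```python
import numpy as np

def draft_step_features(probs, eps=1e-12):
    """Summarize one speculation step.

    probs: array of shape (num_draft_tokens, vocab_size); each row is the
    draft model's probability distribution for one drafted token (assumed layout).
    Returns (draft_entropy, draft_confidence, max_entropy, min_confidence).
    """
    probs = np.asarray(probs, dtype=float)
    # Shannon entropy per drafted token, in nats (assumed unit).
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Top-1 confidence per drafted token: the largest probability in each row.
    confidence = probs.max(axis=1)
    return (float(entropy.mean()), float(confidence.mean()),
            float(entropy.max()), float(confidence.min()))

# Toy example: two drafted tokens over a 4-word vocabulary.
toy = [[0.70, 0.10, 0.10, 0.10],
       [0.25, 0.25, 0.25, 0.25]]
draft_entropy, draft_confidence, max_entropy, min_confidence = draft_step_features(toy)
```

In the toy input the uniform second row sets both `max_entropy` and `min_confidence`, which is the kind of uncertain draft token these two features are meant to surface.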
+## Framework-Agnostic Loading
+If you do not want a scikit-learn dependency, load the raw weights and run the forward pass in NumPy:
+```python
+import numpy as np
+weights = np.load("speckv_mlp16_weights.npz")
+W1, b1 = weights["W1"], weights["b1"]  # (6, 16), (16,)
+W2, b2 = weights["W2"], weights["b2"]  # (16, 1), (1,)
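+def predict(x):
+    # x: the 6 features in Quick Start order, shape (6,) or (1, 6)
+    h = np.maximum(0, x @ W1 + b1)  # hidden layer with ReLU
+    # .item() extracts the single output; float() on a non-0-d array is deprecated in recent NumPy
+    return (h @ W2 + b2).item()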
+```
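The gamma search from Quick Start can also run on the raw weights, scoring all candidate gammas in one batched forward pass. This is a sketch under assumptions: `select_gamma` is a hypothetical helper, the feature order is taken from Quick Start, and the weights in the usage lines are random placeholders with the documented shapes rather than the contents of `speckv_mlp16_weights.npz`. It also assumes the exported weights expect unscaled inputs, which this card does not state.

```python
import numpy as np

def select_gamma(step_features, W1, b1, W2, b2, gammas=(2, 4, 6, 8)):
    """Return (best_gamma, expected_tokens) for one speculation step.

    step_features: the five non-gamma inputs in Quick Start order
    (draft_entropy, draft_confidence, max_entropy, min_confidence, comp_enc).
    """
    gammas = np.asarray(gammas, dtype=float)
    base = np.tile(np.asarray(step_features, dtype=float), (len(gammas), 1))
    X = np.column_stack([base, gammas])             # (num_gammas, 6)
    h = np.maximum(0, X @ W1 + b1)                  # ReLU hidden layer
    pred_ar = np.clip((h @ W2 + b2).ravel(), 0, 1)  # clipped as in Quick Start
    expected = pred_ar * gammas + 1                 # same score as Quick Start
    best = int(np.argmax(expected))                 # ties go to the earliest listed gamma
    return int(gammas[best]), float(expected[best])

# Placeholder weights with the documented shapes; swap in the arrays from the .npz file.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
best_gamma, best_expected = select_gamma([1.5, 0.72, 2.3, 0.45, 0], W1, b1, W2, b2)
```

Batching the candidates replaces the Python loop with a single matrix product, which may help keep the per-decision overhead small.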
+## Model Details
+| Property | Value |
+|:---|:---|
+| Architecture | MLP, 1 hidden layer, 16 units, ReLU |
+| Input | 6 features: mean entropy, mean top-1 confidence, max entropy, min confidence, compression encoding (0=fp16, 1=int8, 2=nf4), gamma |
+| Output | Acceptance rate prediction (0-1) |
+| Training data | 5,112 step-level records |
+| Test MSE | 0.090 |
+| Test correlation | 0.685 |
+| Decision overhead | 0.34 ms (4 predictions per decision) |
+| Improvement over fixed gamma=4 | 56.0% more tokens per speculation step |
+| Statistical significance | p < 0.001 |
+## Files
+- `speckv_mlp16.pkl` - Full scikit-learn model (pickle)
+- `speckv_mlp16_weights.npz` - Raw numpy weights (W1, b1, W2, b2)
+- `config.json` - Model configuration and metadata
+- `requirements.txt` - Python dependencies
+## Citation
+```bibtex
+@article{shukla2026speckv,
+  title={SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection},
+  author={Shukla, Shikhar},
+  journal={arXiv preprint arXiv:2605.02888},
+  year={2026}
+}
+```
+## Links
+- [Paper (arXiv)](https://arxiv.org/abs/2605.02888)
+- [Code and Data (GitHub)](https://github.com/Amorfati123/SpecKV)