Add pipeline tag and paper link
Hi, I'm Niels from the Hugging Face community team. This PR improves the model card by adding `pipeline_tag: text-generation` to the metadata for better discoverability. It also adds a direct link to the research paper "[SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting](https://huggingface.co/papers/2605.07243)" and ensures the official GitHub repository is properly referenced.
--- a/README.md
+++ b/README.md
@@ -1,30 +1,38 @@
 ---
-
+base_model: meta-llama/Llama-3.1-8B-Instruct
 language:
 - en
-
+license: apache-2.0
 tags:
 - speculative-decoding
 - specblock
 - draft-model
+pipeline_tag: text-generation
 ---

 # SpecBlock-Llama-3.1-8B-Instruct

 SpecBlock draft model for speculative decoding, trained against the target model [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).

+This model was introduced in the paper [SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting](https://huggingface.co/papers/2605.07243).
+
 ## Method

-SpecBlock — multi-block test-time training with cross-slot hidden injection between decoder layers and dynamic tree drafting.
+SpecBlock — multi-block test-time training with cross-slot hidden injection between decoder layers and dynamic tree drafting. It combines path dependence with efficient drafting by producing $K$ dependent positions per forward call.

 ## Usage

-End-to-end training and inference code
+End-to-end training and inference code can be found in the [official GitHub repository](https://github.com/shiweijiezero/SpecBlock).

 Quick eval with the HF backend:

 ```bash
-python benchmarks_hf/run_eval.py
+python benchmarks_hf/run_eval.py \
+--algorithm specblock \
+--model-path meta-llama/Llama-3.1-8B-Instruct \
+--draft-model-path <local-clone-of-this-repo> \
+--benchmark-list mtbench:80 humaneval:164 gsm8k:200 \
+--output ./hf_results/specblock_llama.jsonl
 ```

 ## Citation
@@ -39,4 +47,4 @@ python benchmarks_hf/run_eval.py --algorithm specblock --model-path meta
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2605.07243}
 }
-```
+```
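For anyone who wants to sanity-check the resulting metadata locally, here is a minimal sketch that reads the YAML front matter of the updated model card and confirms the fields this PR touches. The helper `parse_front_matter` is hypothetical and written only for this illustration: it handles flat `key: value` pairs and simple `- item` lists, which is all this README uses. A real YAML parser would be the more robust choice for anything beyond that.

```python
def parse_front_matter(text):
    """Parse a flat YAML front matter block (scalars and simple lists only).

    Hypothetical helper for illustration; not part of any Hugging Face library.
    Returns an empty dict if the text has no complete front matter block.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter reached: the block is complete.
            return meta
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
            continue
        if ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A key with no inline value is assumed to start a list.
            meta[current_key] = value if value else []
    return {}  # no closing '---' found


# Front matter as it appears on the new side of the diff.
README_HEAD = """---
base_model: meta-llama/Llama-3.1-8B-Instruct
language:
- en
license: apache-2.0
tags:
- speculative-decoding
- specblock
- draft-model
pipeline_tag: text-generation
---

# SpecBlock-Llama-3.1-8B-Instruct
"""

meta = parse_front_matter(README_HEAD)
print(meta["pipeline_tag"])  # text-generation
print(meta["tags"])
```

The same function can be pointed at the contents of a local `README.md` to confirm the card still parses after further edits.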