---
library_name: speculators
base_model:
- poolside/Laguna-XS.2
license: apache-2.0
tags:
- speculative-decoding
- dflash
- speculators
---
# poolside/Laguna-XS.2-speculator.dflash
This is a DFlash speculator model for [poolside/Laguna-XS.2](https://huggingface.co/poolside/Laguna-XS.2).
## Training Details
This model was trained using the [Speculators](https://github.com/vllm-project/speculators) library on a combination of [Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) and the `train_sft` split of [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k). Responses were regenerated by Laguna-XS.2 (with reasoning).
## Model Specifications
| | |
|---|---|
| **Base Model** | poolside/Laguna-XS.2 |
| **Chat Template** | poolside/Laguna-XS.2 (use `/chat/completions` endpoint) |
| **Format** | Safetensors |
| **License** | Apache 2.0 |
| **Validation Hardware** | Nvidia A100 |
## Deployment
```bash
# Install vLLM from the required PR
pip install git+https://github.com/vllm-project/vllm.git@refs/pull/41880/head
# Deploy with speculative decoding
VLLM_USE_DEEP_GEMM=0 vllm serve poolside/Laguna-XS.2 \
--tensor-parallel-size 1 \
--max-model-len 16384 \
--tool-call-parser poolside_v1 \
--reasoning-parser poolside_v1 \
--enable-auto-tool-choice \
--default-chat-template-kwargs '{"enable_thinking": true}' \
--speculative-config '{
"model": "poolside/Laguna-XS.2-speculator.dflash",
"num_speculative_tokens": 7,
"method": "dflash"
}'
```
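Once the server is running, requests should go through the OpenAI-compatible `/chat/completions` endpoint so that the base model's chat template is applied. A minimal example request body (the model field matches the served base model; the prompt is illustrative):

```json
{
  "model": "poolside/Laguna-XS.2",
  "messages": [
    {"role": "user", "content": "Write a function that reverses a string."}
  ],
  "max_tokens": 256
}
```

Speculative decoding is transparent to the client: responses are identical in format to non-speculative serving, only faster when draft tokens are accepted.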
## Preliminary Evaluations
Per-position token acceptance rates across evaluation datasets, measured with reasoning enabled:
| Dataset | Pos 1 | Pos 2 | Pos 3 | Pos 4 | Pos 5 | Pos 6 | Pos 7 | Avg Length |
|---------|-------|-------|-------|-------|-------|-------|-------|------------|
| HumanEval | 74.0% | 48.6% | 29.9% | 17.7% | 9.9% | 5.1% | 2.4% | 2.876 |
| math_reasoning | 76.9% | 53.2% | 34.6% | 21.2% | 12.1% | 6.0% | 2.6% | 3.066 |
| qa | 68.5% | 41.8% | 24.8% | 14.7% | 8.4% | 4.6% | 2.2% | 2.650 |
| question | 70.6% | 44.1% | 26.2% | 15.0% | 8.4% | 4.5% | 2.3% | 2.711 |
| rag | 71.7% | 45.7% | 27.6% | 16.0% | 8.9% | 4.8% | 2.3% | 2.770 |
| summarization | 68.8% | 40.8% | 22.7% | 12.3% | 6.5% | 3.3% | 1.5% | 2.559 |
| translation | 70.8% | 44.3% | 25.0% | 13.0% | 6.5% | 3.1% | 1.2% | 2.639 |
| writing | 70.9% | 44.6% | 26.8% | 15.8% | 9.4% | 5.4% | 2.3% | 2.752 |
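The Avg Length column appears to be the expected number of tokens produced per verification step: the target model always emits one token, and each draft position contributes its acceptance probability on top of that. A quick sketch of that relationship, using the HumanEval row from the table above:

```python
# Per-position acceptance rates for HumanEval (from the table above)
rates = [0.740, 0.486, 0.299, 0.177, 0.099, 0.051, 0.024]

# Expected tokens per step: 1 guaranteed target-model token
# plus the expected number of accepted draft tokens.
avg_length = 1 + sum(rates)
print(round(avg_length, 3))  # 2.876
```

The same arithmetic reproduces the Avg Length values for the other rows, so an average length near 2.9 means each forward pass of the target model yields roughly three tokens on code-generation workloads.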
## References
**Paper**: [DFlash: Block Diffusion for Flash Speculative Decoding](https://arxiv.org/abs/2602.06036)