---
library_name: speculators
base_model:
- poolside/Laguna-XS.2
license: apache-2.0
tags:
- speculative-decoding
- dflash
- speculators
---

# poolside/Laguna-XS.2-speculator.dflash

This is a DFlash speculator model for [poolside/Laguna-XS.2](https://huggingface.co/poolside/Laguna-XS.2).

## Training Details

This model was trained using the [Speculators](https://github.com/vllm-project/speculators) library on a combination of [Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) and the `train_sft` split of [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k). Responses were regenerated by Laguna-XS.2 (with reasoning).

## Model Specifications

| | |
|---|---|
| **Base Model** | poolside/Laguna-XS.2 |
| **Chat Template** | poolside/Laguna-XS.2 (use `/chat/completions` endpoint) |
| **Format** | Safetensors |
| **License** | Apache 2.0 |
| **Validation Hardware** | Nvidia A100 |

## Deployment

```bash
# Install vLLM from the required PR
pip install git+https://github.com/vllm-project/vllm.git@refs/pull/41880/head

# Deploy with speculative decoding
VLLM_USE_DEEP_GEMM=0 vllm serve poolside/Laguna-XS.2 \
    --tensor-parallel-size 1 \
    --max-model-len 16384 \
    --tool-call-parser poolside_v1 \
    --reasoning-parser poolside_v1 \
    --enable-auto-tool-choice \
    --default-chat-template-kwargs '{"enable_thinking": true}' \
    --speculative-config '{
        "model": "poolside/Laguna-XS.2-speculator.dflash",
        "num_speculative_tokens": 7,
        "method": "dflash"
    }'
```

## Preliminary Evaluations

Per-position token acceptance rates across datasets (with reasoning enabled):

| Dataset | Pos 1 | Pos 2 | Pos 3 | Pos 4 | Pos 5 | Pos 6 | Pos 7 | Avg Length |
|---------|-------|-------|-------|-------|-------|-------|-------|------------|
| HumanEval | 74.0% | 48.6% | 29.9% | 17.7% | 9.9% | 5.1% | 2.4% | 2.876 |
| math_reasoning | 76.9% | 53.2% | 34.6% | 21.2% | 12.1% | 6.0% | 2.6% | 3.066 |
| qa | 68.5% | 41.8% | 24.8% | 14.7% | 8.4% | 4.6% | 2.2% | 2.650 |
| question | 70.6% | 44.1% | 26.2% | 15.0% | 8.4% | 4.5% | 2.3% | 2.711 |
| rag | 71.7% | 45.7% | 27.6% | 16.0% | 8.9% | 4.8% | 2.3% | 2.770 |
| summarization | 68.8% | 40.8% | 22.7% | 12.3% | 6.5% | 3.3% | 1.5% | 2.559 |
| translation | 70.8% | 44.3% | 25.0% | 13.0% | 6.5% | 3.1% | 1.2% | 2.639 |
| writing | 70.9% | 44.6% | 26.8% | 15.8% | 9.4% | 5.4% | 2.3% | 2.752 |

## References

**Paper**: [DFlash: Block Diffusion for Flash Speculative Decoding](https://arxiv.org/abs/2602.06036)
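The "Avg Length" column in the evaluation table appears consistent with one guaranteed token per verification step (the verifier's own output) plus the sum of the per-position marginal acceptance rates. A minimal sketch checking this against the HumanEval row; the helper name is illustrative, not part of the Speculators API:

```python
def avg_accepted_length(acceptance_rates):
    """Expected tokens emitted per verification step: the verifier
    always contributes one token, plus each draft position weighted
    by its marginal acceptance rate."""
    return 1.0 + sum(acceptance_rates)

# HumanEval row from the table above (positions 1-7, as fractions)
humaneval = [0.740, 0.486, 0.299, 0.177, 0.099, 0.051, 0.024]
print(round(avg_accepted_length(humaneval), 3))  # 2.876, matching Avg Length
```

The same relation reproduces the other rows (e.g. math_reasoning: 1 + 2.066 = 3.066), so Avg Length can be read as the expected number of tokens accepted per forward pass of the base model.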