poolside/Laguna-XS.2-speculator.dflash

This is a DFlash speculator model for poolside/Laguna-XS.2.

Training Details

This model was trained using the Speculators library on a combination of Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered and the train_sft split of HuggingFaceH4/ultrachat_200k. Responses were regenerated by Laguna-XS.2 (with reasoning).

Model Specifications

Base Model poolside/Laguna-XS.2
Chat Template poolside/Laguna-XS.2 (use /chat/completions endpoint)
Format Safetensors
License Apache 2.0
Validation Hardware Nvidia A100

Deployment

# Install vLLM from the required PR
pip install git+https://github.com/vllm-project/vllm.git@refs/pull/41880/head

# Deploy with speculative decoding
VLLM_USE_DEEP_GEMM=0 vllm serve poolside/Laguna-XS.2 \
    --tensor-parallel-size 1 \
    --max-model-len 16384 \
    --tool-call-parser poolside_v1 \
    --reasoning-parser poolside_v1 \
    --enable-auto-tool-choice \
    --default-chat-template-kwargs '{"enable_thinking": true}' \
    --speculative-config '{
        "model": "poolside/Laguna-XS.2-speculator.dflash",
        "num_speculative_tokens": 7,
        "method": "dflash"
    }'

Preliminary Evaluations

Per-position token acceptance rates across datasets: (with reasoning enabled)

Dataset Pos 1 Pos 2 Pos 3 Pos 4 Pos 5 Pos 6 Pos 7 Avg Length
HumanEval 74.0% 48.6% 29.9% 17.7% 9.9% 5.1% 2.4% 2.876
math_reasoning 76.9% 53.2% 34.6% 21.2% 12.1% 6.0% 2.6% 3.066
qa 68.5% 41.8% 24.8% 14.7% 8.4% 4.6% 2.2% 2.650
question 70.6% 44.1% 26.2% 15.0% 8.4% 4.5% 2.3% 2.711
rag 71.7% 45.7% 27.6% 16.0% 8.9% 4.8% 2.3% 2.770
summarization 68.8% 40.8% 22.7% 12.3% 6.5% 3.3% 1.5% 2.559
translation 70.8% 44.3% 25.0% 13.0% 6.5% 3.1% 1.2% 2.639
writing 70.9% 44.6% 26.8% 15.8% 9.4% 5.4% 2.3% 2.752

References

Paper: DFlash: Block Diffusion for Flash Speculative Decoding

Downloads last month
-
Safetensors
Model size
0.6B params
Tensor type
I64
·
BF16
·
BOOL
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for poolside/Laguna-XS.2-speculator.dflash

Finetuned
(4)
this model

Collection including poolside/Laguna-XS.2-speculator.dflash

Paper for poolside/Laguna-XS.2-speculator.dflash