Any plans on releasing FlashHead for Qwen3.5 models?
```
pip install flash-head
vllm serve embedl/Qwen3-1.7B-FlashHead-W4A16
```

FlashHead is picked up automatically through the `vllm.general_plugins` entry point. No source patches, no custom imports.

```
vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1

# Baseline comparison
FLASHHEAD_ENABLED=0 vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1
```