Edge Inference Benchmarks
On-device inference benchmarks across devices and models.
```shell
pip install flash-head
vllm serve embedl/Qwen3-1.7B-FlashHead-W4A16
```

FlashHead registers itself through the `vllm.general_plugins` entry point. No source patches, no custom imports.

```shell
vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1

# Baseline comparison
FLASHHEAD_ENABLED=0 vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1
```
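For context on how entry-point registration works: a vLLM plugin package advertises itself under the `vllm.general_plugins` group in its packaging metadata, and vLLM loads it automatically at startup. Below is a minimal sketch of such a declaration; the package name `flash_head` and the `register` function are assumptions for illustration, not the actual contents of the `flash-head` package.

```toml
# pyproject.toml of the plugin package (hypothetical module and function names)
[project.entry-points."vllm.general_plugins"]
flash_head = "flash_head:register"
```

With this metadata in place, `pip install` is all that is needed for vLLM to discover and activate the plugin, which is why no source patches or custom imports are required.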