# Qwen3.5-2B-FlashHead
Optimized version of Qwen/Qwen3.5-2B using FlashHead, Embedl's efficient replacement for the dense language model head. FlashHead significantly improves inference throughput while preserving accuracy; weights are kept in FP16 precision.
The model preserves the base model's Text + Image / Video -> Text behavior and reasoning capabilities.
FlashHead ships as a vLLM plugin, installable via `pip install flash-head`.
## Model Details
| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-2B-FlashHead |
| Base Model | Qwen/Qwen3.5-2B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |
## Optimizations
- FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.
## Benchmarks
## Installation
```shell
pip install flash-head
```
The flash-head vLLM plugin is required; it activates automatically at startup.
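As a minimal sketch of serving this model with vLLM's OpenAI-compatible server (assumes a CUDA-capable host with vLLM installed, access to the gated repository, and the server's default port 8000):

```shell
# Install the FlashHead plugin (loaded automatically by vLLM at startup)
pip install flash-head

# Start an OpenAI-compatible server for the optimized model
vllm serve embedl/Qwen3.5-2B-FlashHead

# In another shell: send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "embedl/Qwen3.5-2B-FlashHead",
        "messages": [{"role": "user", "content": "Summarize this video frame."}]
      }'
```

Because the plugin registers itself at startup, no extra flags are needed beyond the standard `vllm serve` invocation.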
## License
This model is a derivative of Qwen/Qwen3.5-2B.
- Upstream: Apache License 2.0
- Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)
## Contact
- Enterprise and Commercial Inquiries: models@embedl.com
- Technical Issues and Early Access: https://github.com/embedl/flash-head
- More Information and Model Releases: https://embedl.com
## Partner & Developer Opportunities
If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:
- Engineering support for on-prem and edge deployments
- Early access and partner co-marketing opportunities
Contact: models@embedl.com