# Qwen3.5-2B-FlashHead
Optimized version of Qwen/Qwen3.5-2B using FlashHead, Embedl's efficient replacement for the dense language model head. FlashHead significantly improves inference throughput while preserving accuracy; weights are kept in FP16 precision.
The model preserves the base model's Text + Image / Video -> Text behavior and reasoning capabilities.
FlashHead ships as a vLLM plugin, installable via `pip install flash-head`.
## Model Details
| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-2B-FlashHead |
| Base Model | Qwen/Qwen3.5-2B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |
## Optimizations
- FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.
## Benchmarks
## Installation
```shell
pip install flash-head
```
The flash-head vLLM plugin is required; it activates automatically at startup.
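As a minimal sketch of serving this model with vLLM's OpenAI-compatible server (assumes a CUDA-capable host with vLLM installed, access to the gated repository, and the server's default port 8000):

```shell
# Install the FlashHead plugin (loaded automatically by vLLM at startup)
pip install flash-head

# Start an OpenAI-compatible server for the optimized model
vllm serve embedl/Qwen3.5-2B-FlashHead

# In another shell: send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "embedl/Qwen3.5-2B-FlashHead",
        "messages": [{"role": "user", "content": "Summarize this video frame."}]
      }'
```

Because the plugin registers itself at startup, no extra flags are needed beyond the standard `vllm serve` invocation.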
## License
This model is a derivative of Qwen/Qwen3.5-2B.
- Upstream: Apache License 2.0
- Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)
## Contact
- Enterprise and Commercial Inquiries: models@embedl.com
- Technical Issues and Early Access: https://github.com/embedl/flash-head
- More Information and Model Releases: https://embedl.com
## Partner & Developer Opportunities
If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:
- Engineering support for on-prem and edge deployments
- Early access and partner co-marketing opportunities
Contact: models@embedl.com