---
base_model:
  - Qwen/Qwen3.5-4B
tags:
  - embedl
  - qwen3.5
  - multimodal
  - vlm
  - flashhead
  - llmcompressor
  - qwen3_5
pipeline_tag: image-text-to-text
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
extra_gated_prompt: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Embedl Privacy
  Policy](https://www.embedl.com/privacy-policy).
extra_gated_fields:
  Company: text
---
Optimized by Embedl

Need to fine-tune, hit performance targets, or deploy on specific hardware? We've got you covered: get in touch at models@embedl.com.

Qwen3.5-4B-FlashHead


Optimized version of Qwen/Qwen3.5-4B using FlashHead, Embedl's lightweight replacement for the dense language model head. FlashHead significantly improves throughput while preserving accuracy; weights are kept in FP16 precision.

The model preserves Text + Image / Video -> Text behavior and reasoning capabilities while improving inference throughput.

FlashHead is available as a vLLM plugin via `pip install flash-head`.


Model Details

| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-4B-FlashHead |
| Base Model | Qwen/Qwen3.5-4B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |

Optimizations

  • FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.

Benchmarks

Edge Inference Benchmarks for Qwen3.5

Installation

```shell
pip install flash-head
```

The flash-head vLLM plugin is required; vLLM discovers and activates it automatically at startup.
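Once the plugin is installed, the model can be run through vLLM's standard offline-inference API. A minimal sketch, assuming a CUDA GPU with enough memory for the 4B weights (the prompt and sampling settings are illustrative, not part of this card):

```python
from vllm import LLM, SamplingParams

# Load the optimized model. The flash-head plugin installed above is
# discovered and activated by vLLM automatically at startup; no extra
# configuration is needed to use the FlashHead LM head.
llm = LLM(model="embedl/Qwen3.5-4B-FlashHead")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Summarize what a language model head does in one sentence."],
    params,
)
print(outputs[0].outputs[0].text)
```

For serving over HTTP, `vllm serve embedl/Qwen3.5-4B-FlashHead` should expose the same model behind vLLM's OpenAI-compatible endpoint.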

License

This model is a derivative of Qwen/Qwen3.5-4B.

  • Upstream: Apache License 2.0
  • Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)

Contact

  • Enterprise and Commercial Inquiries: models@embedl.com
  • Technical Issues and Early Access: https://github.com/embedl/flash-head
  • More Information and Model Releases: https://embedl.com

Partner & Developer Opportunities

If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:

  • Engineering support for on-prem and edge deployments
  • Early access and partner co-marketing opportunities

Contact: models@embedl.com

Community & Support
Need help with this model? Chat with the Embedl team and other engineers on Discord. Quantization gotchas, hardware questions, fine-tuning tips: bring them all.
Join our Discord →