---
base_model:
  - Qwen/Qwen3.5-4B
tags:
  - embedl
  - qwen3.5
  - multimodal
  - vlm
  - flashhead
  - llmcompressor
  - qwen3_5
pipeline_tag: image-text-to-text
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
extra_gated_prompt: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Embedl Privacy
  Policy](https://www.embedl.com/privacy-policy).
extra_gated_fields:
  Company: text
---
Optimized by Embedl

Need to fine-tune, hit performance targets, or deploy on specific hardware? We've got you covered: get in touch at models@embedl.com.

Qwen3.5-4B-FlashHead


Optimized version of Qwen/Qwen3.5-4B using FlashHead, Embedl's lightweight replacement for the dense language model head. FlashHead significantly improves throughput while preserving accuracy; weights are kept in FP16 precision.

The model preserves Text + Image / Video -> Text behavior and reasoning capabilities while improving inference throughput.

FlashHead is available as a vLLM plugin via `pip install flash-head`.


Model Details

| Field | Value |
|---|---|
| Model | embedl/Qwen3.5-4B-FlashHead |
| Base Model | Qwen/Qwen3.5-4B |
| Input / Output | Text + Image / Video -> Text |
| Version | 1.0 |
| Optimizations | FlashHead LM Head |
| Developers | Embedl |
| Licenses | Upstream: Apache License 2.0. Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
| Intended Use | Text generation, reasoning, assistant-style interaction, video analytics, and general-purpose multimodal NLP on NVIDIA GPUs |

Optimizations

  • FlashHead LM Head: Lightweight replacement for the dense LM head, significantly improving throughput.

Benchmarks

Edge Inference Benchmarks for Qwen3.5

Installation

```shell
pip install flash-head
```

The flash-head vLLM plugin is required; vLLM discovers and activates it automatically at startup.
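Once the plugin is installed, the model can be run through vLLM's standard offline-inference API. A minimal sketch, assuming a CUDA GPU with enough memory for the 4B weights (the prompt and sampling settings are illustrative, not part of this card):

```python
from vllm import LLM, SamplingParams

# Load the optimized model. The flash-head plugin installed above is
# discovered and activated by vLLM automatically at startup; no extra
# configuration is needed to use the FlashHead LM head.
llm = LLM(model="embedl/Qwen3.5-4B-FlashHead")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Summarize what a language model head does in one sentence."],
    params,
)
print(outputs[0].outputs[0].text)
```

For serving over HTTP, `vllm serve embedl/Qwen3.5-4B-FlashHead` should expose the same model behind vLLM's OpenAI-compatible endpoint.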

License

This model is a derivative of Qwen/Qwen3.5-4B.

  • Upstream: Apache License 2.0
  • Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)

Contact

  • Enterprise and Commercial Inquiries: models@embedl.com
  • Technical Issues and Early Access: https://github.com/embedl/flash-head
  • More Information and Model Releases: https://embedl.com

Partner & Developer Opportunities

If you are evaluating on-device inference, building products on this model, or exploring custom model optimization, reach out for:

  • Engineering support for on-prem and edge deployments
  • Early access and partner co-marketing opportunities

Contact: models@embedl.com

Community & Support
Need help with this model? Chat with the Embedl team and other engineers on Discord. Quantization gotchas, hardware questions, fine-tuning tips: bring them all.
Join our Discord →