HydraQwen2.5-Omni-3B

Paper: https://arxiv.org/abs/2603.28554

One model, many heads.

Omni-modal extension of Hydra, applying the dual-head LoRA-toggle architecture to Qwen2.5-Omni-3B. No Hydra-specific training -- the adapter from vidore/colqwen-omni-v0.1 is used as-is.

Three inference modes from a single 4.4B-parameter model:

  1. Retrieval (LoRA on, bidirectional): ColBERT multi-vector embeddings over images, audio, or video
  2. Text generation (LoRA off, causal): Autoregressive text conditioned on any input modality
  3. Speech generation (LoRA off, causal, talker enabled): Spoken answers via thinker-talker-vocoder pipeline
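The retrieval/generation switch above comes down to enabling or disabling a low-rank adapter on the shared backbone. A minimal, self-contained sketch of that toggle with a hand-rolled LoRA linear layer (toy dimensions; names like `LoRALinear` and `adapter_enabled` are illustrative, not the actual Hydra implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a toggleable low-rank adapter (LoRA)."""
    def __init__(self, d_in, d_out, rank=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.lora_A = nn.Linear(d_in, rank, bias=False)
        self.lora_B = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op
        self.scale = alpha / rank
        self.adapter_enabled = True

    def forward(self, x):
        y = self.base(x)
        if self.adapter_enabled:
            # Low-rank update: y += scale * B(A(x))
            y = y + self.scale * self.lora_B(self.lora_A(x))
        return y

torch.manual_seed(0)
layer = LoRALinear(16, 16)
# Pretend the adapter was trained: give B nonzero weights.
nn.init.normal_(layer.lora_B.weight, std=0.1)

x = torch.randn(2, 16)
layer.adapter_enabled = True   # retrieval mode: LoRA on
y_retrieval = layer(x)
layer.adapter_enabled = False  # generation mode: LoRA off
y_generation = layer(x)

# With the adapter off, the layer reduces exactly to the frozen base path.
assert torch.equal(y_generation, layer.base(x))
```

This is why the adapter-off path can recover base-model generation exactly: disabling the adapter leaves only the frozen base weights, with no numerical drift.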

Files

  • lm_head.pt -- Preserved lm_head weights from Qwen2.5-Omni-3B thinker
  • results/ -- Raw evaluation JSONs
  • scripts/ -- Training + eval scripts

Results

Proof-of-concept results -- zero-shot, single run, no Hydra-specific training.

Retrieval

| Benchmark | Metric | Score | # tasks |
|---|---|---|---|
| ViDoRe V1 | avg nDCG@5 | 0.8865 | 10 |
| ViDoRe V2 | avg nDCG@5 | 0.5353 | 4 |
| ViDoRe V3 | avg nDCG@5 | 0.4907 | 8 |
| AudioCaps | R@1 (zero-shot) | 26.2% | — |

V1 is the full 10-task set (InfoVQA retrieval added 2026-04-18 -- see results/VidoreInfoVQARetrieval_predictions.json), directly comparable to the HydraQwen3.5-4B V1 column.
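The retrieval scores come from ColBERT-style multi-vector embeddings scored by late interaction (MaxSim): each query token is matched against its best document token, and the per-token maxima are summed. A minimal sketch of that scoring with made-up toy embeddings (not the model's actual vectors):

```python
import torch
import torch.nn.functional as F

def maxsim_score(query_emb, doc_emb):
    """Late-interaction (MaxSim) score.

    query_emb: (num_q_tokens, dim), doc_emb: (num_d_tokens, dim),
    both L2-normalized so the dot product is cosine similarity.
    """
    sim = query_emb @ doc_emb.T         # (num_q, num_d) similarity matrix
    return sim.max(dim=1).values.sum()  # max over doc tokens, sum over query

torch.manual_seed(0)
q = F.normalize(torch.randn(8, 128), dim=-1)
d_random = F.normalize(torch.randn(32, 128), dim=-1)
d_near = F.normalize(q + 0.1 * torch.randn(8, 128), dim=-1)  # near-duplicate

# A document whose tokens nearly match the query scores higher.
assert maxsim_score(q, d_near) > maxsim_score(q, d_random)
```

Scoring a query against itself yields one perfect match per query token, so the score equals the number of query tokens, which is a handy sanity check.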

Generation equivalence (InfoVQA)

Base Qwen2.5-Omni-3B thinker vs. HydraQwen2.5-Omni-3B with LoRA disabled, using the same ViDoRe-matched protocol as the 4B model: greedy decoding (T=0), 128 new tokens, full InfoVQA validation set (n=2,801), with a short-answer prompt suffix applied identically to both paths (needed because Qwen2.5-Omni's default outputs are sentence-form).

| n | Base ANLS | Hydra ANLS | Δ (95% CI) | Exact match |
|---|---|---|---|---|
| 2,801 | 0.7257 | 0.7257 | +0.0000 [+0.0000, +0.0000] | 2,801 / 2,801 (100.00%) |

Byte-identical outputs on every sample. The adapter-off path recovers base-model generation exactly at the output-token level. Report: results/infovqa_report.json.
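ANLS (average normalized Levenshtein similarity) is the standard InfoVQA metric: per-sample similarity to the best-matching gold answer, clipped to 0 below a 0.5 threshold. A self-contained sketch of the metric (helper names are ours; the actual eval harness may normalize answers differently):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(pred: str, golds: list[str], tau: float = 0.5) -> float:
    """Normalized Levenshtein similarity vs. the best-matching gold answer;
    scores below the threshold tau are clipped to 0, per ANLS convention."""
    best = 0.0
    for gold in golds:
        p, g = pred.strip().lower(), gold.strip().lower()
        if not p and not g:
            s = 1.0
        else:
            s = 1.0 - levenshtein(p, g) / max(len(p), len(g))
        best = max(best, s)
    return best if best >= tau else 0.0

assert anls("42%", ["42%"]) == 1.0       # exact match
assert anls("paris", ["Paris"]) == 1.0   # case-insensitive
assert anls("banana", ["apple"]) == 0.0  # below threshold, clipped to 0
```

Byte-identical outputs trivially yield identical per-sample ANLS, which is why the Δ and its confidence interval above are exactly zero.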


Citation

@article{georgiou2026hydra,
  title={Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model},
  author={Georgiou, Athos},
  year={2026}
}