DualHead-GritLM-Qwen3.5-4B

A GritLM-style joint-training ablation (LoRA on Qwen/Qwen3.5-4B) for the Hydra paper, trained with alternating retrieval (80%) and generation (20%) batches.
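The alternating schedule can be sketched as a per-batch Bernoulli draw. This is a minimal sketch, assuming independent per-batch sampling; the actual sampler in the training code may interleave deterministically instead:

```python
import random

def sample_batch_mode(rng: random.Random, p_retrieval: float = 0.8) -> str:
    # 80% of batches train the retrieval objective, 20% the generation objective.
    return "retrieval" if rng.random() < p_retrieval else "generation"

rng = random.Random(0)
modes = [sample_batch_mode(rng) for _ in range(10_000)]
retrieval_frac = modes.count("retrieval") / len(modes)  # close to 0.8
```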

Key Finding

Joint training adds complexity with no benefit. The joint-training target (LoRA on, causal attention) fails catastrophically: generation collapses to a single token ("The" with p=0.91) and is image-blind. The two functional modes (LoRA-on retrieval, LoRA-off generation) perform equivalently to Hydra's retrieval-only training.

| Mode | Result |
|------|--------|
| LoRA on, bidirectional (retrieval) | 0.8893 nDCG@5 |
| LoRA off, causal (generation) | 0.561 ANLS, 76.5% match |
| LoRA on, causal (joint-training goal) | image-blind |
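The three (LoRA, attention) combinations evaluated above can be summarized as a small config map. The names below are hypothetical for illustration; the repo itself ships only the adapter and lm_head:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeConfig:
    lora_enabled: bool        # whether the LoRA adapter is active
    causal_attention: bool    # causal mask (generation) vs bidirectional (retrieval)

# Hypothetical naming of the three evaluated configurations.
MODES = {
    "retrieval": ModeConfig(lora_enabled=True, causal_attention=False),   # 0.8893 nDCG@5
    "generation": ModeConfig(lora_enabled=False, causal_attention=True),  # 0.561 ANLS
    "joint": ModeConfig(lora_enabled=True, causal_attention=True),        # image-blind
}
```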

Files

  • adapter_config.json + adapter_model.safetensors -- LoRA adapter
  • lm_head.pt -- Base-model lm_head weights
  • results/ -- Raw evaluation JSONs
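A minimal loading sketch for the files above, assuming `peft` and `transformers`, the `Qwen/Qwen3.5-4B` base implied by the model name, and that `lm_head.pt` stores a state dict. `load_dual_head` is a hypothetical helper, not part of the repo:

```python
def load_dual_head(repo_dir: str):
    """Attach the LoRA adapter and restore the saved base-model lm_head."""
    # Lazy imports so the sketch is importable without torch/peft installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")  # assumed base id
    # Reads adapter_config.json + adapter_model.safetensors from repo_dir.
    model = PeftModel.from_pretrained(base, repo_dir)
    # Assumption: lm_head.pt is a state_dict for the lm_head module.
    lm_head_state = torch.load(f"{repo_dir}/lm_head.pt", map_location="cpu")
    model.get_base_model().lm_head.load_state_dict(lm_head_state)
    return model
```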

Citation

@article{georgiou2026hydra,
  title={Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model},
  author={Georgiou, Athos},
  year={2026}
}