DualHead-GritLM-Qwen3.5-4B

A GritLM-style joint-training ablation (LoRA on Qwen/Qwen3.5-4B) for the Hydra paper, trained with alternating retrieval (80%) and generation (20%) batches.
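The alternating schedule can be sketched as a per-batch Bernoulli draw. This is a minimal sketch, assuming independent per-batch sampling; the actual sampler in the training code may interleave deterministically instead:

```python
import random

def sample_batch_mode(rng: random.Random, p_retrieval: float = 0.8) -> str:
    # 80% of batches train the retrieval objective, 20% the generation objective.
    return "retrieval" if rng.random() < p_retrieval else "generation"

rng = random.Random(0)
modes = [sample_batch_mode(rng) for _ in range(10_000)]
retrieval_frac = modes.count("retrieval") / len(modes)  # close to 0.8
```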

Key Finding

Joint training adds complexity with no benefit. The joint-training target (LoRA on, causal attention) fails catastrophically: generation collapses to a single token ("The" with p=0.91) and is image-blind. The two functional modes (LoRA-on retrieval, LoRA-off generation) perform equivalently to Hydra's retrieval-only training.

| Mode | Result |
|------|--------|
| LoRA on, bidirectional (retrieval) | 0.8893 nDCG@5 |
| LoRA off, causal (generation) | 0.561 ANLS, 76.5% match |
| LoRA on, causal (joint-training goal) | image-blind |
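The three (LoRA, attention) combinations evaluated above can be summarized as a small config map. The names below are hypothetical for illustration; the repo itself ships only the adapter and lm_head:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeConfig:
    lora_enabled: bool        # whether the LoRA adapter is active
    causal_attention: bool    # causal mask (generation) vs bidirectional (retrieval)

# Hypothetical naming of the three evaluated configurations.
MODES = {
    "retrieval": ModeConfig(lora_enabled=True, causal_attention=False),   # 0.8893 nDCG@5
    "generation": ModeConfig(lora_enabled=False, causal_attention=True),  # 0.561 ANLS
    "joint": ModeConfig(lora_enabled=True, causal_attention=True),        # image-blind
}
```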

Files

  • adapter_config.json + adapter_model.safetensors -- LoRA adapter
  • lm_head.pt -- Base-model lm_head weights
  • results/ -- Raw evaluation JSONs
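A minimal loading sketch for the files above, assuming `peft` and `transformers`, the `Qwen/Qwen3.5-4B` base implied by the model name, and that `lm_head.pt` stores a state dict. `load_dual_head` is a hypothetical helper, not part of the repo:

```python
def load_dual_head(repo_dir: str):
    """Attach the LoRA adapter and restore the saved base-model lm_head."""
    # Lazy imports so the sketch is importable without torch/peft installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")  # assumed base id
    # Reads adapter_config.json + adapter_model.safetensors from repo_dir.
    model = PeftModel.from_pretrained(base, repo_dir)
    # Assumption: lm_head.pt is a state_dict for the lm_head module.
    lm_head_state = torch.load(f"{repo_dir}/lm_head.pt", map_location="cpu")
    model.get_base_model().lm_head.load_state_dict(lm_head_state)
    return model
```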

Citation

@article{georgiou2026hydra,
  title={Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model},
  author={Georgiou, Athos},
  year={2026}
}