
Doc-to-LoRA – Qwen2.5-Omni Thinker

A Sakana AI Doc-to-LoRA hypernetwork trained on the Qwen2.5-Omni thinker (dense, 7.6B parameters).

Training Details

  • Base model: Qwen2.5-Omni thinker (extracted as standalone Qwen2ForCausalLM)
  • Method: Sakana AI Doc-to-LoRA (arXiv:2602.15902)
  • Steps: 5,000 (Phase 1 – needs 80K+ for full fact encoding)
  • LoRA target: down_proj, rank 8
  • Context encoder: Idefics2 Perceiver (9 blocks, 8 latent queries)
  • Datasets: SQuAD, DROP, ROPES (self-generated QA)
  • Final loss: 0.838 (from 1.304 at step 1)
  • Hardware: 1x A100 80GB
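As a rough illustration of what the hypernetwork emits, a rank-8 LoRA adapter on down_proj is a pair of low-rank factors whose product updates the frozen weight. This is a minimal sketch, assuming the standard LoRA formulation; the dimensions below are small stand-ins, not Qwen2.5's real sizes:

```python
import numpy as np

# Illustrative sketch only: dimensions are stand-ins, not Qwen2.5's real sizes.
d_out, d_in, rank = 12, 16, 8

W = np.zeros((d_out, d_in))         # frozen down_proj weight (placeholder values)
A = 0.01 * np.ones((rank, d_in))    # LoRA "A" factor, emitted by the hypernetwork
B = np.zeros((d_out, rank))         # LoRA "B" factor, conventionally zero-initialized

delta_W = B @ A                     # rank-<=8 update with the same shape as W
W_adapted = W + delta_W             # effective weight after internalizing a document
print(W_adapted.shape)              # (12, 16)
```

Because the update has rank at most 8, the hypernetwork only needs to generate `rank * (d_in + d_out)` values per layer rather than a full weight matrix.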

Status

Early checkpoint (5K/80K steps). The hypernetwork learns to generate LoRA structure that modifies model behavior, but does not yet encode specific facts from documents. Full training (80K Phase 1 + 20K Phase 2) is needed for factual encoding.

Usage

```python
from ctx_to_lora.modeling.hypernet import ModulatedPretrainedModel
import torch

# weights_only=False loads pickled Python objects: only use with a trusted checkpoint.
state_dict = torch.load("checkpoint-5000/pytorch_model.bin", weights_only=False)
model = ModulatedPretrainedModel.from_state_dict(state_dict, train=False, use_sequence_packing=False)
model.internalize("Your document text here...")
# Now generate – the model has internalized the document via LoRA
```

Part of HyperMod

This is one component of the HyperMod memory system for Claudia.

