
Doc-to-LoRA – Qwen2.5-Omni Thinker

A Sakana AI Doc-to-LoRA hypernetwork trained on the Qwen2.5-Omni thinker (dense, 7.6B parameters).

Training Details

  • Base model: Qwen2.5-Omni thinker (extracted as standalone Qwen2ForCausalLM)
  • Method: Sakana AI Doc-to-LoRA (arXiv:2602.15902)
  • Steps: 5,000 (Phase 1 – needs 80K+ for full fact encoding)
  • LoRA target: down_proj, rank 8
  • Context encoder: Idefics2 Perceiver (9 blocks, 8 latent queries)
  • Datasets: SQuAD, DROP, ROPES (self-generated QA)
  • Final loss: 0.838 (from 1.304 at step 1)
  • Hardware: 1x A100 80GB
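As a rough illustration of what the hypernetwork emits, a rank-8 LoRA adapter on down_proj is a pair of low-rank factors whose product updates the frozen weight. This is a minimal sketch, assuming the standard LoRA formulation; the dimensions below are small stand-ins, not Qwen2.5's real sizes:

```python
import numpy as np

# Illustrative sketch only: dimensions are stand-ins, not Qwen2.5's real sizes.
d_out, d_in, rank = 12, 16, 8

W = np.zeros((d_out, d_in))         # frozen down_proj weight (placeholder values)
A = 0.01 * np.ones((rank, d_in))    # LoRA "A" factor, emitted by the hypernetwork
B = np.zeros((d_out, rank))         # LoRA "B" factor, conventionally zero-initialized

delta_W = B @ A                     # rank-<=8 update with the same shape as W
W_adapted = W + delta_W             # effective weight after internalizing a document
print(W_adapted.shape)              # (12, 16)
```

Because the update has rank at most 8, the hypernetwork only needs to generate `rank * (d_in + d_out)` values per layer rather than a full weight matrix.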

Status

Early checkpoint (5K/80K steps). The hypernetwork learns to generate LoRA structure that modifies model behavior, but does not yet encode specific facts from documents. Full training (80K Phase 1 + 20K Phase 2) is needed for factual encoding.

Usage

```python
from ctx_to_lora.modeling.hypernet import ModulatedPretrainedModel
import torch

# weights_only=False loads pickled Python objects: only use with a trusted checkpoint.
state_dict = torch.load("checkpoint-5000/pytorch_model.bin", weights_only=False)
model = ModulatedPretrainedModel.from_state_dict(state_dict, train=False, use_sequence_packing=False)
model.internalize("Your document text here...")
# Now generate – the model has internalized the document via LoRA
```

Part of HyperMod

This is one component of the HyperMod memory system for Claudia.

