
VisMemVideo NoVis SFT Checkpoint

Model: Qwen2.5-VL-7B-Instruct + VisMemVideo (QueryBuilder + CrossAttentionFusion + LoRA Memory Former)

Architecture

  • Reasoner: Frozen Qwen2.5-VL-7B-Instruct (provides visual context H)
  • Memory Former: Separate Qwen2.5-VL-7B + LoRA adapter (trainable)
  • Video Memory Bank: QwenVideoMemoryBank (reuses reasoner ViT, 3584-dim)
  • QueryBuilder: 1-layer transformer encoder (8 heads, ff_mult=2)
  • CrossAttentionFusion: 2-layer cross-attention (d_inner=512, 8 heads)
  • Training mode: NoVis SFT (text-only input with memory augmentation; visual information reaches the reasoner only through memory tokens)
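The architecture list above can be sketched as a minimal forward pass. The module names, dimensions (3584-dim hidden states, query_len=8, mem_len=8, d_inner=512, 8 heads, ff_mult=2) come from this card; the internal wiring (learned query slots, residual cross-attention, down/up projections) is an assumption for illustration, not the project's actual implementation.

```python
import torch
import torch.nn as nn

H_DIM = 3584    # reasoner hidden size (card)
D_INNER = 512   # CrossAttentionFusion inner dim (card)
N_HEADS = 8
QUERY_LEN = 8
MEM_LEN = 8

class QueryBuilder(nn.Module):
    """1-layer transformer encoder over learned query slots (sketch)."""
    def __init__(self):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(QUERY_LEN, H_DIM) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=H_DIM, nhead=N_HEADS,
            dim_feedforward=2 * H_DIM,  # ff_mult=2 (card)
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, batch_size):
        q = self.queries.unsqueeze(0).expand(batch_size, -1, -1)
        return self.encoder(q)  # (B, QUERY_LEN, H_DIM)

class CrossAttentionFusion(nn.Module):
    """2-layer cross-attention: queries attend to memory-bank features (sketch)."""
    def __init__(self):
        super().__init__()
        self.q_down = nn.Linear(H_DIM, D_INNER)
        self.m_down = nn.Linear(H_DIM, D_INNER)
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(D_INNER, N_HEADS, batch_first=True)
             for _ in range(2)])
        self.up = nn.Linear(D_INNER, H_DIM)

    def forward(self, queries, memory):
        x = self.q_down(queries)
        m = self.m_down(memory)
        for attn in self.layers:
            out, _ = attn(x, m, m)
            x = x + out  # residual connection
        return self.up(x)  # memory tokens fed to the frozen reasoner

B = 2
qb = QueryBuilder()
fusion = CrossAttentionFusion()
memory = torch.randn(B, MEM_LEN, H_DIM)  # stand-in for QwenVideoMemoryBank features
tokens = fusion(qb(B), memory)
print(tokens.shape)  # torch.Size([2, 8, 3584])
```

At d_model=3584 with ff_mult=2, a single encoder layer alone is on the order of 100M parameters, which is consistent with the ~110M QueryBuilder share reported below.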

Training Config

| Parameter | Value |
|---|---|
| Dataset | Video-R1-COT-10k-video-only-4k (4000 samples) |
| Epochs | 2 |
| Effective batch size | 16 (4 GPUs × batch size 2 × grad accum 2) |
| Learning rate | 1e-5 (custom params: 5e-5) |
| Scheduler | Cosine with 3% warmup |
| invoke_prob | 0.5 |
| query_len / mem_len | 8 / 8 |
| LoRA | r=16, alpha=32, targets=q,k,v,o_proj |
| Trainable params | 134M (QB: 110M, CA: 14M, LoRA: 10M) |
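The LoRA row can be made concrete with a small NumPy sketch of the low-rank update W' = W + (alpha/r)·BA. The r=16 and alpha=32 values are from the table; the square d×d weight shape is illustrative (it matches q_proj/o_proj at the 7B hidden size, while k/v_proj are rectangular under grouped-query attention).

```python
import numpy as np

# LoRA hyperparameters from the table
r, alpha = 16, 32
scale = alpha / r  # effective scaling of the low-rank update

d = 3584  # Qwen2.5-VL-7B hidden size; assume a d×d target matrix (e.g. q_proj)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d)) * 0.02  # frozen base weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-init

W_eff = W + scale * (B @ A)             # merged weight at inference time

# Zero-initializing B makes the adapter a no-op at the start of training:
assert np.allclose(W_eff, W)

# Trainable parameters added to this one matrix:
print(A.size + B.size)  # 114688
```

Summed over the q/k/v/o_proj targets across the Memory Former's layers, updates of this size land in the ~10M-parameter range the table reports for LoRA.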

Training Results

| Metric | Value |
|---|---|
| Final loss | 0.912 |
| Average loss | 0.972 |
| Ref loss (no memory) | ~1.14 |
| Final mem_lift | +0.19 |
| Training time | 2h 8min (4× GPUs) |
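If mem_lift is read as the drop in loss relative to the no-memory reference (an assumed definition; the card does not spell it out), the table's numbers are roughly self-consistent:

```python
# Assumed definition: lift = reference (no-memory) loss minus memory-augmented loss.
def mem_lift(ref_loss: float, mem_loss: float) -> float:
    return ref_loss - mem_loss

# Reference ~1.14 vs final loss 0.912 gives a lift on the order of the
# reported +0.19 (the reference value is itself approximate).
print(round(mem_lift(1.14, 0.912), 3))  # 0.228
```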

Checkpoints

  • epoch1/ – Checkpoint at step 250 (end of epoch 1)
  • epoch2/ – Checkpoint at step 500 (end of epoch 2, final)

Loading

from open_r1.vismem_video_model import load_vismem_video_checkpoint

# `model` is an already-constructed VisMemVideo model;
# point to the epoch1/ or epoch2/ directory
load_vismem_video_checkpoint(model, "path/to/epoch2")

Or via CLI argument:

--latent_checkpoint path/to/epoch2