---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-video
base_model:
  - Wan-AI/Wan2.2-TI2V-5B
datasets:
  - DarthZhu/VideoRLVR-Data
---

VideoRLVR

VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards, introduced in the paper "Video Models Can Reason with Verifiable Rewards".

This checkpoint is an RL-optimized version of Wan2.2-TI2V-5B trained on procedurally generated reasoning tasks including Maze, FlowFree, and Sokoban.
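Because these puzzles are procedurally generated, a candidate solution can be checked by a program rather than judged by another model. The sketch below illustrates what a dense, decomposed reward for Maze might look like. It is hypothetical and not taken from the VideoRLVR code. It assumes the generated video has already been decoded into a list of grid cells (that decoding step is not shown), and the names `maze_reward` and `MazeReward`, the three reward terms, and their weights are all illustrative choices.

```python
from dataclasses import dataclass

Cell = tuple[int, int]  # (row, col) in the maze grid


@dataclass
class MazeReward:
    """Hypothetical decomposed reward terms for one decoded Maze trajectory."""
    valid_moves: float  # fraction of steps that move to an adjacent, non-wall cell
    progress: float     # how much closer the final cell is to the goal than the start
    solved: float       # 1.0 only if the path is fully legal and runs start -> goal

    def total(self, weights: tuple[float, float, float] = (0.3, 0.3, 0.4)) -> float:
        # Illustrative weights; not taken from the paper.
        w_valid, w_progress, w_solved = weights
        return (w_valid * self.valid_moves
                + w_progress * self.progress
                + w_solved * self.solved)


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def maze_reward(grid: list[str], path: list[Cell], start: Cell, goal: Cell) -> MazeReward:
    """Score a path on a grid where '#' marks walls and every other character is open."""
    rows, cols = len(grid), len(grid[0])

    def is_open(cell: Cell) -> bool:
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#"

    if not path:
        return MazeReward(valid_moves=0.0, progress=0.0, solved=0.0)

    # Dense term 1: share of consecutive moves that are single legal steps.
    steps = list(zip(path, path[1:]))
    legal = sum(1 for a, b in steps if manhattan(a, b) == 1 and is_open(b))
    valid_moves = legal / len(steps) if steps else float(is_open(path[0]))

    # Dense term 2: fraction of the start-to-goal distance that was closed.
    start_dist = manhattan(start, goal)
    end_dist = manhattan(path[-1], goal)
    progress = max(0.0, 1.0 - end_dist / start_dist) if start_dist > 0 else float(end_dist == 0)

    # Sparse term: strict binary success check.
    solved = float(path[0] == start and path[-1] == goal
                   and legal == len(steps) and is_open(path[0]))
    return MazeReward(valid_moves=valid_moves, progress=progress, solved=solved)


maze = ["S.#",
        ".##",
        "..G"]
print(maze_reward(maze, [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], (0, 0), (2, 2)).total())
```

Splitting the score into separate terms gives partial credit to trajectories that make legal progress without fully solving the maze, while `solved` remains a strict binary check.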

Overview

VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. It combines an SDE-GRPO optimization backbone, dense decomposed rewards, and an Early-Step Focus strategy for efficient training. This approach enables video diffusion models to satisfy explicit spatial, temporal, or logical constraints, moving beyond perceptual imitation toward reliable, rule-consistent visual reasoning.
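The GRPO part of the backbone can be illustrated with the standard group-relative formulation: several samples are drawn for the same prompt, each reward is normalized against its group's mean and standard deviation, and the result weights a PPO-style clipped ratio objective. The following is a minimal sketch of that generic recipe in plain Python, not the VideoRLVR implementation. It assumes one scalar log-probability per sample (with SDE-based sampling in a diffusion model these would come from the stochastic denoising steps, which are not modeled here), omits any KL penalty, and uses illustrative function names and defaults (`clip_eps=0.2`, `eps=1e-6`).

```python
import math
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group's mean and (population) std."""
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample in the group scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_loss(
    logp_new: list[float],
    logp_old: list[float],
    advantages: list[float],
    clip_eps: float = 0.2,
) -> float:
    """PPO-style clipped objective averaged over the group, negated so it can be minimized."""
    if not advantages:
        return 0.0
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Example: four samples for one prompt, two successes and two failures.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
print(clipped_surrogate_loss([-1.0, -1.2, -0.9, -1.1], [-1.0, -1.0, -1.0, -1.0], advantages))
```

One general property of this normalization: if every sample in a group receives the same reward (for example, all fail a binary success check), every advantage is zero and the group contributes no gradient. Denser rewards that give partial credit make such ties less likely.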

On Maze, FlowFree, and Sokoban, VideoRLVR consistently improves over supervised fine-tuning (SFT) baselines, demonstrating that verifiable RL can effectively optimize models for objective success criteria.

Citation

@article{zhu2026video,
  title={Video Models Can Reason with Verifiable Rewards}, 
  author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
  journal={arXiv preprint arXiv:2605.15458},
  year={2026}
}