DarthZhu and nielsr (HF Staff) committed
Commit 705ce8d · 1 Parent(s): 00839c1

Improve model card and metadata (#1)

- Improve model card and metadata (772d72ff63760e3cba8b16ae19374bc2c760d61e)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +33 -3
README.md CHANGED
@@ -1,6 +1,36 @@
  ---
- datasets:
- - DarthZhu/VideoRLVR-Data
+ license: apache-2.0
+ library_name: diffusers
+ pipeline_tag: image-to-video
  base_model:
  - Wan-AI/Wan2.2-TI2V-5B
- ---
+ datasets:
+ - DarthZhu/VideoRLVR-Data
+ ---
+
+ # VideoRLVR
+
+ VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards, introduced in the paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458).
+
+ This checkpoint is an RL-optimized version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) trained on procedurally generated reasoning tasks including Maze, FlowFree, and Sokoban.
+
+ - **Paper:** [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458)
+ - **Project Page:** [https://darthzhu.github.io/VideoRLVR-page/](https://darthzhu.github.io/VideoRLVR-page/)
+ - **Repository:** [https://github.com/luka-group/VideoRLVR](https://github.com/luka-group/VideoRLVR)
+
+ ## Overview
+
+ VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. It utilizes an SDE-GRPO optimization backbone, dense decomposed rewards, and an Early-Step Focus strategy for efficient training. This approach enables video diffusion models to satisfy explicit spatial, temporal, or logical constraints, moving beyond perceptual imitation toward reliable rule-consistent visual reasoning.
+
+ Across tasks like Maze, FlowFree, and Sokoban, VideoRLVR consistently improves over supervised fine-tuning baselines, demonstrating that verifiable RL can effectively optimize models for objective success criteria.
+
+ ## Citation
+
+ ```bibtex
+ @article{zhu2026video,
+ title={Video Models Can Reason with Verifiable Rewards},
+ author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
+ journal={arXiv preprint arXiv:2605.15458},
+ year={2026}
+ }
+ ```
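
The Overview added in this commit describes optimizing a video model against verifiable rewards with a GRPO-style backbone. As a rough illustration only, the sketch below scores candidate Maze trajectories with a rule-based checker and then normalizes the rewards within the sampled group, which is the general group-relative advantage idea behind GRPO. The grid encoding, the function names `maze_reward` and `group_relative_advantages`, and the binary 0/1 reward are assumptions made for this example. They are not taken from the VideoRLVR repository, and the paper's SDE-GRPO sampling, dense decomposed rewards, and Early-Step Focus strategy are not reproduced here.

```python
from statistics import mean, pstdev

# Hypothetical maze encoding for this sketch: 0 = free cell, 1 = wall.
Grid = list[list[int]]
Cell = tuple[int, int]


def maze_reward(grid: Grid, path: list[Cell], start: Cell, goal: Cell) -> float:
    """Return 1.0 if `path` is a rule-consistent walk from start to goal, else 0.0."""
    if not path or path[0] != start or path[-1] != goal:
        return 0.0
    rows, cols = len(grid), len(grid[0])
    # Every visited cell must be inside the grid and not a wall.
    for r, c in path:
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == 1:
            return 0.0
    # Consecutive cells must be 4-neighbours (no jumps, no diagonal moves).
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        if abs(r0 - r1) + abs(c0 - c1) != 1:
            return 0.0
    return 1.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


grid = [
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
start, goal = (0, 0), (2, 2)

# Four hypothetical trajectories, standing in for paths decoded from sampled videos.
candidates = [
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)],  # valid
    [(0, 0), (1, 0), (2, 0)],                  # walks into walls, never reaches goal
    [(0, 0), (0, 1), (1, 2), (2, 2)],          # contains a diagonal jump
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)],  # valid
]

rewards = [maze_reward(grid, p, start, goal) for p in candidates]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because the reward is computed by a checker rather than a learned scorer, samples that break a rule receive a below-average advantage within their group, and samples that satisfy every rule receive an above-average one.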