DarthZhu and nielsr (HF Staff) committed
Commit ec6bbb1 · 1 Parent(s): 1752094

Improve model card with paper link and metadata (#1)

- Improve model card with paper link and metadata (43ee3e4a586564ec59564122974102cf7f3b69e1)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -1,6 +1,37 @@
  ---
- datasets:
- - DarthZhu/VideoRLVR-Data
  base_model:
  - DarthZhu/VideoRLVR-Wan2.2-Base
- ---

  ---
+ license: apache-2.0
+ library_name: diffusers
+ pipeline_tag: image-to-video
  base_model:
  - DarthZhu/VideoRLVR-Wan2.2-Base
+ datasets:
+ - DarthZhu/VideoRLVR-Data
+ ---
+
+ # VideoRLVR
+
+ VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards. This model is a reinforcement-learning optimized version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B), presented in the paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458).
+
+ The model uses an SDE-GRPO optimization backbone and rule-based feedback to improve visual reasoning in complex, procedurally generated tasks such as Maze, FlowFree, and Sokoban.
+
+ - **Paper:** [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458)
+ - **Project Page:** [https://darthzhu.github.io/VideoRLVR-page/](https://darthzhu.github.io/VideoRLVR-page/)
+ - **Code:** [https://github.com/luka-group/VideoRLVR](https://github.com/luka-group/VideoRLVR)
+
+ ## Method Overview
+
+ VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. Key components include:
+ 1. **SDE-GRPO**: An optimization backbone for video diffusion models.
+ 2. **Dense Decomposed Rewards**: Verifiable, rule-based feedback to guide the model.
+ 3. **Early-Step Focus**: A strategy that restricts policy optimization to the early denoising phase, significantly reducing training latency while preserving performance.
+
+ ## Citation
+
+ ```bibtex
+ @article{zhu2026video,
+ title={Video Models Can Reason with Verifiable Rewards},
+ author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
+ journal={arXiv preprint arXiv:2605.15458},
+ year={2026}
+ }
+ ```
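
The Method Overview in the updated README names a GRPO-style backbone and an early-step restriction but gives no formulas. The snippet below is only a minimal sketch of two ideas that might sit behind those bullets: the group-normalized advantage commonly used in GRPO-style methods, and a mask that limits policy updates to the first part of the denoising schedule. It is not part of this commit and is not taken from the VideoRLVR code. The function names, the `early_fraction` parameter, and the example reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples generated for the same task.

    Assumption: the common GRPO form (r - group mean) / (group std + eps).
    The paper's actual SDE-GRPO objective may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def early_step_mask(num_steps, early_fraction):
    """Mark which denoising steps receive policy updates.

    Assumption: "early-step focus" is modeled as keeping only the first
    `early_fraction` of the steps; the real schedule is not given in the card.
    """
    cutoff = max(1, int(num_steps * early_fraction))
    return [step < cutoff for step in range(num_steps)]


# Hypothetical rule-based rewards for four videos sampled for one maze instance.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# Only steps where the mask is True would contribute to the policy loss.
mask = early_step_mask(num_steps=10, early_fraction=0.3)
```

In this sketch, samples that score above the group mean get positive advantages and those below get negative ones, so no learned value model is needed. Skipping the masked-out steps is one plausible way the latency reduction described in the README could arise.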