How to use DarthZhu/VideoRLVR-Wan2.2 with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("DarthZhu/VideoRLVR-Wan2.2", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
Improve model card with paper link and metadata
This PR improves the model card by adding metadata (license, library name, and pipeline tag) and linking it to the associated paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458), project page, and code repository.
README.md CHANGED
@@ -1,6 +1,37 @@
 ---
-
-
+license: apache-2.0
+library_name: diffusers
+pipeline_tag: image-to-video
 base_model:
 - DarthZhu/VideoRLVR-Wan2.2-Base
-
+datasets:
+- DarthZhu/VideoRLVR-Data
+---
+
+# VideoRLVR
+
+VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards. This model is a reinforcement-learning optimized version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B), presented in the paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458).
+
+The model uses an SDE-GRPO optimization backbone and rule-based feedback to improve visual reasoning in complex, procedurally generated tasks such as Maze, FlowFree, and Sokoban.
+
+- **Paper:** [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458)
+- **Project Page:** [https://darthzhu.github.io/VideoRLVR-page/](https://darthzhu.github.io/VideoRLVR-page/)
+- **Code:** [https://github.com/luka-group/VideoRLVR](https://github.com/luka-group/VideoRLVR)
+
+## Method Overview
+
+VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. Key components include:
+1. **SDE-GRPO**: An optimization backbone for video diffusion models.
+2. **Dense Decomposed Rewards**: Verifiable, rule-based feedback to guide the model.
+3. **Early-Step Focus**: A strategy that restricts policy optimization to the early denoising phase, significantly reducing training latency while preserving performance.
+
+## Citation
+
+```bibtex
+@article{zhu2026video,
+  title={Video Models Can Reason with Verifiable Rewards},
+  author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
+  journal={arXiv preprint arXiv:2605.15458},
+  year={2026}
+}
+```
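Reviewer note: the three components listed under "Method Overview" can be made concrete with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or the code repository: the maze encoding, the reward components (`valid`, `progress`, `success`), their equal weighting, and the helper names `maze_reward`, `scalar_reward`, `group_advantages`, and `early_step_indices` are all hypothetical. The advantage normalization is the generic group-relative form commonly associated with GRPO, and may differ from what SDE-GRPO actually does.

```python
from statistics import mean, pstdev

# Hypothetical maze encoding: '#' is a wall, '.' is a free cell; cells are (row, col).
MAZE = [
    "#####",
    "#..##",
    "#.#.#",
    "#...#",
    "#####",
]


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def maze_reward(path, start, goal, maze=MAZE):
    """Illustrative rule-based reward split into components (not the paper's definition)."""
    if not path or path[0] != start:
        return {"valid": 0.0, "progress": 0.0, "success": 0.0}
    pos, valid_steps = start, 0
    for nxt in path[1:]:
        in_bounds = 0 <= nxt[0] < len(maze) and 0 <= nxt[1] < len(maze[0])
        # A step is legal if it moves one cell and does not land on a wall.
        if not in_bounds or manhattan(pos, nxt) != 1 or maze[nxt[0]][nxt[1]] == "#":
            break
        valid_steps += 1
        pos = nxt
    total_steps = max(len(path) - 1, 1)
    start_dist = manhattan(start, goal)
    progress = (start_dist - manhattan(pos, goal)) / start_dist if start_dist else 1.0
    return {
        "valid": valid_steps / total_steps,       # fraction of legal moves before the first violation
        "progress": max(progress, 0.0),           # how much closer the last legal cell is to the goal
        "success": 1.0 if pos == goal else 0.0,   # goal reached through legal moves only
    }


def scalar_reward(components):
    # Assumption: equal weights; a real recipe would likely tune these.
    return mean(components.values())


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize rewards within one group of samples."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def early_step_indices(num_steps, num_train_steps):
    """Indices of the earliest denoising steps that would receive policy updates."""
    return list(range(min(num_train_steps, num_steps)))


start, goal = (1, 1), (3, 3)
good_path = [(1, 1), (2, 1), (3, 1), (3, 2), (3, 3)]
bad_path = [(1, 1), (1, 2), (2, 2)]  # second move walks into a wall

rewards = [scalar_reward(maze_reward(p, start, goal)) for p in (good_path, bad_path)]
advantages = group_advantages(rewards)
train_steps = early_step_indices(num_steps=10, num_train_steps=3)
```

In this toy setup the path that reaches the goal gets a positive advantage and the path that hits a wall gets a negative one, and only the first three of ten denoising steps would be selected for optimization. In the real pipeline the path would have to be extracted from generated video frames, which this sketch does not attempt.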