Instructions to use DarthZhu/VideoRLVR-Wan2.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use DarthZhu/VideoRLVR-Wan2.2 with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("DarthZhu/VideoRLVR-Wan2.2", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
- Notebooks
- Google Colab
- Kaggle
Improve model card with paper link and metadata (#1)
- Improve model card with paper link and metadata (43ee3e4a586564ec59564122974102cf7f3b69e1)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
@@ -1,6 +1,37 @@
 ---
-
-
+license: apache-2.0
+library_name: diffusers
+pipeline_tag: image-to-video
 base_model:
 - DarthZhu/VideoRLVR-Wan2.2-Base
-
+datasets:
+- DarthZhu/VideoRLVR-Data
+---
+
+# VideoRLVR
+
+VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards. This model is a reinforcement-learning optimized version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B), presented in the paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458).
+
+The model uses an SDE-GRPO optimization backbone and rule-based feedback to improve visual reasoning in complex, procedurally generated tasks such as Maze, FlowFree, and Sokoban.
+
+- **Paper:** [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458)
+- **Project Page:** [https://darthzhu.github.io/VideoRLVR-page/](https://darthzhu.github.io/VideoRLVR-page/)
+- **Code:** [https://github.com/luka-group/VideoRLVR](https://github.com/luka-group/VideoRLVR)
+
+## Method Overview
+
+VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. Key components include:
+1. **SDE-GRPO**: An optimization backbone for video diffusion models.
+2. **Dense Decomposed Rewards**: Verifiable, rule-based feedback to guide the model.
+3. **Early-Step Focus**: A strategy that restricts policy optimization to the early denoising phase, significantly reducing training latency while preserving performance.
+
+## Citation
+
+```bibtex
+@article{zhu2026video,
+title={Video Models Can Reason with Verifiable Rewards},
+author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
+journal={arXiv preprint arXiv:2605.15458},
+year={2026}
+}
+```
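The Method Overview added in this change names three components but gives no formulas. As a rough illustration only, the sketch below shows how they might fit together for a Maze task. The README does not define the rewards or the SDE-GRPO update, so everything here is an assumption: the function names, the three reward components and their equal weighting, and the `fraction` parameter are all hypothetical. The advantage computation uses the usual GRPO-style normalization within a group of samples, which may differ from what the paper actually does.

```python
from statistics import mean, pstdev


def maze_reward(grid, path, start, goal):
    """Hypothetical dense, decomposed rule-based reward for one maze trajectory.

    grid: list of equal-length strings where '#' marks a wall.
    path: list of (row, col) cells extracted from a generated video.
    Returns each component plus their unweighted average under 'total'.
    """
    def is_free(cell):
        row, col = cell
        return 0 <= row < len(grid) and 0 <= col < len(grid[0]) and grid[row][col] != "#"

    if not path:
        return {"starts_ok": 0.0, "valid_moves": 0.0, "reaches_goal": 0.0, "total": 0.0}

    starts_ok = 1.0 if path[0] == start else 0.0
    # A step counts as valid if it moves exactly one cell (no diagonals) onto a free cell.
    valid = [
        abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 and is_free(b)
        for a, b in zip(path, path[1:])
    ]
    valid_moves = sum(valid) / len(valid) if valid else 0.0
    reaches_goal = 1.0 if path[-1] == goal and all(valid) else 0.0

    parts = {"starts_ok": starts_ok, "valid_moves": valid_moves, "reaches_goal": reaches_goal}
    total = sum(parts.values()) / len(parts)  # equal weights are an assumption
    parts["total"] = total
    return parts


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def early_step_mask(num_steps, fraction):
    """Mark which denoising steps would contribute to the policy loss.

    Only the first `fraction` of steps are selected (at least one step).
    """
    keep = max(1, int(num_steps * fraction))
    return [i < keep for i in range(num_steps)]


if __name__ == "__main__":
    grid = ["..#", "..#", "..."]
    good = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    bad = [(0, 0), (0, 1), (0, 2)]  # last step walks into a wall
    rewards = [maze_reward(grid, p, (0, 0), (2, 2))["total"] for p in (good, bad)]
    print(rewards, group_advantages(rewards), early_step_mask(10, 0.5))
```

In a real training loop the advantages would scale the per-step policy loss, and the mask would zero out the late denoising steps; both of those pieces depend on details in the linked code repository rather than on this README.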