How to use DarthZhu/VideoRLVR-Wan2.2-Base with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the checkpoint in bfloat16 to reduce memory use
pipe = DiffusionPipeline.from_pretrained(
    "DarthZhu/VideoRLVR-Wan2.2-Base",
    torch_dtype=torch.bfloat16,
)
# Switch to "mps" for Apple devices
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# Generate the video frames conditioned on the image and prompt, then save them
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-video
base_model:
- Wan-AI/Wan2.2-TI2V-5B
datasets:
- DarthZhu/VideoRLVR-Data
---

# VideoRLVR

VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards, introduced in the paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458).

This checkpoint is an RL-optimized version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) trained on procedurally generated reasoning tasks including Maze, FlowFree, and Sokoban.

- **Paper:** [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458)
- **Project Page:** [https://darthzhu.github.io/VideoRLVR-page/](https://darthzhu.github.io/VideoRLVR-page/)
- **Repository:** [https://github.com/luka-group/VideoRLVR](https://github.com/luka-group/VideoRLVR)

## Overview

VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. It utilizes an SDE-GRPO optimization backbone, dense decomposed rewards, and an Early-Step Focus strategy for efficient training. This approach enables video diffusion models to satisfy explicit spatial, temporal, or logical constraints, moving beyond perceptual imitation toward reliable rule-consistent visual reasoning.
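
To make the GRPO part of the recipe more concrete, the sketch below shows the group-relative advantage normalization that GRPO-style methods commonly use: several samples are drawn for the same prompt, each is scored, and every reward is standardized against its own group. This is an illustrative assumption about the general technique, not the paper's SDE-GRPO implementation; the function name `group_relative_advantages`, the `eps` constant, and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples drawn for the same prompt.

    Generic GRPO-style normalization (an assumption for illustration,
    not taken from the VideoRLVR code).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when all samples receive the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four videos sampled from the same prompt
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so the update depends only on relative quality within the group.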

Across tasks like Maze, FlowFree, and Sokoban, VideoRLVR consistently improves over supervised fine-tuning baselines, demonstrating that verifiable RL can effectively optimize models for objective success criteria.
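
As an illustration of what an objective success criterion can look like for a task such as Maze, the following sketch checks a candidate path against a grid and returns both a binary success flag and per-criterion scores. It is a hypothetical example: the function `maze_reward`, the grid encoding (`1` = wall), and the three components are assumptions made here, not the reward used in the paper, and extracting the path from generated video frames is assumed to happen elsewhere.

```python
def maze_reward(grid, path, start, goal):
    """Check a path of (row, col) cells on a grid where 1 marks a wall.

    Returns (success, components). Hypothetical verifier for illustration only.
    """
    rows, cols = len(grid), len(grid[0])

    def is_free(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0

    if not path:
        return False, {"starts_at_start": 0.0, "valid_moves": 0.0, "reaches_goal": 0.0}

    steps = list(zip(path, path[1:]))
    # A move is valid if it lands on a free cell exactly one step away
    valid = sum(
        1 for a, b in steps
        if is_free(b) and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
    )
    components = {
        "starts_at_start": float(path[0] == start and is_free(path[0])),
        "valid_moves": valid / len(steps) if steps else 1.0,
        "reaches_goal": float(path[-1] == goal),
    }
    # Success requires every component to be fully satisfied
    success = all(v == 1.0 for v in components.values())
    return success, components


grid = [
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
]
good_path = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
bad_path = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]  # steps onto a wall at (1, 0)

good_success, good_parts = maze_reward(grid, good_path, (0, 0), (2, 2))
bad_success, bad_parts = maze_reward(grid, bad_path, (0, 0), (2, 2))
```

Returning separate components alongside the binary flag is one way to give partial credit to a near-miss trajectory instead of a flat zero.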

## Citation

```bibtex
@article{zhu2026video,
  title={Video Models Can Reason with Verifiable Rewards},
  author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
  journal={arXiv preprint arXiv:2605.15458},
  year={2026}
}
```