DarthZhu/VideoRLVR-Wan2.2-Base

DarthZhu committed 1 day ago

Commit 540da95 · verified · 1 Parent(s): 705ce8d

Update README.md

Files changed (1)

README.md +1 -1

@@ -12,7 +12,7 @@ datasets:
 VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards, introduced in the paper [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458).
-This checkpoint is an RL-optimized version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) trained on procedurally generated reasoning tasks including Maze, FlowFree, and Sokoban.
+This checkpoint is an SFT version of [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) trained on procedurally generated reasoning tasks including Maze, FlowFree, and Sokoban.
 - **Paper:** [Video Models Can Reason with Verifiable Rewards](https://huggingface.co/papers/2605.15458)
 - **Project Page:** [https://darthzhu.github.io/VideoRLVR-page/](https://darthzhu.github.io/VideoRLVR-page/)