initial commit

Files changed (4)

.gitattributes +35 -0
README.md +136 -0
cmgrpo_raven_full.pt +3 -0
raven_model.pt +3 -0

.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,136 @@

+---
+license: cc-by-nc-4.0
+library_name: pytorch
+tags:
+  - text-to-video
+  - video-generation
+  - diffusion
+  - autoregressive
+  - consistency-model
+  - grpo
+  - wan2.1
+  - raven
+base_model:
+  - Wan-AI/Wan2.1-T2V-1.3B
+pipeline_tag: text-to-video
+---
+# RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
+[Yanzuo Lu](https://yanzuo.lu/) · [Ronglai Zuo](https://2000zrl.github.io/) · [Jiankang Deng](https://jiankangdeng.github.io/) — Imperial College London
+Project page: https://yanzuo.lu/raven
+## Overview
+RAVEN is a causal autoregressive text-to-video generation model built on Wan2.1-T2V-1.3B. It is designed for real-time streaming video generation by extrapolating future video chunks from previously generated content.
+The release contains two checkpoints:
+| File | Description |
+| --- | --- |
+| `raven_model.pt` | Main RAVEN checkpoint for causal autoregressive text-to-video generation. |
+| `cmgrpo_raven_full.pt` | Unmerged CM-GRPO LoRA checkpoint. In the codebase this is loaded through the LoRA path with rank 256 and alpha 256 on top of the RAVEN/Wan backbone. |
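As a rough illustration of what "unmerged" means for the CM-GRPO checkpoint, the sketch below applies the standard LoRA merge rule, `W' = W + (alpha / rank) * B @ A`, to a toy 2 x 2 weight. With rank 256 and alpha 256 the scale factor is 1.0. The helper names (`lora_scale`, `matmul`, `merge_lora`) and the toy matrices are illustrative assumptions and are not taken from the RAVEN codebase, which may handle scaling and merging differently.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


def matmul(a, b):
    """Plain-Python matrix product for small nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) (hypothetical helper)."""
    scale = lora_scale(alpha, rank)
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example with rank 1; the released checkpoint uses rank 256, alpha 256.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape: out_features x rank
lora_a = [[0.5, 0.5]]     # shape: rank x in_features
merged = merge_lora(weight, lora_b, lora_a, alpha=1.0, rank=1)
released_scale = lora_scale(256, 256)
```

Because alpha equals rank in the released setting, the low-rank update is added to the backbone weights without rescaling under this convention.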
+RAVEN trains a causal video generator with a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This aligns the attention pattern seen during training with the one used in inference-time autoregressive extrapolation, and it lets downstream chunk losses supervise the historical representations used for future predictions.
+We also release CM-GRPO weights. CM-GRPO formulates a consistency-model sampling step as a conditional Gaussian transition and applies online Group Relative Policy Optimization directly to this kernel.
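The two ingredients named above, a Gaussian transition density and group-relative advantages, can be sketched generically as follows. This is a minimal illustration of the textbook GRPO advantage normalization and an isotropic Gaussian log-density, not the authors' CM-GRPO implementation. The function names, the use of the population standard deviation, and the `eps` term are assumptions.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def gaussian_log_prob(x, mean_vec, std):
    """Log-density of x under an isotropic Gaussian N(mean_vec, std^2 * I)."""
    dim = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean_vec))
    return (
        -0.5 * sq_dist / std ** 2
        - dim * math.log(std)
        - 0.5 * dim * math.log(2.0 * math.pi)
    )


# Toy group of three rollouts with scalar rewards.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
# Log-probability of a sample that sits exactly at the transition mean.
log_p = gaussian_log_prob([0.0], [0.0], std=1.0)
```

In a policy-gradient update of this kind, each sample's log-probability under the transition kernel would be weighted by its advantage; how RAVEN defines the kernel's mean and variance is specified in the paper and codebase, not here.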
+## Model details
+- Base architecture: Wan2.1-T2V-1.3B DiT
+- Task: text-to-video generation
+- Generation mode: causal autoregressive video extrapolation
+- Resolution used in released configs: 480 x 832
+- Frames: 81
+- FPS: 16
+- Sampling steps: 4
+- Sampler: consistency sampler
+- Schedule: linear interpolation schedule, `v_lerp` prediction type
+- Classifier-free guidance: not used; the `guidance_scale=3.0` value in the configs is a placeholder for interface compatibility
+- Causal chunking: `chunk_size=3`, `independent_first_chunk=3`, `sink=0`, `window_size=null`
+- VAE stride: `[4, 8, 8]`
+- Latent channels: 16
+- DiT config: dim 1536, 30 layers, 12 heads, FFN dim 8960, text length 512
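The settings above imply a specific latent tensor shape, which the short calculation below works out. It assumes the usual causal video VAE convention in which the first frame is encoded on its own, so `T` frames map to `(T - 1) / 4 + 1` latent frames, and it assumes `chunk_size` is counted in latent frames. Both are assumptions to check against the codebase, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(frames, height, width, stride=(4, 8, 8), channels=16):
    """Latent (C, T, H, W) shape under the assumed causal-VAE convention."""
    st, sh, sw = stride
    if (frames - 1) % st != 0:
        raise ValueError("frames - 1 must be divisible by the temporal stride")
    if height % sh != 0 or width % sw != 0:
        raise ValueError("height and width must be divisible by the spatial strides")
    return (channels, (frames - 1) // st + 1, height // sh, width // sw)


# Values from the released configs: 81 frames at 480 x 832, 16 FPS.
shape = latent_shape(frames=81, height=480, width=832)

# Assumption: chunk_size=3 counts latent frames, so the clip splits evenly.
chunk_size = 3
num_chunks = shape[1] // chunk_size

# Clip duration in seconds at 16 FPS.
duration_s = 81 / 16
```

Under these assumptions the latent is 16 x 21 x 60 x 104, which divides into 7 chunks of 3 latent frames, and the clip lasts about 5.06 seconds.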
+## Usage
+This repository hosts only the released model weights. Please use the RAVEN codebase for inference and evaluation:
+```bash
+git clone https://github.com/YanzuoLu/RAVEN.git
+cd RAVEN
+```
+Set up the environment:
+```bash
+conda env create -f tools/environment.yaml
+conda activate raven
+bash tools/prepare_venv.sh
+source venv/bin/activate
+```
+Download this model repository:
+```bash
+hf download oliveryanzuolu/RAVEN --local-dir /path/to/RAVEN-weights
+```
+Then point the relevant config files to the downloaded checkpoints (`raven_model.pt` for RAVEN, `cmgrpo_raven_full.pt` for CM-GRPO).
+Reference configs:
+```text
+configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/raven_baseline_prompts.jsonc
+configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/cmgrpo_baseline_prompts.jsonc
+configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/raven.jsonc
+configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/cmgrpo.jsonc
+```
+Run qualitative generation:
+```bash
+bash tools/multi_run.sh configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/raven_baseline_prompts.jsonc
+bash tools/multi_run.sh configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/cmgrpo_baseline_prompts.jsonc
+```
+Run VBench prompt-suite sampling:
+```bash
+bash tools/multi_run.sh configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/raven.jsonc
+bash tools/multi_run.sh configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/cmgrpo.jsonc
+```
+## Requirements
+The released configs depend on the RAVEN codebase and the upstream Wan2.1-T2V-1.3B components, including:
+- Wan2.1-T2V-1.3B diffusion backbone / DiT config
+- Wan2.1 VAE
+- UMT5-XXL tokenizer and text encoder
+- Python 3.10
+- CUDA 12.8
+- PyTorch 2.11 + cu128
+- flash-attention 2/3 and magi-attention as built by `tools/prepare_venv.sh`
+See the code repository README for full setup and evaluation instructions.
+## License
+This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). See the `LICENSE` file in the code repository for details.
+The upstream Wan2.1 components are subject to their own licenses and terms. Users are responsible for complying with all applicable licenses for the base model, code, data, and dependencies.
+## Citation
+If you find this work useful, please cite RAVEN. The entry below is provisional and will be updated when a formal BibTeX entry is available.
+```bibtex
+@misc{lu2026raven,
+  title        = {RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO},
+  author       = {Lu, Yanzuo and Zuo, Ronglai and Deng, Jiankang},
+  year         = {2026},
+  howpublished = {\url{https://yanzuo.lu/raven}}
+}
+```

cmgrpo_raven_full.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fc61b635450bf3f27e35d48ac49c4d3423d2e8f53e2dabcf2a492dee3f0f8650
+size 7102893635

raven_model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:73e41928df3c7a90bca2b3299e1970ce0fdf60c4d1635e214c1b7af5a982e986
+size 5676256254
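
The two LFS pointers above record the SHA-256 digest and byte size of each checkpoint, so a downloaded file can be checked against them. The sketch below parses a pointer and compares a local file to it using only the standard library. The function names and the example path in the usage note are illustrative assumptions.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path, pointer, chunk_bytes=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Stream in blocks so multi-gigabyte checkpoints are not loaded into memory.
        for block in iter(lambda: handle.read(chunk_bytes), b""):
            hasher.update(block)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for raven_model.pt, copied from the diff above.
RAVEN_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:73e41928df3c7a90bca2b3299e1970ce0fdf60c4d1635e214c1b7af5a982e986
size 5676256254"""

pointer = parse_lfs_pointer(RAVEN_POINTER)
```

For example, `file_matches_pointer("/path/to/RAVEN-weights/raven_model.pt", pointer)` should return `True` for an intact download; the size check runs first, so a truncated file is rejected before any hashing.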