oliveryanzuolu committed on
Commit 3626511 · 0 Parent(s):

initial commit

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +136 -0
  3. cmgrpo_raven_full.pt +3 -0
  4. raven_model.pt +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ license: cc-by-nc-4.0
+ library_name: pytorch
+ tags:
+ - text-to-video
+ - video-generation
+ - diffusion
+ - autoregressive
+ - consistency-model
+ - grpo
+ - wan2.1
+ - raven
+ base_model:
+ - Wan-AI/Wan2.1-T2V-1.3B
+ pipeline_tag: text-to-video
+ ---
+
+ # RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
+
+ [Yanzuo Lu](https://yanzuo.lu/) · [Ronglai Zuo](https://2000zrl.github.io/) · [Jiankang Deng](https://jiankangdeng.github.io/) — Imperial College London
+
+ Project page: https://yanzuo.lu/raven
+
+ ## Overview
+
+ RAVEN is a causal autoregressive text-to-video generation model built on Wan2.1-T2V-1.3B. It is designed for real-time streaming video generation by extrapolating future video chunks from previously generated content.
+
+ The release contains two checkpoints:
+
+ | File | Description |
+ | --- | --- |
+ | `raven_model.pt` | Main RAVEN checkpoint for causal autoregressive text-to-video generation. |
+ | `cmgrpo_raven_full.pt` | Unmerged CM-GRPO LoRA checkpoint. In the codebase this is loaded through the LoRA path with rank 256 and alpha 256 on top of the RAVEN/Wan backbone. |
+
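The table above says the CM-GRPO checkpoint is an unmerged LoRA with rank 256 and alpha 256. The sketch below illustrates what those two numbers imply under the common LoRA convention, where the low-rank update is scaled by `alpha / rank` before being added to the frozen weight. Whether the RAVEN codebase uses exactly this convention is an assumption; `lora_scale` and `merge_lora` are hypothetical helpers written with plain Python lists, not functions from the repository.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor for the low-rank update in the common LoRA convention."""
    return alpha / rank


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Shapes (nested lists): weight is out x in, lora_b is out x rank,
    lora_a is rank x in. This is an illustrative merge, assumed rather
    than taken from the RAVEN codebase.
    """
    scale = lora_scale(alpha, rank)
    merged = [row[:] for row in weight]  # copy so the input is not mutated
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# With rank 256 and alpha 256 the update is applied at unit scale.
print(lora_scale(256, 256))  # 1.0
```

Under this convention, equal rank and alpha mean the learned update is added to the backbone weights without rescaling.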
+ RAVEN trains a causal video generator using a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This aligns the model's training attention pattern with inference-time autoregressive extrapolation and allows downstream chunk losses to supervise the historical representations used for future predictions.
+
+ We also release CM-GRPO weights. CM-GRPO formulates a consistency-model sampling step as a conditional Gaussian transition and applies online Group Relative Policy Optimization directly to this kernel.
+
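To make the CM-GRPO description more concrete, here is a minimal sketch of the two generic ingredients it names: the log-density of a Gaussian transition and group-relative advantages computed from a group of rewards. The mean and standard deviation of the actual CM-GRPO kernel are defined in the RAVEN codebase and are not reproduced here; the isotropic form, the function names, and the `eps` stabiliser are all assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def gaussian_log_prob(x, mean_vec, std):
    """Log-density of an isotropic Gaussian N(mean_vec, std^2 I) at x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean_vec))
    return (
        -0.5 * sq_dist / std**2
        - d * math.log(std)
        - 0.5 * d * math.log(2 * math.pi)
    )


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one group of samples for the same prompt.

    Samples that score above the group mean get a positive advantage,
    samples below it a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


advantages = group_relative_advantages([0.2, 0.5, 0.8])
```

Once a sampling step has a tractable log-probability like this, a policy-gradient objective can weight each sample's log-probability (or probability ratio) by its advantage, which is the general shape of a GRPO update.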
+ ## Model details
+
+ - Base architecture: Wan2.1-T2V-1.3B DiT
+ - Task: text-to-video generation
+ - Generation mode: causal autoregressive video extrapolation
+ - Resolution used in released configs: 480 x 832
+ - Frames: 81
+ - FPS: 16
+ - Sampling steps: 4
+ - Sampler: consistency sampler
+ - Schedule: linear interpolation schedule, `v_lerp` prediction type
+ - Classifier-free guidance: not used; the `guidance_scale=3.0` value in the configs is a placeholder for interface compatibility
+ - Causal chunking: `chunk_size=3`, `independent_first_chunk=3`, `sink=0`, `window_size=null`
+ - VAE stride: `[4, 8, 8]`
+ - Latent channels: 16
+ - DiT config: dim 1536, 30 layers, 12 heads, FFN dim 8960, text length 512
+
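The numbers in the list above can be combined into a rough latent shape. The sketch below assumes the VAE encodes the first frame on its own and then one latent frame per 4 pixel frames (the usual behaviour of causal 3D video VAEs), and that `chunk_size=3` counts latent frames; neither assumption is stated in this README, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(frames, height, width, stride=(4, 8, 8), channels=16):
    """Estimate the latent tensor shape (C, T, H, W) for a pixel-space clip."""
    st, sh, sw = stride
    # Assumption: the first frame is encoded alone and every later group of
    # `st` frames maps to one latent frame, as in causal 3D video VAEs.
    latent_frames = (frames - 1) // st + 1
    return channels, latent_frames, height // sh, width // sw


channels, latent_frames, latent_h, latent_w = latent_shape(81, 480, 832)
print((channels, latent_frames, latent_h, latent_w))  # (16, 21, 60, 104)

# Assumption: chunk_size=3 counts latent frames, not pixel frames.
print(latent_frames // 3)  # 7
print(81 / 16)  # 5.0625 seconds of video at 16 FPS
```

Under these assumptions an 81-frame clip is generated as 7 chunks of 3 latent frames each, covering about 5 seconds of video.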
+ ## Usage
+
+ This repository only hosts the released model weights. Please use the RAVEN codebase for inference and evaluation:
+
+ ```bash
+ git clone https://github.com/YanzuoLu/RAVEN.git
+ cd RAVEN
+ ```
+
+ Set up the environment:
+
+ ```bash
+ conda env create -f tools/environment.yaml
+ conda activate raven
+ bash tools/prepare_venv.sh
+ source venv/bin/activate
+ ```
+
+ Download this model repository:
+
+ ```bash
+ hf download oliveryanzuolu/RAVEN --local-dir /path/to/RAVEN-weights
+ ```
+
+ Then point the relevant config files to the downloaded checkpoints (`raven_model.pt` for RAVEN, `cmgrpo_raven_full.pt` for CM-GRPO).
+
+ Reference configs:
+
+ ```bash
+ configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/raven_baseline_prompts.jsonc
+ configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/cmgrpo_baseline_prompts.jsonc
+ configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/raven.jsonc
+ configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/cmgrpo.jsonc
+ ```
+
+ Run qualitative generation:
+
+ ```bash
+ bash tools/multi_run.sh configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/raven_baseline_prompts.jsonc
+ bash tools/multi_run.sh configs/trials/generate_t2v/causal_wan2.1_1.3B_t2v/cmgrpo_baseline_prompts.jsonc
+ ```
+
+ Run VBench prompt-suite sampling:
+
+ ```bash
+ bash tools/multi_run.sh configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/raven.jsonc
+ bash tools/multi_run.sh configs/trials/vbench_t2v/causal_wan2.1_1.3B_t2v/cmgrpo.jsonc
+ ```
+
+ ## Requirements
+
+ The released configs depend on the RAVEN codebase and the upstream Wan2.1-T2V-1.3B components, including:
+
+ - Wan2.1-T2V-1.3B diffusion backbone / DiT config
+ - Wan2.1 VAE
+ - UMT5-XXL tokenizer and text encoder
+ - Python 3.10
+ - CUDA 12.8
+ - PyTorch 2.11 + cu128
+ - flash-attention 2/3 and magi-attention as built by `tools/prepare_venv.sh`
+
+ See the code repository README for full setup and evaluation instructions.
+
+ ## License
+
+ This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). See the `LICENSE` file in the code repository for details.
+
+ The upstream Wan2.1 components are subject to their own licenses and terms. Users are responsible for complying with all applicable licenses for the base model, code, data, and dependencies.
+
+ ## Citation
+
+ If you find this work useful, please cite RAVEN. A BibTeX entry will be added when available.
+
+ ```bibtex
+ @misc{lu2026raven,
+ title = {RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO},
+ author = {Lu, Yanzuo and Zuo, Ronglai and Deng, Jiankang},
+ year = {2026},
+ howpublished = {\url{https://yanzuo.lu/raven}}
+ }
+ ```
cmgrpo_raven_full.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc61b635450bf3f27e35d48ac49c4d3423d2e8f53e2dabcf2a492dee3f0f8650
+ size 7102893635
raven_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73e41928df3c7a90bca2b3299e1970ce0fdf60c4d1635e214c1b7af5a982e986
+ size 5676256254
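
The two `.pt` entries in this commit are Git LFS pointer files: each records the SHA-256 digest and byte size of the real checkpoint. A downloaded file can be checked against its pointer with a short script such as the sketch below. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the path in the usage comment simply reuses the placeholder directory from the README.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_bytes=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints are not loaded at once.
        for block in iter(lambda: handle.read(chunk_bytes), b""):
            hasher.update(block)
    return hasher.hexdigest() == pointer["digest"]


RAVEN_POINTER = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:73e41928df3c7a90bca2b3299e1970ce0fdf60c4d1635e214c1b7af5a982e986\n"
    "size 5676256254\n"
)

# Example usage (path is a placeholder):
# matches_pointer("/path/to/RAVEN-weights/raven_model.pt", RAVEN_POINTER)
```

Comparing the size before hashing gives a fast failure for interrupted downloads, since hashing a file of several gigabytes takes noticeable time.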