Senfier-LiqiJing committed on
Commit 33a56eb · verified · 1 Parent(s): 9f6fa31

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
README.md CHANGED
@@ -24,7 +24,7 @@ The model is trained in two stages from a pretrained vanilla VAR checkpoint:
  - **Stage 2 — GRPO.** Reinforcement fine-tuning with Group Relative Policy Optimization against a DreamSim-based perceptual reward, using LoRA adapters (rank r=256) and Per-Action Normalization Weighting (PANW) to rebalance credit across the 256× token imbalance between coarse and fine scales.
 
  - **Code:** https://github.com/Senfier-LiqiJing/StyleVAR
- - **Paper (OpenReview):** https://openreview.net/forum?id=UHW3PgLUsa
+ - **Paper (arXiv):** https://arxiv.org/abs/2604.21052
  - **Authors:** Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu (Duke University)
 
  ---
@@ -34,7 +34,7 @@ The model is trained in two stages from a pretrained vanilla VAR checkpoint:
  | File | Purpose | Notes |
  |---|---|---|
  | `vae_ch160v4096z32.pth` | Frozen multi-scale VQ-VAE tokenizer | Inherited from VAR (depth-16, C_vae=32, V=4096, ch=160). Shared between SFT and GRPO. |
- | `StyleVAR_SFT.pth` | Stage 1 supervised fine-tuning checkpoint | Plain state dict no LoRA, no optimizer state. Use this for the SFT baseline. |
+ | `StyleVAR_SFT.pth` | Stage 1 supervised fine-tuning checkpoint | State dict with optimizer state. Use this for the SFT baseline. |
  | `StyleVAR-GRPO.pth` | Stage 2 GRPO-refined checkpoint | GRPO LoRA deltas already merged into the base weights. Drop-in replacement for `StyleVAR_SFT.pth`. |
 
  Both transformer checkpoints ship as plain state dicts under the `"model"` key — LoRA adapters have been baked in, so you can load them directly into a fresh StyleVAR without constructing any LoRA wrappers.
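The README text in this diff states that both transformer checkpoints are plain state dicts stored under a `"model"` key. A minimal PyTorch loading sketch of that layout, assuming nothing beyond what the diff says: `TinyModel` is a hypothetical stand-in for the real StyleVAR transformer class (which lives in the linked repo), and the checkpoint file is fabricated locally so the snippet is self-contained.

```python
import torch
from torch import nn

# Hypothetical stand-in for the StyleVAR transformer; the real class
# comes from https://github.com/Senfier-LiqiJing/StyleVAR.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

model = TinyModel()

# Simulate a shipped checkpoint so this sketch runs standalone:
# per the README, the weights sit under the "model" key.
ckpt_path = "StyleVAR-GRPO.pth"  # or "StyleVAR_SFT.pth" for the SFT baseline
torch.save({"model": model.state_dict()}, ckpt_path)

# Load: unwrap the "model" key, then load strictly — no LoRA wrapper
# is needed because the adapters are already merged into the weights.
state = torch.load(ckpt_path, map_location="cpu")
missing, unexpected = model.load_state_dict(state["model"], strict=True)
print(len(missing), len(unexpected))  # → 0 0
```

With `strict=True`, a key mismatch (e.g. loading into a model that still has LoRA wrapper modules) raises immediately, which makes it easy to confirm the checkpoint really is a drop-in state dict.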