Senfier-LiqiJing committed on
Commit 33a56eb · verified · 1 Parent(s): 9f6fa31

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
README.md CHANGED
@@ -24,7 +24,7 @@ The model is trained in two stages from a pretrained vanilla VAR checkpoint:
  - **Stage 2 — GRPO.** Reinforcement fine-tuning with Group Relative Policy Optimization against a DreamSim-based perceptual reward, using LoRA adapters (rank r=256) and Per-Action Normalization Weighting (PANW) to rebalance credit across the 256× token imbalance between coarse and fine scales.
 
  - **Code:** https://github.com/Senfier-LiqiJing/StyleVAR
- - **Paper (OpenReview):** https://openreview.net/forum?id=UHW3PgLUsa
+ - **Paper (arXiv):** https://arxiv.org/abs/2604.21052
  - **Authors:** Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu (Duke University)
 
  ---
@@ -34,7 +34,7 @@ The model is trained in two stages from a pretrained vanilla VAR checkpoint:
  | File | Purpose | Notes |
  |---|---|---|
  | `vae_ch160v4096z32.pth` | Frozen multi-scale VQ-VAE tokenizer | Inherited from VAR (depth-16, C_vae=32, V=4096, ch=160). Shared between SFT and GRPO. |
- | `StyleVAR_SFT.pth` | Stage 1 supervised fine-tuning checkpoint | Plain state dict no LoRA, no optimizer state. Use this for the SFT baseline. |
+ | `StyleVAR_SFT.pth` | Stage 1 supervised fine-tuning checkpoint | State dict with optimizer state. Use this for the SFT baseline. |
  | `StyleVAR-GRPO.pth` | Stage 2 GRPO-refined checkpoint | GRPO LoRA deltas already merged into the base weights. Drop-in replacement for `StyleVAR_SFT.pth`. |
 
  Both transformer checkpoints ship as plain state dicts under the `"model"` key — LoRA adapters have been baked in, so you can load them directly into a fresh StyleVAR without constructing any LoRA wrappers.
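The README text in this diff states that both transformer checkpoints are plain state dicts stored under a `"model"` key. A minimal PyTorch loading sketch of that layout, assuming nothing beyond what the diff says: `TinyModel` is a hypothetical stand-in for the real StyleVAR transformer class (which lives in the linked repo), and the checkpoint file is fabricated locally so the snippet is self-contained.

```python
import torch
from torch import nn

# Hypothetical stand-in for the StyleVAR transformer; the real class
# comes from https://github.com/Senfier-LiqiJing/StyleVAR.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

model = TinyModel()

# Simulate a shipped checkpoint so this sketch runs standalone:
# per the README, the weights sit under the "model" key.
ckpt_path = "StyleVAR-GRPO.pth"  # or "StyleVAR_SFT.pth" for the SFT baseline
torch.save({"model": model.state_dict()}, ckpt_path)

# Load: unwrap the "model" key, then load strictly — no LoRA wrapper
# is needed because the adapters are already merged into the weights.
state = torch.load(ckpt_path, map_location="cpu")
missing, unexpected = model.load_state_dict(state["model"], strict=True)
print(len(missing), len(unexpected))  # → 0 0
```

With `strict=True`, a key mismatch (e.g. loading into a model that still has LoRA wrapper modules) raises immediately, which makes it easy to confirm the checkpoint really is a drop-in state dict.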