Update README.md
The model is trained in two stages from a pretrained vanilla VAR checkpoint:

- **Stage 1 — SFT.** Supervised fine-tuning on 267,710 paired (content, style, target) triplets.
- **Stage 2 — GRPO.** Reinforcement fine-tuning with Group Relative Policy Optimization against a DreamSim-based perceptual reward, using LoRA adapters ($r{=}256$) and Per-Action Normalization Weighting (PANW) to rebalance credit across the $256\times$ token imbalance between coarse and fine scales.

- **Code:** https://github.com/Senfier-LiqiJing/StyleVAR
- **Paper (OpenReview):** https://openreview.net/forum?id=UHW3PgLUsa
- **Authors:** Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu (Duke University)
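To make the PANW idea concrete, here is a minimal sketch of the reweighting it describes. This is an illustration only, not the authors' implementation: the scale side lengths (1×1 up to 16×16, the usual VAR token-map progression) and the choice to normalize each scale's total weight to 1 are assumptions.

```python
# Hedged sketch of Per-Action Normalization Weighting (PANW).
# VAR emits tokens scale by scale, so the finest scale (16x16 = 256 tokens)
# contributes 256x more per-token loss terms than the coarsest (1x1).
# PANW-style reweighting divides each token's credit by the number of
# tokens at its scale, so every scale carries equal total weight.
# Scale sizes and the sum-to-1 normalization are assumptions for illustration.

def panw_weights(scale_sides):
    """Return one weight list per scale; each scale's weights sum to 1."""
    return [[1.0 / (s * s)] * (s * s) for s in scale_sides]

scales = [1, 2, 4, 8, 16]            # assumed VAR token-map side lengths
weights = panw_weights(scales)

# Every scale now contributes the same total credit, despite the
# 256x token-count gap between the 1x1 and 16x16 maps.
assert all(abs(sum(w) - 1.0) < 1e-9 for w in weights)
print(weights[0][0] / weights[-1][0])   # -> 256.0
```

Without this normalization, a per-token average would let the finest scale dominate the GRPO gradient; with it, coarse structural decisions and fine texture decisions receive comparable credit.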
---