Senfier-LiqiJing committed (verified)
Commit 5e8078c · Parent: 9bfa7f8

Update README.md

Files changed (1): README.md (+2 −1)
README.md CHANGED

@@ -23,7 +23,8 @@ The model is trained in two stages from a pretrained vanilla VAR checkpoint:
  - **Stage 1 — SFT.** Supervised fine-tuning on 267,710 paired (content, style, target) triplets.
  - **Stage 2 — GRPO.** Reinforcement fine-tuning with Group Relative Policy Optimization against a DreamSim-based perceptual reward, using LoRA adapters ($r{=}256$) and Per-Action Normalization Weighting (PANW) to rebalance credit across the $256\times$ token imbalance between coarse and fine scales.
 
- - **Paper / code:** https://github.com/Senfier-LiqiJing/StyleVAR
+ - **Code:** https://github.com/Senfier-LiqiJing/StyleVAR
+ - **Paper (OpenReview):** https://openreview.net/forum?id=UHW3PgLUsa
  - **Authors:** Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu (Duke University)
 
  ---
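The PANW idea mentioned in the Stage 2 description can be made concrete with a small sketch. Everything here is an assumption for illustration, not the repository's implementation: the VAR-style scale schedule, the `panw_weights` name, and the inverse-token-count weighting are hypothetical, chosen so that the single coarsest-scale token and the 256 finest-scale tokens carry equal total credit.

```python
# Hypothetical PANW sketch: weight each token's policy-gradient credit by the
# inverse of its scale's token count. With a VAR-style scale schedule (an
# assumption here), the coarsest scale emits 1 token and the finest emits
# 16 * 16 = 256 tokens -- the 256x imbalance the README refers to.
SCALE_SIDES = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16]

def panw_weights(sides=SCALE_SIDES):
    """Return one weight per generated token, normalized to sum to 1."""
    weights = []
    for s in sides:
        n_tokens = s * s                     # tokens emitted at this scale
        weights += [1.0 / n_tokens] * n_tokens
    total = sum(weights)                     # equals the number of scales
    return [w / total for w in weights]

w = panw_weights()
```

Under this weighting, each scale contributes the same total credit to the update regardless of how many tokens it emits, which is one plausible reading of "rebalancing credit across the token imbalance between coarse and fine scales".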