Update README.md
The model is trained in two stages from a pretrained vanilla VAR checkpoint:

- **Stage 1 — SFT.** Supervised fine-tuning on 267,710 paired (content, style, target) triplets.
- **Stage 2 — GRPO.** Reinforcement fine-tuning with Group Relative Policy Optimization against a DreamSim-based perceptual reward, using LoRA adapters ($r{=}256$) and Per-Action Normalization Weighting (PANW) to rebalance credit across the $256\times$ token imbalance between coarse and fine scales.

- **Code:** https://github.com/Senfier-LiqiJing/StyleVAR
- **Paper (OpenReview):** https://openreview.net/forum?id=UHW3PgLUsa
- **Authors:** Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu (Duke University)
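To make the PANW idea concrete, here is a minimal sketch of the reweighting it describes. This is an illustration only, not the authors' implementation: the scale side lengths (1×1 up to 16×16, the usual VAR token-map progression) and the choice to normalize each scale's total weight to 1 are assumptions.

```python
# Hedged sketch of Per-Action Normalization Weighting (PANW).
# VAR emits tokens scale by scale, so the finest scale (16x16 = 256 tokens)
# contributes 256x more per-token loss terms than the coarsest (1x1).
# PANW-style reweighting divides each token's credit by the number of
# tokens at its scale, so every scale carries equal total weight.
# Scale sizes and the sum-to-1 normalization are assumptions for illustration.

def panw_weights(scale_sides):
    """Return one weight list per scale; each scale's weights sum to 1."""
    return [[1.0 / (s * s)] * (s * s) for s in scale_sides]

scales = [1, 2, 4, 8, 16]            # assumed VAR token-map side lengths
weights = panw_weights(scales)

# Every scale now contributes the same total credit, despite the
# 256x token-count gap between the 1x1 and 16x16 maps.
assert all(abs(sum(w) - 1.0) < 1e-9 for w in weights)
print(weights[0][0] / weights[-1][0])   # -> 256.0
```

Without this normalization, a per-token average would let the finest scale dominate the GRPO gradient; with it, coarse structural decisions and fine texture decisions receive comparable credit.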
---