- **Stage 2 — GRPO.** Reinforcement fine-tuning with Group Relative Policy Optimization against a DreamSim-based perceptual reward, using LoRA adapters (rank r=256) and Per-Action Normalization Weighting (PANW) to rebalance credit across the 256× token imbalance between coarse and fine scales.

- **Code:** https://github.com/Senfier-LiqiJing/StyleVAR
- **Paper (arXiv):** https://arxiv.org/abs/2604.21052
- **Authors:** Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu (Duke University)
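To make the scale imbalance concrete: in a VAR-style model the finest scale (16×16) emits 256× as many tokens as the coarsest (1×1), so unweighted per-token credit is dominated by fine scales. The sketch below shows one way a per-action normalization could rebalance this — the inverse-token-count weighting is an assumption for illustration, not the paper's exact PANW formula.

```python
# Hypothetical sketch of Per-Action Normalization Weighting (PANW).
# The README states only that PANW rebalances credit across the 256x token
# imbalance between coarse and fine scales; the exact scheme below
# (inverse per-scale token count, renormalized over scales) is an assumption.

# VAR-style multi-scale grid: patch sizes 1x1 ... 16x16, so the finest
# scale emits 256x as many tokens as the coarsest (16^2 vs 1^2).
PATCH_NUMS = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)

def panw_weights(patch_nums=PATCH_NUMS):
    """Per-token weights giving every scale the same total credit."""
    tokens_per_scale = [p * p for p in patch_nums]  # [1, 4, 9, ..., 256]
    num_scales = len(tokens_per_scale)
    weights = []
    for n_k in tokens_per_scale:
        # each of the n_k tokens at scale k contributes 1 / (num_scales * n_k),
        # so per-scale totals are equal and all weights sum to 1
        weights += [1.0 / (num_scales * n_k)] * n_k
    return weights

w = panw_weights()
print(len(w))            # 680 tokens across the 10 scales
print(round(sum(w), 6))  # 1.0 -> each scale holds exactly 1/10 of the credit
```

Multiplying per-token policy-gradient terms by such weights keeps the single coarse-scale token from being drowned out by the 256 fine-scale tokens.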
---

| File | Purpose | Notes |
|---|---|---|
| `vae_ch160v4096z32.pth` | Frozen multi-scale VQ-VAE tokenizer | Inherited from VAR (depth-16, C_vae=32, V=4096, ch=160). Shared between SFT and GRPO. |
| `StyleVAR_SFT.pth` | Stage 1 supervised fine-tuning checkpoint | State dict with optimizer state. Use this for the SFT baseline. |
| `StyleVAR-GRPO.pth` | Stage 2 GRPO-refined checkpoint | GRPO LoRA deltas already merged into the base weights. Drop-in replacement for `StyleVAR_SFT.pth`. |

Both transformer checkpoints ship as plain state dicts under the `"model"` key — LoRA adapters have been baked in, so you can load them directly into a fresh StyleVAR without constructing any LoRA wrappers.
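A minimal loading sketch of that recipe: read the `"model"` key and load it straight into a fresh model, no adapter wrappers. `TinyStyleVAR` is a stand-in module for demonstration, not the real architecture; only the state-dict handling mirrors the description above.

```python
import io

import torch
import torch.nn as nn

# Stand-in module (hypothetical) so the example is self-contained;
# substitute your actual StyleVAR construction here.
class TinyStyleVAR(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

# Simulate a shipped checkpoint: plain weights under the "model" key
# (LoRA deltas already merged, as described above).
buf = io.BytesIO()
torch.save({"model": TinyStyleVAR().state_dict()}, buf)
buf.seek(0)

# Loading recipe: grab the "model" key, load directly, no LoRA wrappers.
ckpt = torch.load(buf, map_location="cpu")
model = TinyStyleVAR()
result = model.load_state_dict(ckpt["model"], strict=True)
model.eval()
print(result.missing_keys, result.unexpected_keys)  # [] [] -> clean drop-in load
```

The same pattern applies to either checkpoint file, since `StyleVAR-GRPO.pth` is a drop-in replacement for `StyleVAR_SFT.pth`.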