# OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning
<p align="center">
<a href="https://github.com/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/GitHub-OmniAlpha-181717.svg?logo=github" alt="GitHub"></a>
<a href="https://arxiv.org/abs/2511.20211"><img src="https://img.shields.io/badge/arXiv-2511.20211-b31b1b.svg" alt="arXiv"></a>
<a href="https://huggingface.co/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow" alt="Hugging Face"></a>
</p>
---
**This is the official repository for "[OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning](https://arxiv.org/abs/2511.20211)".**

---
## Project Structure
```
.
├── alpha/                            # Core package
│   ├── data.py                       # Dataset loading & preprocessing
│   ├── args.py                       # Argument definitions
│   ├── inplace.py                    # In-place operations
│   ├── pipelines/                    # Inference pipelines (Qwen-Image-Edit)
│   ├── vae/                          # AlphaVAE model & losses
│   ├── grpo/                         # GRPO (RL) training utilities
│   └── utils/                        # Utility functions
├── configs/                          # Configuration files
│   ├── datasets.*.jsonc              # Dataset configurations
│   ├── deepspeed/                    # DeepSpeed configs (ZeRO-1/3)
│   ├── experiments/                  # VAE experiment configs
│   └── accelerate.yaml               # Accelerate config
├── scripts/                          # Bash scripts for training/inference
│   ├── train_qwen_image.sh           # Single-node training (Accelerate)
│   ├── train_qwen_image_torchrun.sh  # Multi-node training (torchrun)
│   ├── vae_convert.sh                # VAE conversion script
│   ├── vae_train.sh                  # VAE fine-tuning script
│   ├── infer.sh                      # Inference script
│   ├── demo.sh                       # Gradio demo script
│   └── rl/                           # GRPO reinforcement learning scripts
├── tasks/                            # Python/Jupyter task scripts
│   ├── diffusion/                    # Diffusion training & inference
│   ├── vae/                          # VAE fine-tuning, conversion & inference
│   ├── rl/                           # GRPO RL training & preprocessing
│   └── demo/                         # Gradio demo application
└── pyproject.toml                    # Package definitions & dependencies
```
## Installation
### Step 1. Create a Conda Environment
```bash
conda create -n OmniAlpha python=3.10
conda activate OmniAlpha
```
### Step 2. Install OmniAlpha
First, clone this repository and `cd OmniAlpha`. Then:
```bash
# Install OmniAlpha and all dependencies
pip install -e .
```
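To sanity-check the installation, you can verify that the core package imports and that PyTorch sees a GPU. This is a minimal check assuming the package installs under the name `alpha` (per the project layout above) and that PyTorch is pulled in as a dependency:
```bash
# Quick sanity check (assumes the package installs as `alpha` and PyTorch is a dependency)
python -c "import alpha, torch; print('alpha OK, CUDA available:', torch.cuda.is_available())"
```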
## Environment Variables
All scripts use environment variables to specify model/data paths. Set these before running any script:
```bash
# Model paths
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509" # HuggingFace model ID or local path
export VAE_MODEL_PATH="/path/to/vae/checkpoint" # Path to AlphaVAE checkpoint
export LORA_PATH="/path/to/lora/pytorch_lora_weights.safetensors" # Path to LoRA weights
# Data paths
export DATA_ROOT="/path/to/datasets" # Root directory for all datasets
```
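Since every script reads the same variables, it can be convenient to keep them in a small shell file that you `source` before each session. This is plain shell usage with example paths; substitute your own locations:
```bash
# omnialpha_env.sh — example only; adjust paths for your setup, then run `source omnialpha_env.sh`
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509"
export VAE_MODEL_PATH="$HOME/models/OmniAlpha/rgba_vae"
export LORA_PATH="$HOME/models/OmniAlpha/lora/pytorch_lora_weights.safetensors"
export DATA_ROOT="$HOME/datasets/omnialpha"
```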
If these variables are not set, the scripts fall back to placeholder paths, which you will need to edit manually.
## Data Preparation
> Please refer to `configs/datasets.demo.jsonc` for dataset configuration examples.
> Each dataset entry consists of two required fields:
>
> * `data_path`: Path to the JSONL annotation file.
> * `image_dir`: Root directory for the dataset images.
### Dataset Format
The annotation file (`data_path`) should be a JSONL file with the following structure. Both `input_images` and `output_images` must be **relative paths** within `image_dir`:
```jsonl
{"id": "case_0", "prompt": "Vintage camera next to a brown glass bottle.", "input_images": ["images_512/case_0/base.png"], "output_images": ["images_512/case_0/00.png"]}
{"id": "case_1", "prompt": "A vintage-style globe with a map of North and South America, mounted on a black stand.;Antique key with ornate design, attached to a chain.", "input_images": ["images_512/case_1/base.png"], "output_images": ["images_512/case_1/00.png", "images_512/case_1/01.png"]}
...
```
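Before training, it can help to confirm that every annotation resolves to existing files. The following is a minimal validation sketch (not part of the codebase) that checks the required fields and resolves both image lists against `image_dir`; the paths shown are placeholders:
```python
import json
from pathlib import Path

def validate_annotations(data_path: str, image_dir: str) -> None:
    """Check that each JSONL entry has the required fields and that all image paths exist."""
    image_root = Path(image_dir)
    with open(data_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            entry = json.loads(line)
            for key in ("id", "prompt", "input_images", "output_images"):
                assert key in entry, f"line {line_no}: missing field '{key}'"
            for rel in entry["input_images"] + entry["output_images"]:
                assert (image_root / rel).exists(), f"line {line_no}: missing image {rel}"

validate_annotations("/path/to/datasets/my_dataset/annotations.jsonl",
                     "/path/to/datasets/my_dataset")
```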
### Dataset Configuration
Create a `.jsonc` config file under `configs/` to define datasets and splits:
```jsonc
{
  "datasets": {
    "my_dataset": {
      "data_path": "/path/to/datasets/my_dataset/annotations.jsonl",
      "image_dir": "/path/to/datasets/my_dataset"
    }
  },
  "splits": {
    "train": [{"dataset": "my_dataset", "ends": -50}],
    "valid": [{"dataset": "my_dataset", "starts": -50}]
  }
}
```
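The `starts`/`ends` bounds appear to behave like Python slice indices, so `"ends": -50` would keep all but the last 50 entries for training and `"starts": -50` would keep the last 50 for validation. The sketch below only illustrates that assumption and is not the project's loader:
```python
# Illustrative only: assumes `starts`/`ends` act like Python slice bounds.
entries = list(range(1000))   # stand-in for 1,000 annotation entries
train = entries[:-50]         # {"dataset": "my_dataset", "ends": -50}
valid = entries[-50:]         # {"dataset": "my_dataset", "starts": -50}
print(len(train), len(valid)) # 950 50
```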
## Model Download
[Pretrained model checkpoints are available on Hugging Face.](https://huggingface.co/Longin-Yu/OmniAlpha)
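If you prefer a local copy, one option is the Hugging Face CLI (assuming `huggingface_hub`, which provides `huggingface-cli`, is installed; the destination path is an example):
```bash
# Download the OmniAlpha checkpoints to a local directory
huggingface-cli download Longin-Yu/OmniAlpha --local-dir /path/to/models/OmniAlpha
```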
## Inference
You can use the provided script to run inference with pretrained models.
1. **Configure**: Set environment variables (`PRETRAINED_MODEL`, `VAE_MODEL_PATH`, `LORA_PATH`) or edit `scripts/infer.sh` directly.
2. **Execute**:
```bash
bash scripts/infer.sh
```
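Alternatively, the variables can be set inline for a one-off run (example paths shown):
```bash
PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509" \
VAE_MODEL_PATH="/path/to/models/OmniAlpha/rgba_vae" \
LORA_PATH="/path/to/models/OmniAlpha/lora/pytorch_lora_weights.safetensors" \
bash scripts/infer.sh
```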
## Demo
We provide a Gradio-based demo for interactive multi-task RGBA generation and editing.
### Supported Tasks
- `t2i`: Text-to-RGBA image generation
- `ObjectClear`: Object removal
- `automatting`: Automatic matting
- `refmatting`: Referential matting
- `layerdecompose`: Layer decomposition
### Execute
```bash
# Set model paths
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509"
export VAE_MODEL_PATH="/path/to/models/OmniAlpha/rgba_vae"
export LORA_PATH="/path/to/models/OmniAlpha/lora/pytorch_lora_weights.safetensors"
# Launch demo
bash scripts/demo.sh
```
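If the demo runs on a remote machine, you can usually reach the Gradio UI by forwarding its port over SSH. Gradio defaults to port 7860; the script may use a different one, so adjust accordingly:
```bash
# Forward the (assumed) default Gradio port, then open http://localhost:7860 locally
ssh -L 7860:localhost:7860 user@remote-host
```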
### Example Assets
Demo example images are placed in `tasks/demo/omnialpha/`.
## Training
### AlphaVAE Fine-tuning
```bash
# Step 1: Convert the base VAE to RGBA format
bash scripts/vae_convert.sh
# Step 2: Fine-tune the AlphaVAE
bash scripts/vae_train.sh
```
### LoRA Training (Single-Node with Accelerate)
```bash
bash scripts/train_qwen_image.sh
```
### LoRA Training (Multi-Node with torchrun)
For distributed training across multiple nodes:
```bash
# Set distributed training variables
export MASTER_ADDR="your_master_ip"
export MASTER_PORT=29500
export NNODES=2
export NPROC_PER_NODE=8
export MACHINE_RANK=0 # 0 for master, 1 for worker, etc.
export VERSION="omnialpha" # Matches configs/datasets.<VERSION>.jsonc
bash scripts/train_qwen_image_torchrun.sh
```
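The same command is launched on every node; only `MACHINE_RANK` changes. For example, on the second node of a two-node run (assuming the other environment variables are set identically there):
```bash
export MACHINE_RANK=1   # rank 1 of 2; all other variables match the master node
bash scripts/train_qwen_image_torchrun.sh
```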
### GRPO Reinforcement Learning
For RL-based fine-tuning:
```bash
# Run GRPO training
bash scripts/rl/train_grpo.sh
# Or for multi-node:
bash scripts/rl/train_grpo_torchrun.sh
```
## Contact
Feel free to reach out via email at longinyh@gmail.com. You can also open an issue if you have ideas to share or would like to contribute data for training future models.
## Citation
```bibtex
@article{yu2025omnialpha0,
  title   = {OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning},
  author  = {Hao Yu and Jiabo Zhan and Zile Wang and Jinglin Wang and Huaisong Zhang and Hongyu Li and Xinrui Chen and Yongxian Wei and Chun Yuan},
  year    = {2025},
  journal = {arXiv preprint arXiv:2511.20211}
}

@misc{wang2025alphavaeunifiedendtoendrgba,
  title         = {AlphaVAE: Unified End-to-End RGBA Image Reconstruction and Generation with Alpha-Aware Representation Learning},
  author        = {Zile Wang and Hao Yu and Jiabo Zhan and Chun Yuan},
  year          = {2025},
  eprint        = {2507.09308},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.09308}
}
```