# OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning

<p align="center">
  <a href="https://github.com/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/GitHub-OmniAlpha-181717.svg?logo=github" alt="GitHub"></a>
  <a href="https://arxiv.org/abs/2511.20211"><img src="https://img.shields.io/badge/arXiv-2511.20211-b31b1b.svg" alt="arXiv"></a>
  <a href="https://huggingface.co/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow" alt="Hugging Face"></a>
</p>

---

**This is the official repository for "[OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning](https://arxiv.org/abs/2511.20211)".**

![examples](assets/examples_01.png)

---

## πŸ“‚ Project Structure

```
.
β”œβ”€β”€ alpha/               # Core package
β”‚   β”œβ”€β”€ data.py          # Dataset loading & preprocessing
β”‚   β”œβ”€β”€ args.py          # Argument definitions
β”‚   β”œβ”€β”€ inplace.py       # In-place operations
β”‚   β”œβ”€β”€ pipelines/       # Inference pipelines (Qwen-Image-Edit)
β”‚   β”œβ”€β”€ vae/             # AlphaVAE model & losses
β”‚   β”œβ”€β”€ grpo/            # GRPO (RL) training utilities
β”‚   └── utils/           # Utility functions
β”œβ”€β”€ configs/             # Configuration files
β”‚   β”œβ”€β”€ datasets.*.jsonc # Dataset configurations
β”‚   β”œβ”€β”€ deepspeed/       # DeepSpeed configs (ZeRO-1/3)
β”‚   β”œβ”€β”€ experiments/     # VAE experiment configs
β”‚   └── accelerate.yaml  # Accelerate config
β”œβ”€β”€ scripts/             # Bash scripts for training/inference
β”‚   β”œβ”€β”€ train_qwen_image.sh          # Single-node training (Accelerate)
β”‚   β”œβ”€β”€ train_qwen_image_torchrun.sh # Multi-node training (torchrun)
β”‚   β”œβ”€β”€ vae_convert.sh   # VAE conversion script
β”‚   β”œβ”€β”€ vae_train.sh     # VAE fine-tuning script
β”‚   β”œβ”€β”€ infer.sh         # Inference script
β”‚   β”œβ”€β”€ demo.sh          # Gradio demo script
β”‚   └── rl/              # GRPO reinforcement learning scripts
β”œβ”€β”€ tasks/               # Python/Jupyter task scripts
β”‚   β”œβ”€β”€ diffusion/       # Diffusion training & inference
β”‚   β”œβ”€β”€ vae/             # VAE fine-tuning, conversion & inference
β”‚   β”œβ”€β”€ rl/              # GRPO RL training & preprocessing
β”‚   └── demo/            # Gradio demo application
└── pyproject.toml       # Package definitions & dependencies
```

## πŸ“¦ Installation

### Step 1. Create a Conda Environment

```bash
conda create -n OmniAlpha python=3.10
conda activate OmniAlpha
```

### Step 2. Install OmniAlpha

Clone this repository, `cd OmniAlpha`, then run:

```bash
# Install OmniAlpha and all dependencies
pip install -e .
```

## βš™οΈ Environment Variables

All scripts use environment variables to specify model/data paths. Set these before running any script:

```bash
# Model paths
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509"  # HuggingFace model ID or local path
export VAE_MODEL_PATH="/path/to/vae/checkpoint"       # Path to AlphaVAE checkpoint
export LORA_PATH="/path/to/lora/pytorch_lora_weights.safetensors"  # Path to LoRA weights

# Data paths
export DATA_ROOT="/path/to/datasets"           # Root directory for all datasets
```

If these variables are not set, the scripts fall back to placeholder paths, which you will need to edit manually.
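
Before launching a long run, a quick shell check can catch unset or mistyped paths. This is a small sketch; `PRETRAINED_MODEL` is skipped because it may be a Hugging Face model ID rather than a local path:

```bash
# Warn about any path variable that is unset or does not exist on disk
for var in VAE_MODEL_PATH LORA_PATH DATA_ROOT; do
  value="${!var}"
  if [ -z "$value" ] || [ ! -e "$value" ]; then
    echo "WARNING: $var is unset or does not point to an existing path: '$value'"
  fi
done
```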

## πŸ“„ Data Preparation

> Please refer to `configs/datasets.demo.jsonc` for dataset configuration examples.
> Each dataset entry consists of two required fields:
>
>   * `data_path`: Path to the JSONL annotation file.
>   * `image_dir`: Root directory for the dataset images.

### Dataset Format

The annotation file (`data_path`) should be a JSONL file with the following structure. Both `input_images` and `output_images` must be **relative paths** within `image_dir`:

```jsonl
{"id": "case_0", "prompt": "Vintage camera next to a brown glass bottle.", "input_images": ["images_512/case_0/base.png"], "output_images": ["images_512/case_0/00.png"]}
{"id": "case_1", "prompt": "A vintage-style globe with a map of North and South America, mounted on a black stand.;Antique key with ornate design, attached to a chain.", "input_images": ["images_512/case_1/base.png"], "output_images": ["images_512/case_1/00.png", "images_512/case_1/01.png"]}
...
```
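
Because every image path is resolved relative to `image_dir`, it is easy to sanity-check an annotation file before training. The snippet below is a minimal sketch using only the standard library; the function name `validate_annotations` is illustrative, not part of the package:

```python
import json
from pathlib import Path

def validate_annotations(data_path: str, image_dir: str) -> None:
    """Report any image referenced in the JSONL annotations that is missing under image_dir."""
    root = Path(image_dir)
    with open(data_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)
            for key in ("input_images", "output_images"):
                for rel_path in record.get(key, []):
                    if not (root / rel_path).is_file():
                        print(f"[line {line_no}] {record['id']}: missing {key} entry {rel_path}")

validate_annotations(
    "/path/to/datasets/my_dataset/annotations.jsonl",
    "/path/to/datasets/my_dataset",
)
```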

### Dataset Configuration

Create a `.jsonc` config file under `configs/` to define datasets and splits:

```jsonc
{
    "datasets": {
        "my_dataset": {
            "data_path": "/path/to/datasets/my_dataset/annotations.jsonl",
            "image_dir": "/path/to/datasets/my_dataset"
        }
    },
    "splits": {
        "train": [{"dataset": "my_dataset", "ends": -50}],
        "valid": [{"dataset": "my_dataset", "starts": -50}]
    }
}
```
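
If you want to inspect a config outside the training code, note that the standard `json` module does not accept the `//` comments allowed in `.jsonc`. A minimal loader that strips full-line comments is enough for the configs shown here (a sketch, not the loader used by the package):

```python
import json
import re
from pathlib import Path

def load_jsonc(path: str) -> dict:
    """Parse a .jsonc file after removing full-line // comments."""
    text = Path(path).read_text(encoding="utf-8")
    text = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(text)

config = load_jsonc("configs/datasets.demo.jsonc")
print("datasets:", list(config["datasets"]))
print("splits:", list(config["splits"]))
```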

## πŸ”½ Model Download

[Pretrained model checkpoints are available on Hugging Face.](https://huggingface.co/Longin-Yu/OmniAlpha)
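
If you prefer to fetch the checkpoints programmatically, the `huggingface_hub` library can mirror the whole repository; the local directory below is just an example:

```python
from huggingface_hub import snapshot_download

# Download the OmniAlpha checkpoints (RGBA VAE and LoRA weights) into a local directory
local_dir = snapshot_download(
    repo_id="Longin-Yu/OmniAlpha",
    local_dir="./models/OmniAlpha",  # example target; point VAE_MODEL_PATH / LORA_PATH here
)
print("Downloaded to", local_dir)
```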

## πŸš€ Inference

You can use the provided script to run inference with pretrained models.

1. **Configure**: Set environment variables (`PRETRAINED_MODEL`, `VAE_MODEL_PATH`, `LORA_PATH`) or edit `scripts/infer.sh` directly.
2. **Execute**:

    ```bash
    bash scripts/infer.sh
    ```

## 🎬 Demo

We provide a Gradio-based demo for interactive multi-task RGBA generation and editing.

### Supported Tasks

- `t2i` β€” Text-to-RGBA image generation
- `ObjectClear` β€” Object removal
- `automatting` β€” Automatic matting
- `refmatting` β€” Referential matting
- `layerdecompose` β€” Layer decomposition

### Execute

```bash
# Set model paths
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509"
export VAE_MODEL_PATH="/path/to/models/OmniAlpha/rgba_vae"
export LORA_PATH="/path/to/models/OmniAlpha/lora/pytorch_lora_weights.safetensors"

# Launch demo
bash scripts/demo.sh
```

### Example Assets

Example images for the demo are provided in `tasks/demo/omnialpha/`.

## πŸ‹οΈ Training

### AlphaVAE Fine-tuning

```bash
# Step 1: Convert the base VAE to RGBA format
bash scripts/vae_convert.sh

# Step 2: Fine-tune the AlphaVAE
bash scripts/vae_train.sh
```

### LoRA Training (Single-Node with Accelerate)

```bash
bash scripts/train_qwen_image.sh
```

### LoRA Training (Multi-Node with torchrun)

For distributed training across multiple nodes:

```bash
# Set distributed training variables
export MASTER_ADDR="your_master_ip"
export MASTER_PORT=29500
export NNODES=2
export NPROC_PER_NODE=8
export MACHINE_RANK=0  # 0 for master, 1 for worker, etc.
export VERSION="omnialpha"  # Matches configs/datasets.<VERSION>.jsonc

bash scripts/train_qwen_image_torchrun.sh
```
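
For reference, these variables map onto a standard `torchrun` launch roughly as sketched below; the actual command lives in `scripts/train_qwen_image_torchrun.sh`, and the training entry point shown here is only a placeholder:

```bash
torchrun \
  --nnodes="$NNODES" \
  --nproc_per_node="$NPROC_PER_NODE" \
  --node_rank="$MACHINE_RANK" \
  --master_addr="$MASTER_ADDR" \
  --master_port="$MASTER_PORT" \
  <training_entry_point>.py  # placeholder; see the script for the real entry point and arguments
```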

### GRPO Reinforcement Learning

For RL-based fine-tuning:

```bash
# Run GRPO training
bash scripts/rl/train_grpo.sh
# Or for multi-node:
bash scripts/rl/train_grpo_torchrun.sh
```

## πŸ”— Contact

Feel free to reach out via email at longinyh@gmail.com. You can also open an issue if you have ideas to share or would like to contribute data for training future models.

## Citation

```bibtex
@article{yu2025omnialpha0,
  title   = {OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning},
  author  = {Hao Yu and Jiabo Zhan and Zile Wang and Jinglin Wang and Huaisong Zhang and Hongyu Li and Xinrui Chen and Yongxian Wei and Chun Yuan},
  year    = {2025},
  journal = {arXiv preprint arXiv:2511.20211}
}

@misc{wang2025alphavaeunifiedendtoendrgba,
  title         = {AlphaVAE: Unified End-to-End RGBA Image Reconstruction and Generation with Alpha-Aware Representation Learning},
  author        = {Zile Wang and Hao Yu and Jiabo Zhan and Chun Yuan},
  year          = {2025},
  eprint        = {2507.09308},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.09308}
}
```