# OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning
<p align="center">
<a href="https://github.com/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/GitHub-OmniAlpha-181717.svg?logo=github" alt="GitHub"></a>
<a href="https://arxiv.org/abs/2511.20211"><img src="https://img.shields.io/badge/arXiv-2511.20211-b31b1b.svg" alt="arXiv"></a>
<a href="https://huggingface.co/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow" alt="Hugging Face"></a>
</p>
---
**This is the official repository for "[OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning](https://arxiv.org/abs/2511.20211)".**
![examples](assets/examples_01.png)
---
## 📂 Project Structure
```
.
├── alpha/                             # Core package
│   ├── data.py                        # Dataset loading & preprocessing
│   ├── args.py                        # Argument definitions
│   ├── inplace.py                     # In-place operations
│   ├── pipelines/                     # Inference pipelines (Qwen-Image-Edit)
│   ├── vae/                           # AlphaVAE model & losses
│   ├── grpo/                          # GRPO (RL) training utilities
│   └── utils/                         # Utility functions
├── configs/                           # Configuration files
│   ├── datasets.*.jsonc               # Dataset configurations
│   ├── deepspeed/                     # DeepSpeed configs (ZeRO-1/3)
│   ├── experiments/                   # VAE experiment configs
│   └── accelerate.yaml                # Accelerate config
├── scripts/                           # Bash scripts for training/inference
│   ├── train_qwen_image.sh            # Single-node training (Accelerate)
│   ├── train_qwen_image_torchrun.sh   # Multi-node training (torchrun)
│   ├── vae_convert.sh                 # VAE conversion script
│   ├── vae_train.sh                   # VAE fine-tuning script
│   ├── infer.sh                       # Inference script
│   ├── demo.sh                        # Gradio demo script
│   └── rl/                            # GRPO reinforcement learning scripts
├── tasks/                             # Python/Jupyter task scripts
│   ├── diffusion/                     # Diffusion training & inference
│   ├── vae/                           # VAE fine-tuning, conversion & inference
│   ├── rl/                            # GRPO RL training & preprocessing
│   └── demo/                          # Gradio demo application
└── pyproject.toml                     # Package definitions & dependencies
```
## 📦 Installation
### Step 1. Create a Conda Environment
```bash
conda create -n OmniAlpha python=3.10
conda activate OmniAlpha
```
### Step 2. Install OmniAlpha
First, clone this repository and `cd OmniAlpha`. Then:
```bash
# Install OmniAlpha and all dependencies
pip install -e .
```
## ⚙️ Environment Variables
All scripts use environment variables to specify model/data paths. Set these before running any script:
```bash
# Model paths
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509" # HuggingFace model ID or local path
export VAE_MODEL_PATH="/path/to/vae/checkpoint" # Path to AlphaVAE checkpoint
export LORA_PATH="/path/to/lora/pytorch_lora_weights.safetensors" # Path to LoRA weights
# Data paths
export DATA_ROOT="/path/to/datasets" # Root directory for all datasets
```
If these variables are not set, the scripts fall back to placeholder paths, which you will need to edit manually.
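The same pattern applies if you drive the tasks from Python rather than the bash scripts. A minimal sketch of reading these variables with fallbacks (the variable names are the ones listed above; the fallback paths are placeholders):
```python
import os

# Read each path from the environment, falling back to a placeholder
# that must be edited by hand (mirroring the scripts' behavior).
PRETRAINED_MODEL = os.environ.get("PRETRAINED_MODEL", "Qwen/Qwen-Image-Edit-2509")
VAE_MODEL_PATH = os.environ.get("VAE_MODEL_PATH", "/path/to/vae/checkpoint")
LORA_PATH = os.environ.get("LORA_PATH", "/path/to/lora/pytorch_lora_weights.safetensors")
DATA_ROOT = os.environ.get("DATA_ROOT", "/path/to/datasets")
```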
## 📄 Data Preparation
> Please refer to `configs/datasets.demo.jsonc` for dataset configuration examples.
> Each dataset entry consists of two required fields:
>
> * `data_path`: Path to the JSONL annotation file.
> * `image_dir`: Root directory for the dataset images.
### Dataset Format
The annotation file (`data_path`) should be a JSONL file with the following structure. Both `input_images` and `output_images` must be **relative paths** within `image_dir`:
```jsonl
{"id": "case_0", "prompt": "Vintage camera next to a brown glass bottle.", "input_images": ["images_512/case_0/base.png"], "output_images": ["images_512/case_0/00.png"]}
{"id": "case_1", "prompt": "A vintage-style globe with a map of North and South America, mounted on a black stand.;Antique key with ornate design, attached to a chain.", "input_images": ["images_512/case_1/base.png"], "output_images": ["images_512/case_1/00.png", "images_512/case_1/01.png"]}
...
```
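If you assemble your own annotations, it is worth sanity-checking them before launching a run. Below is a minimal checker (not part of this repo; the field names are the ones specified above) that verifies each record carries the required fields and that every image path is relative and resolves under `image_dir`:
```python
import json
from pathlib import Path

def check_annotations(data_path: str, image_dir: str) -> None:
    """Validate a JSONL annotation file against the format above."""
    root = Path(image_dir)
    with open(data_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            rec = json.loads(line)
            # Every record needs an id, a prompt, and both image lists.
            for key in ("id", "prompt", "input_images", "output_images"):
                assert key in rec, f"line {line_no}: missing field {key!r}"
            # Paths must be relative to image_dir and must exist.
            for rel in rec["input_images"] + rec["output_images"]:
                assert not Path(rel).is_absolute(), f"line {line_no}: {rel} is absolute"
                assert (root / rel).is_file(), f"line {line_no}: {root / rel} not found"

check_annotations("/path/to/datasets/my_dataset/annotations.jsonl",
                  "/path/to/datasets/my_dataset")
```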
### Dataset Configuration
Create a `.jsonc` config file under `configs/` to define datasets and splits:
```jsonc
{
"datasets": {
"my_dataset": {
"data_path": "/path/to/datasets/my_dataset/annotations.jsonl",
"image_dir": "/path/to/datasets/my_dataset"
}
},
"splits": {
"train": [{"dataset": "my_dataset", "ends": -50}],
"valid": [{"dataset": "my_dataset", "starts": -50}]
}
}
```
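Here, `starts` and `ends` appear to act like Python slice bounds, so the example trains on all but the last 50 samples and validates on the last 50. A sketch of that reading (an assumption; `alpha/data.py` is the authoritative implementation):
```python
# Hypothetical illustration of slice-like split semantics.
samples = list(range(1000))  # stand-in for the loaded dataset

train = samples[:-50]        # {"dataset": "my_dataset", "ends": -50}
valid = samples[-50:]        # {"dataset": "my_dataset", "starts": -50}

assert len(train) + len(valid) == len(samples)
```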
## 🔽 Model Download
Pretrained model checkpoints are available on [Hugging Face](https://huggingface.co/Longin-Yu/OmniAlpha).
## 🚀 Inference
You can use the provided script to run inference with pretrained models.
1. **Configure**: Set environment variables (`PRETRAINED_MODEL`, `VAE_MODEL_PATH`, `LORA_PATH`) or edit `scripts/infer.sh` directly.
2. **Execute**:
```bash
bash scripts/infer.sh
```
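For orientation, the sketch below shows the rough shape of such an inference call. It is not the repo's exact API: OmniAlpha's pipeline classes live in `alpha/pipelines/`, the AlphaVAE swap is omitted, and the call signature is an assumption based on the upstream diffusers Qwen-Image-Edit integration. `scripts/infer.sh` remains the supported entry point.
```python
# Sketch only; see alpha/pipelines/ and scripts/infer.sh for the real flow.
import os
import torch
from PIL import Image
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    os.environ["PRETRAINED_MODEL"],      # e.g. Qwen/Qwen-Image-Edit-2509
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights(os.environ["LORA_PATH"])  # OmniAlpha LoRA weights

image = Image.open("input.png")          # hypothetical input image
out = pipe(image=image, prompt="Remove the foreground object.",
           num_inference_steps=40).images[0]
out.save("output.png")
```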
## 🎬 Demo
We provide a Gradio-based demo for interactive multi-task RGBA generation and editing.
### Supported Tasks
- `t2i` – Text-to-RGBA image generation
- `ObjectClear` – Object removal
- `automatting` – Automatic matting
- `refmatting` – Referential matting
- `layerdecompose` – Layer decomposition
### Execute
```bash
# Set model paths
export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509"
export VAE_MODEL_PATH="/path/to/models/OmniAlpha/rgba_vae"
export LORA_PATH="/path/to/models/OmniAlpha/lora/pytorch_lora_weights.safetensors"
# Launch demo
bash scripts/demo.sh
```
### Example Assets
Demo example images are placed in `tasks/demo/omnialpha/`.
## πŸ‹οΈ Training
### AlphaVAE Fine-tuning
```bash
# Step 1: Convert the base VAE to RGBA format
bash scripts/vae_convert.sh
# Step 2: Fine-tune the AlphaVAE
bash scripts/vae_train.sh
```
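Step 1 adapts the pretrained RGB VAE to four-channel RGBA tensors. One common way to do this kind of conversion is to inflate the input convolution from 3 to 4 channels, copying the RGB weights and zero-initializing the alpha channel so the converted model initially matches the RGB one. The sketch below illustrates that idea only; it is not necessarily what `scripts/vae_convert.sh` does (see `tasks/vae/` for the actual implementation):
```python
# Illustrative 3->4 channel inflation; an assumption about the conversion
# idea, not the repo's exact implementation.
import torch
import torch.nn as nn

def inflate_conv_in(conv: nn.Conv2d) -> nn.Conv2d:
    """Return a copy of `conv` that accepts RGBA (4-channel) input."""
    new = nn.Conv2d(4, conv.out_channels, conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[:, :3] = conv.weight   # reuse the RGB weights
        if conv.bias is not None:
            new.bias.copy_(conv.bias)     # alpha channel starts at zero
    return new
```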
### LoRA Training (Single-Node with Accelerate)
```bash
bash scripts/train_qwen_image.sh
```
### LoRA Training (Multi-Node with torchrun)
For distributed training across multiple nodes:
```bash
# Set distributed training variables
export MASTER_ADDR="your_master_ip"
export MASTER_PORT=29500
export NNODES=2
export NPROC_PER_NODE=8
export MACHINE_RANK=0 # 0 for master, 1 for worker, etc.
export VERSION="omnialpha" # Matches configs/datasets.<VERSION>.jsonc
bash scripts/train_qwen_image_torchrun.sh
```
### GRPO Reinforcement Learning
For RL-based fine-tuning:
```bash
# Run GRPO training
bash scripts/rl/train_grpo.sh
# Or for multi-node:
bash scripts/rl/train_grpo_torchrun.sh
```
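GRPO samples a group of generations per prompt, scores each with a reward function, and normalizes rewards within the group, so no learned value function is needed. The advantage computation at its core looks like the following (the group-relative normalization from the original GRPO formulation; the reward definitions OmniAlpha uses live in `alpha/grpo/`):
```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages for rewards of shape (num_prompts, group_size)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: two prompts, four sampled generations each.
r = torch.tensor([[0.1, 0.5, 0.9, 0.5],
                  [0.0, 0.0, 1.0, 1.0]])
print(grpo_advantages(r))
```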
## 🔗 Contact
Feel free to reach out via email at longinyh@gmail.com. You can also open an issue if you have ideas to share or would like to contribute data for training future models.
## Citation
```bibtex
@article{yu2025omnialpha0,
  title   = {OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning},
  author  = {Hao Yu and Jiabo Zhan and Zile Wang and Jinglin Wang and Huaisong Zhang and Hongyu Li and Xinrui Chen and Yongxian Wei and Chun Yuan},
  year    = {2025},
  journal = {arXiv preprint arXiv:2511.20211}
}

@misc{wang2025alphavaeunifiedendtoendrgba,
  title         = {AlphaVAE: Unified End-to-End RGBA Image Reconstruction and Generation with Alpha-Aware Representation Learning},
  author        = {Zile Wang and Hao Yu and Jiabo Zhan and Chun Yuan},
  year          = {2025},
  eprint        = {2507.09308},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.09308}
}
```