| # OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning |
|
|
| <p align="center"> |
| <a href="https://github.com/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/GitHub-OmniAlpha-181717.svg?logo=github" alt="GitHub"></a> |
| <a href="https://arxiv.org/abs/2511.20211"><img src="https://img.shields.io/badge/arXiv-2511.20211-b31b1b.svg" alt="arXiv"></a> |
| <a href="https://huggingface.co/Longin-Yu/OmniAlpha"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow" alt="Hugging Face"></a> |
| </p> |
|
|
| --- |
|
|
| **This is the official repository for "[OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning](https://arxiv.org/abs/2511.20211)".** |
|
|
|  |
|
|
| --- |
|
|
## 📁 Project Structure
|
|
| ``` |
| . |
| βββ alpha/ # Core package |
| β βββ data.py # Dataset loading & preprocessing |
| β βββ args.py # Argument definitions |
| β βββ inplace.py # In-place operations |
| β βββ pipelines/ # Inference pipelines (Qwen-Image-Edit) |
| β βββ vae/ # AlphaVAE model & losses |
| β βββ grpo/ # GRPO (RL) training utilities |
| β βββ utils/ # Utility functions |
| βββ configs/ # Configuration files |
| β βββ datasets.*.jsonc # Dataset configurations |
| β βββ deepspeed/ # DeepSpeed configs (ZeRO-1/3) |
| β βββ experiments/ # VAE experiment configs |
| β βββ accelerate.yaml # Accelerate config |
| βββ scripts/ # Bash scripts for training/inference |
| β βββ train_qwen_image.sh # Single-node training (Accelerate) |
| β βββ train_qwen_image_torchrun.sh # Multi-node training (torchrun) |
| β βββ vae_convert.sh # VAE conversion script |
| β βββ vae_train.sh # VAE fine-tuning script |
| β βββ infer.sh # Inference script |
| β βββ demo.sh # Gradio demo script |
| β βββ rl/ # GRPO reinforcement learning scripts |
| βββ tasks/ # Python/Jupyter task scripts |
| β βββ diffusion/ # Diffusion training & inference |
| β βββ vae/ # VAE fine-tuning, conversion & inference |
| β βββ rl/ # GRPO RL training & preprocessing |
| β βββ demo/ # Gradio demo application |
| βββ pyproject.toml # Package definitions & dependencies |
| ``` |
|
|
## 📦 Installation
|
|
| ### Step 1. Create a Conda Environment |
|
|
| ```bash |
| conda create -n OmniAlpha python=3.10 |
| conda activate OmniAlpha |
| ``` |
|
|
| ### Step 2. Install OmniAlpha |
|
|
| First clone this repo and `cd OmniAlpha`. Then: |
|
|
| ```bash |
| # Install OmniAlpha and all dependencies |
| pip install -e . |
| ``` |
|
|
## ⚙️ Environment Variables
|
|
| All scripts use environment variables to specify model/data paths. Set these before running any script: |
|
|
| ```bash |
| # Model paths |
| export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509" # HuggingFace model ID or local path |
| export VAE_MODEL_PATH="/path/to/vae/checkpoint" # Path to AlphaVAE checkpoint |
| export LORA_PATH="/path/to/lora/pytorch_lora_weights.safetensors" # Path to LoRA weights |
| |
| # Data paths |
| export DATA_ROOT="/path/to/datasets" # Root directory for all datasets |
| ``` |
|
|
If these variables are not set, the scripts fall back to placeholder paths, which you will need to edit manually in each script.
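
The lookup itself is the standard environment-variable-with-default pattern; a minimal sketch (the concrete fallback values below are illustrative, not the scripts' actual placeholders):

```python
import os

# Resolve paths from the environment; the defaults here are illustrative
# placeholders, not the actual fallback values baked into the scripts.
PRETRAINED_MODEL = os.environ.get("PRETRAINED_MODEL", "Qwen/Qwen-Image-Edit-2509")
VAE_MODEL_PATH = os.environ.get("VAE_MODEL_PATH", "/path/to/vae/checkpoint")
LORA_PATH = os.environ.get("LORA_PATH", "/path/to/lora/pytorch_lora_weights.safetensors")
DATA_ROOT = os.environ.get("DATA_ROOT", "/path/to/datasets")
```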
|
|
## 📊 Data Preparation
|
|
| > Please refer to `configs/datasets.demo.jsonc` for dataset configuration examples. |
| > Each dataset entry consists of two required fields: |
| > |
| > * `data_path`: Path to the JSONL annotation file. |
| > * `image_dir`: Root directory for the dataset images. |
|
|
| ### Dataset Format |
|
|
| The annotation file (`data_path`) should be a JSONL file with the following structure. Both `input_images` and `output_images` must be **relative paths** within `image_dir`: |
|
|
| ```jsonl |
| {"id": "case_0", "prompt": "Vintage camera next to a brown glass bottle.", "input_images": ["images_512/case_0/base.png"], "output_images": ["images_512/case_0/00.png"]} |
| {"id": "case_1", "prompt": "A vintage-style globe with a map of North and South America, mounted on a black stand.;Antique key with ornate design, attached to a chain.", "input_images": ["images_512/case_1/base.png"], "output_images": ["images_512/case_1/00.png", "images_512/case_1/01.png"]} |
| ... |
| ``` |
|
|
| ### Dataset Configuration |
|
|
| Create a `.jsonc` config file under `configs/` to define datasets and splits: |
|
|
| ```jsonc |
| { |
| "datasets": { |
| "my_dataset": { |
| "data_path": "/path/to/datasets/my_dataset/annotations.jsonl", |
| "image_dir": "/path/to/datasets/my_dataset" |
| } |
| }, |
| "splits": { |
| "train": [{"dataset": "my_dataset", "ends": -50}], |
| "valid": [{"dataset": "my_dataset", "starts": -50}] |
| } |
| } |
| ``` |
|
|
## 💽 Model Download
|
|
| [Pretrained model checkpoints are available on Hugging Face.](https://huggingface.co/Longin-Yu/OmniAlpha) |
|
|
## 🚀 Inference
|
|
| You can use the provided script to run inference with pretrained models. |
|
|
| 1. **Configure**: Set environment variables (`PRETRAINED_MODEL`, `VAE_MODEL_PATH`, `LORA_PATH`) or edit `scripts/infer.sh` directly. |
| 2. **Execute**: |
|
|
| ```bash |
| bash scripts/infer.sh |
| ``` |
| |
## 🎬 Demo
|
|
| We provide a Gradio-based demo for interactive multi-task RGBA generation and editing. |
|
|
| ### Supported Tasks |
|
|
- `t2i` – Text-to-RGBA image generation
- `ObjectClear` – Object removal
- `automatting` – Automatic matting
- `refmatting` – Referential matting
- `layerdecompose` – Layer decomposition
|
|
| ### Execute |
|
|
| ```bash |
| # Set model paths |
| export PRETRAINED_MODEL="Qwen/Qwen-Image-Edit-2509" |
| export VAE_MODEL_PATH="/path/to/models/OmniAlpha/rgba_vae" |
| export LORA_PATH="/path/to/models/OmniAlpha/lora/pytorch_lora_weights.safetensors" |
| |
| # Launch demo |
| bash scripts/demo.sh |
| ``` |
|
|
| ### Example Assets |
|
|
| Demo example images are placed in `tasks/demo/omnialpha/`. |
|
|
## 🏋️ Training
|
|
| ### AlphaVAE Fine-tuning |
|
|
| ```bash |
| # Step 1: Convert the base VAE to RGBA format |
| bash scripts/vae_convert.sh |
| |
| # Step 2: Fine-tune the AlphaVAE |
| bash scripts/vae_train.sh |
| ``` |
|
|
| ### LoRA Training (Single-Node with Accelerate) |
|
|
| ```bash |
| bash scripts/train_qwen_image.sh |
| ``` |
|
|
| ### LoRA Training (Multi-Node with torchrun) |
|
|
For distributed training across multiple nodes, run the same script on every node, setting `MACHINE_RANK` to that node's rank:
|
|
| ```bash |
| # Set distributed training variables |
| export MASTER_ADDR="your_master_ip" |
| export MASTER_PORT=29500 |
| export NNODES=2 |
| export NPROC_PER_NODE=8 |
| export MACHINE_RANK=0 # 0 for master, 1 for worker, etc. |
| export VERSION="omnialpha" # Matches configs/datasets.<VERSION>.jsonc |
| |
| bash scripts/train_qwen_image_torchrun.sh |
| ``` |
|
|
| ### GRPO Reinforcement Learning |
|
|
| For RL-based fine-tuning: |
|
|
| ```bash |
| # Run GRPO training |
| bash scripts/rl/train_grpo.sh |
| # Or for multi-node: |
| bash scripts/rl/train_grpo_torchrun.sh |
| ``` |
|
|
## 📞 Contact
|
|
| Feel free to reach out via email at longinyh@gmail.com. You can also open an issue if you have ideas to share or would like to contribute data for training future models. |
|
|
| ## Citation |
|
|
```bibtex
@article{yu2025omnialpha0,
  title   = {OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning},
  author  = {Hao Yu and Jiabo Zhan and Zile Wang and Jinglin Wang and Huaisong Zhang and Hongyu Li and Xinrui Chen and Yongxian Wei and Chun Yuan},
  year    = {2025},
  journal = {arXiv preprint arXiv:2511.20211}
}

@misc{wang2025alphavaeunifiedendtoendrgba,
  title         = {AlphaVAE: Unified End-to-End RGBA Image Reconstruction and Generation with Alpha-Aware Representation Learning},
  author        = {Zile Wang and Hao Yu and Jiabo Zhan and Chun Yuan},
  year          = {2025},
  eprint        = {2507.09308},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.09308}
}
```
|
|