---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- agent
- image-generation
- tool-use
- visual-reasoning
- self-distillation
- grpo
- reinforcement-learning
- multimodal
- qwen3-vl
datasets:
- MeiGen-AI/GenEvolve-Data-Bench
---
GenEvolve

Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

This repository hosts the **GenEvolve agent policy**: a Qwen3-VL-8B-Instruct backbone fine-tuned and self-evolved into a tool-orchestrated image-generation agent. Given a user request, the agent issues web and image searches, retrieves visual references, activates internal generation knowledge, and emits an executable **prompt-reference program** `z = (gen_prompt, reference_images)` that drives any reference-conditioned downstream generator (Qwen-Image-Edit, Nano Banana Pro, ...).
GenEvolve teaser

The same trained agent policy paired with two reference-conditioned generators: Qwen-Image-Edit (open) · Nano Banana Pro (strong)
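To make the prompt-reference program concrete, the following is a minimal sketch of how a caller might parse and sanity-check the agent's final JSON before handing it to a generator. The field names (`gen_prompt`, `reference_images`, `img_id`, `note`) and the rule that `gen_prompt` refers to images by ordinal phrases come from the Quick Start section of this card. The class names, `parse_program`, and the regular expressions are illustrative assumptions, not part of the GenEvolve codebase.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceImage:
    img_id: str  # e.g. "IMG_001"
    note: str    # what to copy from this image


@dataclass(frozen=True)
class Program:
    gen_prompt: str
    reference_images: tuple[ReferenceImage, ...]


# Hypothetical checks: gen_prompt should not contain raw image ids or URLs.
_RAW_ID = re.compile(r"\bIMG_\d+\b")
_URL = re.compile(r"https?://\S+", re.IGNORECASE)


def parse_program(raw: str) -> Program:
    """Parse the final JSON and enforce the ordinal-reference rule."""
    data = json.loads(raw)
    prompt = data["gen_prompt"]
    if _RAW_ID.search(prompt) or _URL.search(prompt):
        raise ValueError("gen_prompt must use ordinal phrases, not IMG ids or URLs")
    refs = tuple(
        ReferenceImage(img_id=r["img_id"], note=r.get("note", ""))
        for r in data.get("reference_images", [])
    )
    return Program(gen_prompt=prompt, reference_images=refs)
```

Rejecting a malformed program before rendering keeps a failed generation attributable to the agent output rather than to the downstream generator.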

---

## ✨ Highlights

- **Tool-orchestrated trajectories.** The agent calls `search`, `image_search`, and `query_knowledge` (8 callable generation skills) before producing a final program `z = (gen_prompt, reference_images)`.
- **Self-evolution with Visual Experience Distillation.** Best-vs-worst trajectory pairs are distilled token-level into the deployed student. **No runtime memory at inference.**
- **Generator-transferable.** The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).

## 📊 Headline Results

### GenEvolve-Bench (KScore, held-out split)

| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
|---|---|---:|---:|---:|
| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
| **GenEvolve (Ours)** | Qwen-Image-Edit-2511 | **0.3663** | **0.3410** | **0.3990** |
| **GenEvolve (Ours)** | Nano Banana Pro | **0.5739** | **0.5669** | **0.5830** |

### WISE Benchmark (WiScore, six knowledge categories)

| Model | Cultural | Time | Space | Biology | Physics | Chemistry | **Overall** |
|---|---:|---:|---:|---:|---:|---:|---:|
| GPT-4o | 0.81 | 0.71 | **0.89** | **0.83** | 0.79 | 0.74 | 0.80 |
| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | **0.85** | 0.68 | 0.78 |
| **GenEvolve + Qwen-Image-Edit** | **0.84** | 0.74 | 0.87 | **0.83** | 0.81 | **0.83** | **0.82** |

---

## 🧠 Method Overview

GenEvolve method overview

For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.

---

## 🖼️ Visual Demos

Qualitative comparison

Qualitative comparison on representative cases. Orange marks external/uncommon knowledge requirements; blue marks internal generation-knowledge requirements.

### 🎨 Gallery: paired with Nano Banana Pro

GenEvolve + Nano Banana Pro gallery

The same agent policy with Nano Banana Pro as the downstream renderer. Examples cover spatial layout, text rendering, quantity counting, attribute binding, anatomy/pose, creative transfer, material physics, and aesthetic drawing.

### 🎨 Gallery: paired with Qwen-Image-Edit (open)

GenEvolve + Qwen-Image-Edit gallery

Same trained policy paired with the open-source Qwen-Image-Edit-2511 renderer; consistent quality across both generators reflects generator-transferable orchestration.

---

## 🚀 Quick Start

The deployed checkpoint is the **student policy**: it consumes a user prompt and returns a JSON `gen_prompt + reference_images` program through a multi-turn tool-calling loop. The end-to-end runtime (vLLM serving + agent loop + tools + Qwen/Nano renderers) lives in the [GitHub repo](https://github.com/MeiGen-AI/GenEvolve); the steps below mirror its installation and usage.

### 1. Install the main GenEvolve runtime

```bash
git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve
conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .
```

Qwen-Image-Edit rendering runs as a **separate FastAPI service** (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use `--backend qwen-image-edit-service`.

### 2. Serve the agent policy

```bash
# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh

# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh
```

`TP` shards one model replica across multiple GPUs; `DP` launches multiple replicas; total GPU usage is `TP × DP`.

### 3. End-to-end example

```bash
export SERPER_API_KEY=     # required for search / image_search
export GOOGLE_API_KEY=     # or GEMINI_API_KEY; only for --backend nano-banana-pro

# Nano Banana Pro renderer
python examples/quickstart.py \
  --backend nano-banana-pro \
  --base-url http://localhost:8000/v1 \
  --model GenEvolve \
  --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
  --output paris.png

# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
  --backend qwen-image-edit-service \
  --service-url http://your-qwen-service:8001 \
  --base-url http://localhost:8000/v1 \
  --model GenEvolve \
  --output paris_qwen.png
```

The agent's final output is a JSON object:

```json
{
  "gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
  "reference_images": [
    {"img_id": "IMG_001", "note": "what to copy from this image"}
  ]
}
```

`gen_prompt` MUST refer to selected images using ordinal phrases (`"the first reference image"`), never raw `IMG_###` ids or URLs. Pass `(gen_prompt, [r["local_path"] for r in reference_images])` to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.

---

## 🗂️ Related Artifacts

| Artifact | Link |
|---|---|
| Project page | https://ephemeral182.github.io/GenEvolve/ |
| Paper | Coming soon |
| Code | https://github.com/MeiGen-AI/GenEvolve |
| Training data + benchmark | [MeiGen-AI/GenEvolve-Data-Bench](https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench) |
| Base model | [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |

---

## ⚖️ Intended Use, Limits, Bias

- **Intended use.** Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
- **Search dependency.** The agent issues live web/image queries through user-provided tool wrappers. The quality of grounded facts depends on the search backend you plug in.
- **Bias.** Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.
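
Because the search backend is supplied by the user, it can help to hide it behind a small interface so that backends can be swapped or stubbed out in tests. The sketch below shows one way this might look. `SearchHit`, `SearchBackend`, `make_search_tool`, and `offline_backend` are hypothetical names for illustration and do not reflect the wrapper interface in the GenEvolve repository.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class SearchHit:
    title: str
    url: str
    snippet: str


class SearchBackend(Protocol):
    """Anything that maps a query to ranked hits (hypothetical interface)."""

    def __call__(self, query: str, top_k: int) -> list[SearchHit]: ...


def make_search_tool(backend: SearchBackend, top_k: int = 5) -> Callable[[str], str]:
    """Wrap a backend so the agent receives plain text it can read in context."""

    def search(query: str) -> str:
        hits = backend(query, top_k)[:top_k]
        if not hits:
            return f"No results for: {query}"
        return "\n".join(
            f"[{i}] {h.title} ({h.url}): {h.snippet}" for i, h in enumerate(hits, 1)
        )

    return search


def offline_backend(query: str, top_k: int) -> list[SearchHit]:
    """Canned results for tests; a real backend would call a search API here."""
    corpus = [
        SearchHit(
            "Eiffel Tower",
            "https://example.org/eiffel",
            "Wrought-iron lattice tower in Paris.",
        ),
    ]
    return [h for h in corpus if query.lower() in h.title.lower()][:top_k]
```

With this shape, `make_search_tool(offline_backend)` gives a deterministic tool for debugging the agent loop, and a live backend can be substituted without touching the loop itself.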
---

## 📑 Citation

```bibtex
@misc{chen2026genevolveselfevolvingimagegeneration,
  title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},
  author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
  year={2026},
  eprint={2605.21605},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.21605},
}
```