GenEvolve
Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
This repository hosts the GenEvolve agent policy — a Qwen3-VL-8B-Instruct backbone fine-tuned and self-evolved into a tool-orchestrated image-generation agent. Given a user request, the agent issues web/image searches, retrieves visual references, activates internal generation knowledge, and emits an executable prompt-reference program z = (gen_prompt, reference_images) that drives any reference-conditioned downstream generator (Qwen-Image-Edit, Nano Banana Pro, ...).
The same trained agent policy paired with two reference-conditioned generators: Qwen-Image-Edit (open) and Nano Banana Pro (strong).
✨ Highlights
- Tool-orchestrated trajectories. The agent calls search, image_search, and query_knowledge (8 callable generation skills) before producing a final program z = (gen_prompt, reference_images).
- Self-evolution with Visual Experience Distillation. Best-vs-worst trajectory pairs are distilled token-level into the deployed student. No runtime memory at inference.
- Generator-transferable. The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).
📊 Headline Results
GenEvolve-Bench (KScore, held-out split)
| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
|---|---|---|---|---|
| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
| GenEvolve (Ours) | Qwen-Image-Edit-2511 | 0.3663 | 0.3410 | 0.3990 |
| GenEvolve (Ours) | Nano Banana Pro | 0.5739 | 0.5669 | 0.5830 |
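To read the table as gains over the Gen-Searcher 8B baseline with the same generator, the differences can be computed directly from the rows above. This is a minimal sketch; the dictionary layout and the `kscore_gain` helper are illustrative names, not part of the GenEvolve codebase.

```python
# KScore values copied from the GenEvolve-Bench table above.
KSCORE = {
    ("Gen-Searcher 8B", "Qwen-Image-Edit-2511"): 0.3493,
    ("Gen-Searcher 8B", "Nano Banana Pro"): 0.5481,
    ("GenEvolve", "Qwen-Image-Edit-2511"): 0.3663,
    ("GenEvolve", "Nano Banana Pro"): 0.5739,
}


def kscore_gain(generator: str) -> tuple[float, float]:
    """Return (absolute, relative) KScore gain of GenEvolve over Gen-Searcher 8B."""
    base = KSCORE[("Gen-Searcher 8B", generator)]
    ours = KSCORE[("GenEvolve", generator)]
    return ours - base, (ours - base) / base


for generator in ("Qwen-Image-Edit-2511", "Nano Banana Pro"):
    abs_gain, rel_gain = kscore_gain(generator)
    print(f"{generator}: +{abs_gain:.4f} KScore ({rel_gain:.1%} relative)")
```

With either generator the gain is a little under 5% relative, which is the sense in which the policy transfers across renderers.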
WISE Benchmark (WiScore, six knowledge categories)
| Model | Cultural | Time | Space | Biology | Physics | Chemistry | Overall |
|---|---|---|---|---|---|---|---|
| GPT-4o | 0.81 | 0.71 | 0.89 | 0.83 | 0.79 | 0.74 | 0.80 |
| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | 0.85 | 0.68 | 0.78 |
| GenEvolve + Qwen-Image-Edit | 0.84 | 0.74 | 0.87 | 0.83 | 0.81 | 0.83 | 0.82 |
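The Overall column does not appear to be a plain mean of the six categories. The sketch below checks whether it is consistent with a prompt-count-weighted average. The category sizes (400/167/133/100/100/100 out of 1000 prompts) are an assumption about the WISE benchmark that this card does not state, and `weighted_overall` is an illustrative helper.

```python
# ASSUMPTION: per-category prompt counts for WISE; not stated in this model card.
CATEGORY_SIZES = {
    "Cultural": 400, "Time": 167, "Space": 133,
    "Biology": 100, "Physics": 100, "Chemistry": 100,
}

# Rows copied from the WISE table above: six category scores, then reported Overall.
ROWS = {
    "GPT-4o": ([0.81, 0.71, 0.89, 0.83, 0.79, 0.74], 0.80),
    "Gen-Searcher-8B + Qwen-Image": ([0.80, 0.71, 0.82, 0.76, 0.74, 0.75], 0.77),
    "Mind-Brush": ([0.83, 0.69, 0.84, 0.71, 0.85, 0.68], 0.78),
    "GenEvolve + Qwen-Image-Edit": ([0.84, 0.74, 0.87, 0.83, 0.81, 0.83], 0.82),
}


def weighted_overall(scores: list[float]) -> float:
    """Average category scores weighted by the assumed prompt counts."""
    sizes = list(CATEGORY_SIZES.values())
    return sum(n * s for n, s in zip(sizes, scores)) / sum(sizes)


for model, (scores, reported) in ROWS.items():
    print(f"{model}: weighted={weighted_overall(scores):.2f} reported={reported:.2f}")
```

Under that assumption the weighted average matches the reported Overall for all four rows, whereas an unweighted mean of the Gen-Searcher row would round to 0.76 rather than 0.77.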
🧠 Method Overview
For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.
🖼️ Visual Demos
Qualitative comparison on representative cases. Orange marks external/uncommon knowledge requirements; blue marks internal generation-knowledge requirements.
🎨 Gallery — paired with Nano Banana Pro
The same agent policy with Nano Banana Pro as the downstream renderer. Examples cover spatial layout, text rendering, quantity counting, attribute binding, anatomy/pose, creative transfer, material physics, and aesthetic drawing.
🎨 Gallery — paired with Qwen-Image-Edit (open)
Same trained policy paired with the open-source Qwen-Image-Edit-2511 renderer; consistent quality across both generators reflects generator-transferable orchestration.
🚀 Quick Start
The deployed checkpoint is the student policy: it consumes a user prompt and, through a <think>/<tool_call>/<answer> loop, returns a JSON program containing gen_prompt and reference_images. The end-to-end runtime (vLLM serving, agent loop, tools, and Qwen/Nano renderers) lives in the GitHub repo; the steps below mirror its installation and usage.
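To make the <think>/<tool_call>/<answer> loop concrete, here is a minimal sketch of how such a loop could be driven. It is not the repository's implementation: `run_agent`, the scripted stand-in policy, the fake tool, the `tool` message role, and the JSON layout inside `<tool_call>` (`name` plus `arguments`) are all assumptions made for illustration.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(policy, tools, user_prompt, max_turns=8):
    """Alternate policy replies and tool results until an <answer> appears."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = policy(messages)
        messages.append({"role": "assistant", "content": reply})
        answer = ANSWER_RE.search(reply)
        if answer:
            return json.loads(answer.group(1))  # the prompt-reference program
        call = TOOL_CALL_RE.search(reply)
        if call is None:
            raise ValueError("reply contained neither <tool_call> nor <answer>")
        request = json.loads(call.group(1))  # assumed: {"name": ..., "arguments": {...}}
        result = tools[request["name"]](**request["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("no <answer> produced within max_turns")


def scripted_policy(messages):
    """Stand-in for the served checkpoint: one tool call, then a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return ('<think>I need a visual reference.</think>'
                '<tool_call>{"name": "image_search", '
                '"arguments": {"query": "Eiffel Tower golden hour"}}</tool_call>')
    return ('<think>Reference found.</think>'
            '<answer>{"gen_prompt": "A magazine cover with the tower from the '
            'first reference image.", "reference_images": '
            '[{"img_id": "IMG_001", "note": "tower silhouette"}]}</answer>')


def fake_image_search(query):
    """Hypothetical tool wrapper; a real one would query a search backend."""
    return [{"img_id": "IMG_001", "title": f"result for {query}"}]


program = run_agent(scripted_policy, {"image_search": fake_image_search},
                    "A travel-magazine cover of the Eiffel Tower")
print(program["gen_prompt"])
```

Swapping `scripted_policy` for a call to the served model and `fake_image_search` for real tool wrappers keeps the control flow unchanged, which is the point of the sketch.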
1. Install the main GenEvolve runtime
git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve
conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .
Qwen-Image-Edit rendering runs as a separate FastAPI service (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use --backend qwen-image-edit-service.
2. Serve the agent policy
# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh
# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh
TP shards one model replica across multiple GPUs; DP launches multiple replicas; total GPU usage is TP × DP.
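The TP × DP rule can be checked with a few lines of arithmetic before launching. This sketch only illustrates the GPU-count relationship stated above; `total_gpus` and `layouts` are hypothetical helpers, and real deployments may add constraints (for example on valid TP values) that are not modeled here.

```python
def total_gpus(tp: int, dp: int) -> int:
    """GPUs consumed when each of `dp` replicas is sharded across `tp` GPUs."""
    if tp < 1 or dp < 1:
        raise ValueError("TP and DP must be positive integers")
    return tp * dp


def layouts(num_gpus: int) -> list[tuple[int, int]]:
    """All (TP, DP) pairs that use exactly `num_gpus` GPUs."""
    return [(tp, num_gpus // tp) for tp in range(1, num_gpus + 1) if num_gpus % tp == 0]


print(total_gpus(tp=1, dp=1))  # the single-GPU command above
print(total_gpus(tp=1, dp=8))  # the 8-replica command above
print(layouts(8))
```

Choosing TP=1 with a larger DP, as in the second command, favors throughput when one replica fits on a single GPU.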
3. End-to-end example
export SERPER_API_KEY=<your_key> # required for search / image_search
export GOOGLE_API_KEY=<your_key> # or GEMINI_API_KEY; only for --backend nano-banana-pro
# Nano Banana Pro renderer
python examples/quickstart.py \
--backend nano-banana-pro \
--base-url http://localhost:8000/v1 \
--model GenEvolve \
--prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
--output paris.png
# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
--backend qwen-image-edit-service \
--service-url http://your-qwen-service:8001 \
--base-url http://localhost:8000/v1 \
--model GenEvolve \
--output paris_qwen.png
The agent's final <answer> is a JSON object:
{
"gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
"reference_images": [
{"img_id": "IMG_001", "note": "what to copy from this image"}
]
}
gen_prompt MUST refer to selected images using ordinal phrases ("the first reference image") — never raw IMG_### ids or URLs. Pass (gen_prompt, [r["local_path"] for r in reference_images]) to your favourite reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image.
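The constraints above can be enforced before handing the program to a generator. The sketch below parses the answer JSON, rejects prompts that contain raw IMG_### ids or URLs, and resolves each reference to a local file. Because the JSON shown above carries only `img_id` and `note`, the `id_to_path` mapping is an assumption about where downloaded references are tracked, and `load_program` is an illustrative name.

```python
import json
import re

IMG_ID_RE = re.compile(r"IMG_\d+")
URL_RE = re.compile(r"https?://\S+")


def load_program(answer_json: str, id_to_path: dict[str, str]) -> tuple[str, list[str]]:
    """Validate the agent's answer and return (gen_prompt, ordered local paths)."""
    program = json.loads(answer_json)
    gen_prompt = program["gen_prompt"]
    if IMG_ID_RE.search(gen_prompt) or URL_RE.search(gen_prompt):
        raise ValueError("gen_prompt must use ordinal phrases, not IMG ids or URLs")
    # Order matters: "the first reference image" refers to paths[0], and so on.
    paths = [id_to_path[ref["img_id"]] for ref in program["reference_images"]]
    return gen_prompt, paths


answer = json.dumps({
    "gen_prompt": "Two backpackers in front of the tower from the first reference image.",
    "reference_images": [{"img_id": "IMG_001", "note": "tower shape and lighting"}],
})
# Hypothetical mapping from img_id to a file saved by the image_search tool.
prompt, paths = load_program(answer, {"IMG_001": "refs/IMG_001.jpg"})
print(prompt)
print(paths)
```

Keeping the reference list in its original order preserves the meaning of the ordinal phrases inside gen_prompt.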
🗂️ Related Artifacts
| Artifact | Link |
|---|---|
| Project page | https://ephemeral182.github.io/GenEvolve/ |
| Paper | Coming soon |
| Code | https://github.com/MeiGen-AI/GenEvolve |
| Training data + benchmark | MeiGen-AI/GenEvolve-Data-Bench |
| Base model | Qwen/Qwen3-VL-8B-Instruct |
⚖️ Intended Use, Limits, Bias
- Intended use. Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
- Search dependency. The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
- Bias. Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.
📑 Citation
@misc{chen2026genevolveselfevolvingimagegeneration,
title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},
author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
year={2026},
eprint={2605.21605},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.21605},
}