Ephemeral182 committed 3 days ago

Commit e61d1d5 (verified) · 1 Parent(s): 22de288

Add model card README + figures

Files changed (6)

.gitattributes +4 -0
README.md +258 -0
assets/logo_genevolve.png +3 -0
assets/overview.png +3 -0
assets/teaser.jpg +3 -0
assets/visual_comparison.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/logo_genevolve.png filter=lfs diff=lfs merge=lfs -text
+assets/overview.png filter=lfs diff=lfs merge=lfs -text
+assets/teaser.jpg filter=lfs diff=lfs merge=lfs -text
+assets/visual_comparison.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,261 @@
 ---
 license: apache-2.0
+language:
+- en
+library_name: transformers
+pipeline_tag: image-text-to-text
+base_model: Qwen/Qwen3-VL-8B-Instruct
+tags:
+- agent
+- image-generation
+- tool-use
+- visual-reasoning
+- self-distillation
+- grpo
+- reinforcement-learning
+- multimodal
+- qwen3-vl
+datasets:
+- MeiGen-AI/GenEvolve-Data
 ---
+<div align="center">
+<img src="assets/logo_genevolve.png" alt="GenEvolve" width="160">
+<h1>GenEvolve</h1>
+<p><strong><em>Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation</em></strong></p>
+<p>
+  <a href="https://ephemeral182.github.io/GenEvolve/">
+    <img alt="Project Page" src="https://img.shields.io/badge/🌐_Project-Page-1f6feb"></a>
+  <a href="https://arxiv.org/abs/XXXX.XXXXX">
+    <img alt="arXiv" src="https://img.shields.io/badge/📄_arXiv-XXXX.XXXXX-b31b1b"></a>
+  <a href="https://github.com/Ephemeral182/GenEvolve">
+    <img alt="Code" src="https://img.shields.io/badge/💾_GitHub-Code-181717"></a>
+  <a href="https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data">
+    <img alt="Dataset" src="https://img.shields.io/badge/🤗_Dataset-GenEvolve--Data-FFD21E"></a>
+</p>
+<p>
+  <img alt="python" src="https://img.shields.io/badge/python-3.11-3776AB?logo=python&logoColor=white">
+  <img alt="pytorch" src="https://img.shields.io/badge/pytorch-2.8-EE4C2C?logo=pytorch&logoColor=white">
+  <img alt="vllm" src="https://img.shields.io/badge/vLLM-0.11-30A14E">
+  <img alt="cuda" src="https://img.shields.io/badge/CUDA-12.x-76B900?logo=nvidia&logoColor=white">
+  <img alt="license" src="https://img.shields.io/badge/license-Apache%202.0-green">
+  <img alt="status" src="https://img.shields.io/badge/status-active-brightgreen">
+</p>
+</div>
+> **GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation**
+> Sixiang Chen, Zhaohu Xing, Tian Ye, Xinyu Geng, Yunlong Lin, Jianyu Lai, Xuanhua He, Fuxiang Zhai, Jialin Gao, Lei Zhu
+> *Submitted to NeurIPS 2026*
+This repository hosts the **GenEvolve agent policy** — a Qwen3-VL-8B-Instruct backbone fine-tuned and self-evolved into a tool-orchestrated image-generation agent. Given a user request, the agent issues web/image searches, retrieves visual references, activates internal generation knowledge, and emits an executable **prompt-reference program** `z = (gen_prompt, reference_images)` that drives any reference-conditioned downstream generator (Qwen-Image-Edit, Nano Banana Pro, ...).
+<div align="center">
+<img src="assets/teaser.jpg" alt="GenEvolve teaser" width="100%">
+<p><em>The same trained agent policy paired with two reference-conditioned generators ⟶<br>
+<strong>Qwen-Image-Edit (open)</strong> &nbsp;·&nbsp; <strong>Nano Banana Pro (strong)</strong></em></p>
+</div>
+---
+## ✨ TL;DR
+- **Tool-orchestrated trajectories.** The agent calls `search`, `image_search`, and `query_knowledge` (8 callable generation skills) before producing a final program `z = (gen_prompt, reference_images)`.
+- **Self-evolution = GRPO + Visual Experience Distillation.** Best-vs-worst trajectory pairs are summarized into a *decision guide* (retrieval-key + 6 imperative bullet lists). The teacher view sees the retrieved guide, the student does not — SDL distills the teacher's token-level preferences back into the deployed student. **No runtime memory at inference.**
+- **Generator-transferable.** The same trained policy improves both an open-source generator (Qwen-Image-Edit-2511, KScore 0.299 → 0.366) and a strong proprietary generator (Nano Banana Pro, 0.530 → **0.574**).
+- **Strong external generalization.** Achieves **0.82** WiScore on the knowledge-intensive WISE benchmark, ahead of GPT-4o (0.80) and the agentic baselines reported below, Mind-Brush (0.78) and Gen-Searcher-8B (0.77).
+---
+## 📊 Headline Results
+### GenEvolve-Bench (KScore on the held-out split)
+| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
+|---|---|---:|---:|---:|
+| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
+| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
+| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
+| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
+| **GenEvolve (Ours)** | Qwen-Image-Edit-2511 | **0.3663** | **0.3410** | **0.3990** |
+| **GenEvolve (Ours)** | Nano Banana Pro | **0.5739** | **0.5669** | **0.5830** |
+### WISE Benchmark (WiScore, six knowledge categories)
+| Model | Cultural | Time | Space | Biology | Physics | Chemistry | **Overall** |
+|---|---:|---:|---:|---:|---:|---:|---:|
+| GPT-4o | 0.81 | 0.71 | **0.89** | **0.83** | 0.79 | 0.74 | 0.80 |
+| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
+| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | **0.85** | 0.68 | 0.78 |
+| **GenEvolve + Qwen-Image-Edit** | **0.84** | 0.74 | 0.87 | **0.83** | 0.81 | **0.83** | **0.82** |
+<div align="center">
+<img src="assets/visual_comparison.png" alt="Visual comparison vs strong baselines" width="100%">
+<p><em>Visual comparison on representative GenEvolve-Bench cases; <span style="color:#ea580c">orange</span> marks external/uncommon knowledge; <span style="color:#1f6feb">blue</span> marks internal generation-knowledge requirements.</em></p>
+</div>
+---
+## 🧠 Method Overview
+<p align="center"><img src="assets/overview.png" alt="GenEvolve method overview" width="92%"></p>
+For a user request $x$, the agent samples a multi-turn trajectory
+$$\tau = (a_1, o_1, \ldots, a_T, o_T, z), \qquad z = (g, R),$$
+where each $a_t$ is one of three actions and $o_t$ is the corresponding observation. The downstream generator renders $\hat{y} = G(g, R)$.
+| Tool | Role | Output |
+|---|---|---|
+| `search(queries)` | External textual evidence — entities, dates, facts | Markdown digest |
+| `image_search(query)` | Visual references; each result is given a unique `IMG_###` id | Image list with local paths |
+| `query_knowledge(skill_name)` | Internal generation knowledge — `spatial_layout`, `text_rendering`, `quantity_counting`, `attribute_binding`, `anatomy_body_coherence`, `physical_material_consistency`, `creative_drawing`, `aesthetic_drawing` | Skill markdown |
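The trajectory notation above can be pictured as plain data structures. The following is a minimal sketch, not the repository's implementation: the class names `Step`, `Program`, and `Trajectory` and the `render` stand-in are hypothetical, and only the three tool names are taken from the table.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Tool names come from the table above; all other names are illustrative.
TOOLS = ("search", "image_search", "query_knowledge")


@dataclass
class Step:
    action: str       # a_t: which tool was called
    argument: str     # query text or skill name passed to the tool
    observation: str  # o_t: what the tool returned


@dataclass
class Program:
    gen_prompt: str                                            # g
    reference_images: List[str] = field(default_factory=list)  # R


@dataclass
class Trajectory:
    steps: List[Step]
    program: Program  # z = (g, R)

    def validate(self) -> None:
        """Reject any step whose action is not one of the three tools."""
        for step in self.steps:
            if step.action not in TOOLS:
                raise ValueError(f"unknown tool: {step.action}")


def render(program: Program) -> Tuple[str, List[str]]:
    """Stand-in for y_hat = G(g, R): returns the inputs a generator would receive."""
    return program.gen_prompt, program.reference_images


tau = Trajectory(
    steps=[
        Step("search", "Eiffel Tower facts", "markdown digest ..."),
        Step("image_search", "Eiffel Tower golden hour", "IMG_001, IMG_002"),
        Step("query_knowledge", "text_rendering", "skill markdown ..."),
    ],
    program=Program(
        "Two backpackers in front of the tower from the first reference image.",
        ["IMG_001"],
    ),
)
tau.validate()
```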
+**Self-evolution (training-only).** For each prompt the agent samples 6 rollouts. The best/worst pair (with a sufficient reward gap) is summarized by a Gemini-3.1-Pro judge into a single bundle:
+```
+retrieval_key:    { trigger, source_prompt_summary }
+decision_guidance:
+    decision_focus
+    recommended_tool_plan      (1–4 imperative bullets)
+    search_query_guidance      (1–3 bullets)
+    skill_routing_guidance     (1–4 bullets)
+    reference_selection_guidance (1–3 bullets)
+    prompt_program_guidance    (1–3 bullets)
+    failure_guards             (1–3 bullets)
+```
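The best/worst selection that feeds each bundle might look like the sketch below. The README only says the reward gap must be "sufficient", so the `min_gap` threshold is a hypothetical parameter, and `select_pair` is an illustrative name rather than a function from the codebase.

```python
from typing import List, Optional, Tuple


def select_pair(rewards: List[float], min_gap: float = 0.2) -> Optional[Tuple[int, int]]:
    """Return (best_index, worst_index), or None when the reward gap is too small.

    min_gap is an assumed threshold; the source does not state its value.
    """
    if len(rewards) < 2:
        return None
    best = max(range(len(rewards)), key=rewards.__getitem__)
    worst = min(range(len(rewards)), key=rewards.__getitem__)
    if rewards[best] - rewards[worst] < min_gap:
        return None
    return best, worst


# Six rollouts for one prompt, matching the group size described above.
rollout_rewards = [0.41, 0.58, 0.33, 0.61, 0.47, 0.52]
pair = select_pair(rollout_rewards)
```

Groups whose rollouts all score similarly return `None` and would contribute no bundle under this assumption.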
+Bundles are stored in a 500-entry rolling buffer keyed by `embed(trigger + source_prompt_summary)` (Qwen3-Embedding-0.6B) with a cosine retrieval gate of `0.84`. **Only the privileged teacher branch sees the retrieved guide** — the student is regularised toward that teacher with an importance-weighted reverse-KL on the same on-policy tokens (see paper Sec. 5 for the exact loss).
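The buffer and retrieval gate described above can be sketched with the standard library alone. This is an illustration under several assumptions: `toy_embed` is a letter-count stand-in for Qwen3-Embedding-0.6B, the two key fields are joined with a space, eviction is first-in-first-out, and retrieval returns only the single best match.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for the real embedding model: letter counts."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class ExperienceBuffer:
    def __init__(self, embed: Callable[[str], Vector], capacity: int = 500, gate: float = 0.84):
        self.embed = embed
        self.gate = gate
        # deque(maxlen=...) drops the oldest entry once capacity is reached.
        self.entries: Deque[Tuple[Vector, dict]] = deque(maxlen=capacity)

    def add(self, bundle: dict) -> None:
        key = bundle["retrieval_key"]
        text = key["trigger"] + " " + key["source_prompt_summary"]
        self.entries.append((self.embed(text), bundle))

    def retrieve(self, query: str) -> Optional[dict]:
        """Return the most similar bundle, or None if it falls below the gate."""
        q = self.embed(query)
        best_score, best_bundle = -1.0, None
        for vec, bundle in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_bundle = score, bundle
        return best_bundle if best_score >= self.gate else None


buffer = ExperienceBuffer(toy_embed, capacity=2)
for trigger in ("map labels", "crowd counting", "neon sign text"):
    buffer.add({
        "retrieval_key": {"trigger": trigger, "source_prompt_summary": "diner sign"},
        "decision_guidance": {"decision_focus": trigger},
    })
hit = buffer.retrieve("neon sign text diner sign")
miss = buffer.retrieve("zzzz")
```

With `capacity=2` the first bundle is evicted by the third, which mirrors how the 500-entry buffer would roll over during training.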
+---
+## 🚀 Quick Start
+The deployed checkpoint is the **student policy** — it consumes a user prompt and returns a JSON `gen_prompt + reference_images` program through a normal `<think>/<tool_call>/<answer>` loop.
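One way to consume a single model turn is to split it on the three tags named above. The sketch below assumes that each `<tool_call>` body is a JSON object with `name` and `arguments` fields and that the `<answer>` body is the final-answer JSON; neither detail is stated in this README, so check the GitHub runtime for the actual format. `parse_turn` is a hypothetical helper.

```python
import json
import re
from typing import Any, Dict

# Matches <think>...</think>, <tool_call>...</tool_call>, and <answer>...</answer>.
TAG = re.compile(r"<(think|tool_call|answer)>(.*?)</\1>", re.DOTALL)


def parse_turn(text: str) -> Dict[str, Any]:
    """Split one model turn into reasoning text, tool calls, and an optional answer."""
    turn: Dict[str, Any] = {"think": [], "tool_calls": [], "answer": None}
    for name, body in TAG.findall(text):
        body = body.strip()
        if name == "think":
            turn["think"].append(body)
        elif name == "tool_call":
            turn["tool_calls"].append(json.loads(body))  # assumed JSON payload
        else:
            turn["answer"] = json.loads(body)            # assumed JSON payload
    return turn


sample = (
    "<think>I need a reference photo.</think>"
    '<tool_call>{"name": "image_search", "arguments": {"query": "red neon diner sign"}}</tool_call>'
)
final = (
    '<answer>{"gen_prompt": "A diner sign like the first reference image.", '
    '"reference_images": [{"img_id": "IMG_001", "note": "neon glow"}]}</answer>'
)
first_turn = parse_turn(sample)
last_turn = parse_turn(final)
```

An agent loop would execute each parsed tool call, append the observation to the conversation, and stop once `answer` is no longer `None`.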
+### Option 1 — full GenEvolve runtime (recommended)
+The end-to-end runtime (vLLM/SGLang server + agent loop + tools + Qwen/Nano renderers) lives in the [GitHub repo](https://github.com/Ephemeral182/GenEvolve).
+```bash
+git clone https://github.com/Ephemeral182/GenEvolve.git
+cd GenEvolve
+conda create -n genevolve python=3.11 -y && conda activate genevolve
+pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
+pip install --no-build-isolation -r requirements.txt && pip install -e .
+# serve the policy (TP/DP knobs scale across GPUs)
+MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh
+# end-to-end example
+export SERPER_API_KEY=<your_key>      # required for search / image_search
+export GOOGLE_API_KEY=<your_key>      # only for the Nano Banana Pro backend
+python examples/quickstart.py \
+    --backend nano-banana-pro \
+    --base-url http://localhost:8000/v1 \
+    --model GenEvolve \
+    --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
+    --output paris.png
+```
+### Option 2 — direct Transformers loading
+```python
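+import torch
+from transformers import AutoModelForImageTextToText, AutoProcessor
+
+repo = "MeiGen-AI/GenEvolve"
+# Qwen3-VL checkpoints load through the image-text-to-text auto class.
+model = AutoModelForImageTextToText.from_pretrained(
+    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
+)
+processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
+
+SYSTEM_PROMPT = "..."  # placeholder: copy the agent system prompt from the GitHub repo
+messages = [
+    {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
+    {"role": "user", "content": [{"type": "text", "text": "A vintage diner sign that says 'BLUE SKY DINER' in red neon."}]},
+]
+# tokenize=True and return_dict=True yield tensors (input_ids, attention_mask) rather than a string.
+inputs = processor.apply_chat_template(
+    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt",
+).to(model.device)
+# do_sample=True is required for temperature / top_p to take effect.
+out = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7, top_p=0.95)
+# Decode only the newly generated tokens.
+print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))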
+```
+### Final-answer JSON
+```json
+{
+  "gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
+  "reference_images": [
+    {"img_id": "IMG_001", "note": "what to copy from this image"},
+    {"img_id": "IMG_004", "note": "what to copy from this image"}
+  ]
+}
+```
+`gen_prompt` MUST refer to selected images using ordinal phrases (`"the first reference image"`) — never raw `IMG_###` ids or URLs. `reference_images` is sorted by `img_id` ascending so that ordinals resolve unambiguously.
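+To obtain the final image, pass `gen_prompt` together with the local path of each selected reference, in `reference_images` order, to a reference-conditioned generator such as Qwen-Image-Edit or Nano Banana Pro. The final-answer entries shown above carry only `img_id` and `note`, so look up each local path from the `image_search` results.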
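The rules above can be checked before calling a generator. The sketch below is illustrative only: `to_generator_inputs` and the `id_to_path` mapping are hypothetical names, and the mapping is assumed to be built from the `image_search` results, since the final-answer entries carry only `img_id` and `note`.

```python
import re
from typing import Dict, List, Tuple

# gen_prompt must not contain raw IMG_### ids or URLs.
RAW_ID_OR_URL = re.compile(r"IMG_\d+|https?://")


def to_generator_inputs(answer: dict, id_to_path: Dict[str, str]) -> Tuple[str, List[str]]:
    """Validate the final-answer JSON and return (gen_prompt, ordered local paths)."""
    prompt = answer["gen_prompt"]
    if RAW_ID_OR_URL.search(prompt):
        raise ValueError("gen_prompt must use ordinal phrases, not raw ids or URLs")
    # Zero-padded ids such as IMG_001 sort correctly as plain strings.
    refs = sorted(answer["reference_images"], key=lambda r: r["img_id"])
    paths = [id_to_path[r["img_id"]] for r in refs]
    return prompt, paths


answer = {
    "gen_prompt": "A neon sign styled like the first reference image, mounted as in the second reference image.",
    "reference_images": [
        {"img_id": "IMG_004", "note": "mounting bracket"},
        {"img_id": "IMG_001", "note": "neon glow"},
    ],
}
# Hypothetical local paths recorded when image_search returned these ids.
id_to_path = {"IMG_001": "/tmp/refs/001.jpg", "IMG_004": "/tmp/refs/004.jpg"}
gen_prompt, ref_paths = to_generator_inputs(answer, id_to_path)
```

Sorting defensively on the consumer side keeps "the first reference image" aligned with the lowest `img_id` even if a program arrives out of order.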
+---
+## 🗂️ Related Artifacts
+| Artifact | Link |
+|---|---|
+| Project page | https://ephemeral182.github.io/GenEvolve/ |
+| Paper (arXiv) | https://arxiv.org/abs/XXXX.XXXXX |
+| Code | https://github.com/Ephemeral182/GenEvolve |
+| Training data + benchmark | [MeiGen-AI/GenEvolve-Data](https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data) |
+| Base model | [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
+---
+## 🧾 Training Recipe
+| Stage | Recipe |
+|---|---|
+| **SFT cold start** | LLaMA-Factory, 2 epochs, 16 GPUs, micro-bsz=2, lr=1e-5 (cosine, warmup 0.02), bf16 + FlashAttention-2, ZeRO-3, vision tower frozen. |
+| **Self-evolution** | rLLM/verl, GRPO + experience-conditioned SDL on 8 prompts × 6 rollouts/step, lr=1e-6, ε_ℓ=0.20, ε_h=0.28, 5 epochs over the RL split. |
+| **Reward** | KScore image judge (Faithfulness 0.1 / Visual 0.4 / Text 0.4 / Aesthetics 0.1, Gemini 3.1 Pro Preview) + program-sufficiency text judge, weighted 0.5 / 0.5. |
+| **SDL** | λ_SDL = 2.0, decision-only mask (`<tool_call>`/`<answer>`), top-10% logp-delta filter (`SDL_TOP_K_FRAC=0.1`), IS-cap ρ_max = 2, per-token clip disabled, `seq-mean-token-sum` aggregation. |
+| **Visual experience memory** | 1 bundle / comparison (decision guide); cosine retrieval gate ≥ 0.84; buffer cap 500; Qwen3-Embedding-0.6B keys; teacher-only (no inference-time memory). |
+Full hyper-parameters and ablations are in the appendix tables of the paper.
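The Reward row in the table above can be read as two nested weighted sums. The sketch below assumes that the four KScore numbers are linear weights over sub-scores in [0, 1] and that the image and text judges are combined linearly; the paper may define the aggregation differently, and the function names are illustrative.

```python
from typing import Dict

# Weights as listed in the Reward row (assumed to be linear weights).
KSCORE_WEIGHTS = {"faithfulness": 0.1, "visual": 0.4, "text": 0.4, "aesthetics": 0.1}


def kscore(sub_scores: Dict[str, float]) -> float:
    """Weighted sum of the four image-judge sub-scores."""
    return sum(KSCORE_WEIGHTS[name] * sub_scores[name] for name in KSCORE_WEIGHTS)


def total_reward(
    sub_scores: Dict[str, float],
    program_sufficiency: float,
    image_weight: float = 0.5,
    text_weight: float = 0.5,
) -> float:
    """Combine the image judge and the program-sufficiency text judge."""
    return image_weight * kscore(sub_scores) + text_weight * program_sufficiency


# Made-up sub-scores for one rollout, for illustration only.
scores = {"faithfulness": 0.8, "visual": 0.5, "text": 0.6, "aesthetics": 0.7}
image_score = kscore(scores)
reward = total_reward(scores, program_sufficiency=0.7)
```

For these made-up inputs the image score is 0.08 + 0.20 + 0.24 + 0.07 = 0.59, and the combined reward is 0.5 × 0.59 + 0.5 × 0.7 = 0.645.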
+---
+## ⚖️ Intended Use, Limits, Bias
+- **Intended use.** Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
+- **Out of scope.** The model produces a *prompt + reference list*, not pixels. Final image quality and safety are inherited from the downstream generator you pair it with. Do not use the agent to fabricate likenesses, infringing logos, or misleading factual imagery — apply your own content-safety filter on the generator side.
+- **Search dependency.** The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
+- **Bias.** Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases. The reward judges (Gemini 3.1 Pro Preview) are themselves models with their own biases, which may shape the post-RL policy.
+---
+## 📑 Citation
+```bibtex
+@inproceedings{chen2026genevolve,
+  title     = {GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},
+  author    = {Chen, Sixiang and Xing, Zhaohu and Ye, Tian and Geng, Xinyu and Lin, Yunlong
+               and Lai, Jianyu and He, Xuanhua and Zhai, Fuxiang and Gao, Jialin and Zhu, Lei},
+  booktitle = {Submitted to Conference on Neural Information Processing Systems (NeurIPS)},
+  year      = {2026}
+}
+```
+---
+## 🤝 Acknowledgements
+We thank the Qwen, Gemini, FLUX, Z-Image, and BAGEL teams for the underlying generators we evaluate against, and the Skill-SD / Gen-Searcher / KnowGen / WISE authors for the open recipes and benchmarks our work builds on.
+For questions or collaboration, please reach out to [Sixiang Chen](mailto:ephemeral182@gmail.com) or open an issue on the [GitHub repo](https://github.com/Ephemeral182/GenEvolve/issues).

assets/logo_genevolve.png ADDED

Git LFS Details

SHA256: 937af6d53037398be182085542b6f4efb5adb49a2081f8e887b31e95021875e0
Pointer size: 131 Bytes
Size of remote file: 756 kB

assets/overview.png ADDED

Git LFS Details

SHA256: 1179b59f6ad60bec0db7fb10b6e1d63757a4348fcdbb63b1f4aa722bcb482e86
Pointer size: 131 Bytes
Size of remote file: 312 kB

assets/teaser.jpg ADDED

Git LFS Details

SHA256: 8dbaa1a01e86af20c5d56db60ca4e438302bfd919e45aac66e14d586a316b1dc
Pointer size: 132 Bytes
Size of remote file: 2.04 MB

assets/visual_comparison.png ADDED

Git LFS Details

SHA256: 953413f9e05cb90e5ff6b616896e0a6f8e2fbb4cb03d3c288f5c72af336da4db
Pointer size: 132 Bytes
Size of remote file: 9.86 MB