Image-Text-to-Text
Transformers
Safetensors
English
qwen3_vl
agent
image-generation
tool-use
visual-reasoning
self-distillation
grpo
reinforcement-learning
multimodal
qwen3-vl
conversational
Instructions to use MeiGen-AI/GenEvolve with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MeiGen-AI/GenEvolve with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="MeiGen-AI/GenEvolve")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained("MeiGen-AI/GenEvolve")
    model = AutoModelForImageTextToText.from_pretrained("MeiGen-AI/GenEvolve")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MeiGen-AI/GenEvolve with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "MeiGen-AI/GenEvolve"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "MeiGen-AI/GenEvolve",
        "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "Describe this image in one sentence." },
              { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
            ]
          }
        ]
      }'
Use Docker
docker model run hf.co/MeiGen-AI/GenEvolve
- SGLang
How to use MeiGen-AI/GenEvolve with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "MeiGen-AI/GenEvolve" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "MeiGen-AI/GenEvolve",
        "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "Describe this image in one sentence." },
              { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
            ]
          }
        ]
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "MeiGen-AI/GenEvolve" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "MeiGen-AI/GenEvolve",
        "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "Describe this image in one sentence." },
              { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
            ]
          }
        ]
      }'
- Docker Model Runner
How to use MeiGen-AI/GenEvolve with Docker Model Runner:
docker model run hf.co/MeiGen-AI/GenEvolve
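The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server has been started as described on port 8000 (vLLM) or 30000 (SGLang). The helper names `build_chat_payload` and `post_chat` are hypothetical and are not part of either project.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl examples send."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and parse the JSON reply."""
    request = urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    model="MeiGen-AI/GenEvolve",
    prompt="Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With a server running (assumption: vLLM on its default port):
# reply = post_chat("http://localhost:8000", payload)
```

Building the payload is kept separate from sending it so the request body can be inspected or logged without a running server. If the server follows the OpenAI chat-completions response shape, the generated text should be available at `reply["choices"][0]["message"]["content"]`.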
README: drop the 3-tool table and the Out-of-scope bullet
README.md
CHANGED
@@ -95,12 +95,6 @@ This repository hosts the **GenEvolve agent policy** — a Qwen3-VL-8B-Instruct
 
 For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.
 
-| Tool | Role | Output |
-|---|---|---|
-| `search(queries)` | External textual evidence — entities, dates, facts | Markdown digest |
-| `image_search(query)` | Visual references; each result is given a unique `IMG_###` id | Image list with local paths |
-| `query_knowledge(skill_name)` | Internal generation knowledge — `spatial_layout`, `text_rendering`, `quantity_counting`, `attribute_binding`, `anatomy_body_coherence`, `physical_material_consistency`, `creative_drawing`, `aesthetic_drawing` | Skill markdown |
-
 ---
 
 ## 🖼️ Visual Demos
@@ -178,7 +172,6 @@ The agent's final `<answer>` is a JSON object:
 ## ⚖️ Intended Use, Limits, Bias
 
 - **Intended use.** Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
-- **Out of scope.** The model produces a *prompt + reference list*, not pixels. Final image quality and safety are inherited from the downstream generator you pair it with. Do not use the agent to fabricate likenesses, infringing logos, or misleading factual imagery — apply your own content-safety filter on the generator side.
 - **Search dependency.** The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
 - **Bias.** Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.
 
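The table removed in the first hunk lists three tools, and the README notes that search is reached through user-provided tool wrappers. As a rough illustration of what such a wrapper layer might look like, here is a minimal sketch with stub backends. The class `ToolSession`, its `dispatch` method, the placeholder paths, and the choice to number images from `IMG_001` are all assumptions made for this example; only the tool names, argument names, and skill names come from the table.

```python
from typing import Any, Callable, Dict, List

# Skill names as listed in the removed table row for query_knowledge.
KNOWN_SKILLS = frozenset({
    "spatial_layout", "text_rendering", "quantity_counting", "attribute_binding",
    "anatomy_body_coherence", "physical_material_consistency",
    "creative_drawing", "aesthetic_drawing",
})


class ToolSession:
    """Hypothetical per-trajectory wrapper around the three tools (stubs only)."""

    def __init__(self) -> None:
        self._image_count = 0  # keeps IMG_### ids unique within one session

    def search(self, queries: List[str]) -> str:
        # Stub: a real wrapper would query a text search backend here.
        return "\n".join(f"- {query}: (stub, no search backend attached)" for query in queries)

    def image_search(self, query: str) -> List[Dict[str, str]]:
        # Stub: returns two placeholder results, each with a unique id.
        results = []
        for _ in range(2):
            self._image_count += 1
            image_id = f"IMG_{self._image_count:03d}"
            results.append({"id": image_id, "path": f"/tmp/refs/{image_id}.jpg", "query": query})
        return results

    def query_knowledge(self, skill_name: str) -> str:
        if skill_name not in KNOWN_SKILLS:
            raise ValueError(f"unknown skill: {skill_name}")
        # Stub: a real wrapper would load the skill's Markdown notes.
        return f"# {skill_name}\n\n(stub skill notes)"

    def dispatch(self, name: str, arguments: Dict[str, Any]) -> Any:
        """Route one tool call, given as a name plus keyword arguments."""
        tools: Dict[str, Callable[..., Any]] = {
            "search": self.search,
            "image_search": self.image_search,
            "query_knowledge": self.query_knowledge,
        }
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        return tools[name](**arguments)


session = ToolSession()
references = session.dispatch("image_search", {"query": "red panda"})
```

Holding the image counter on the session object is one way to keep ids unique across several `image_search` calls in the same trajectory; the actual project may assign ids differently.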
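The second hunk header mentions that the agent's final `<answer>` is a JSON object. A minimal sketch of pulling that object out of raw model output follows, assuming the JSON is wrapped in `<answer>...</answer>` tags. The function name `extract_answer` and the field names `prompt` and `references` in the sample are hypothetical, since the actual schema is not shown in this diff.

```python
import json
import re

# Assumption: the answer is delimited by literal <answer> and </answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(model_output: str) -> dict:
    """Return the first <answer> block parsed as a JSON object."""
    match = ANSWER_PATTERN.search(model_output)
    if match is None:
        raise ValueError("no <answer>...</answer> block found")
    answer = json.loads(match.group(1))
    if not isinstance(answer, dict):
        raise ValueError("answer is valid JSON but not an object")
    return answer


# Hypothetical sample output; the field names are illustrative only.
sample_output = (
    "Tool calls finished.\n"
    '<answer>{"prompt": "a red panda on a branch", "references": ["IMG_001"]}</answer>'
)
parsed = extract_answer(sample_output)
```

Checking that the parsed value is a dictionary catches the case where the model emits a bare list or string inside the tags, which would otherwise fail later when a downstream generator reads the fields.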