The code in github is just fake code????
Hi, that's an excellent question, and thank you for your close attention to our work!
You're right to point out that the current public interfaces for understanding and generation are separate. This was a deliberate choice for two primary reasons:
- Clear Evaluation: It allows the community to independently verify the model's performance on both tasks, which is a standard practice for benchmarking.
- Inference Pipelines: The two tasks currently have slightly different preprocessing needs during inference (e.g., mixed-resolution and classifier-free guidance for generation).
The key thing to emphasize is that this separation is only at the interface level, not within the model's core architecture.
That said, a unified interface for generation and understanding is crucial for natively unifying visual understanding and generation within a single autoregressive framework. We are actively working on it, and the corresponding code will be released in the coming days.
We’d love to have you involved in shaping the project—feel free to open issues, suggest features, or submit PRs so we can build this together!
Best regards,
Ming team
Hi, thank you again for your feedback!
We’re excited to share that we’ve now released the unified interface for image understanding, generation, and editing! This update allows seamless multimodal interactions within a single autoregressive framework, supporting flexible input types ("text" and "image"), mixed input orders, and multi-turn conversations via internal state management.
Key features:
- Image generation: Use descriptive prompts with `output_image_prefix` to save generated images.
- Image understanding: Include both "image" and "text" in the same message for joint reasoning.
- Image editing: Chain multiple `generate(..., for_edit=True)` calls with unique `output_image_prefix` names.
- Multi-turn interactions: Supported via the model's internal state — call `model.reset_inner_state()` to reset when needed.
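To make the features above concrete, here is a minimal usage sketch. The message schema and the `make_message` helper are assumptions for illustration; `generate`, `for_edit=True`, `output_image_prefix`, and `reset_inner_state` are the names from the feature list, and the commented calls show where a loaded model would be invoked:

```python
# Hypothetical usage sketch. The message dict schema below is an assumption;
# generate(), for_edit, output_image_prefix, and reset_inner_state() are the
# interface names from the feature list above.

def make_message(role, *contents):
    """Build one chat turn; each content item is ("text", str) or ("image", path)."""
    return {
        "role": role,
        "content": [{"type": kind, kind: value} for kind, value in contents],
    }

# Image understanding: "image" and "text" in the same message for joint reasoning.
understand = make_message(
    "user",
    ("image", "cat.png"),
    ("text", "What breed is this cat?"),
)

# Image generation: a descriptive text prompt.
gen = make_message("user", ("text", "A watercolor fox in a snowy forest"))

# With a loaded model, the calls would look like:
# model.generate([understand])                          # joint image+text reasoning
# model.generate([gen], output_image_prefix="fox")      # saves generated image(s)
# model.generate([edit], for_edit=True,
#                output_image_prefix="fox_edit")        # chained editing step
# model.reset_inner_state()                             # start a fresh multi-turn session
```

See the updated README for the exact loading code and message format used by the released interface.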
You can find detailed usage examples in the updated README. We’d love for you to try it out and let us know what you think!
As always, we welcome your contributions — feel free to open issues, suggest improvements, or submit PRs. Let’s build this together!
Best regards,
Ming team