Update README.md
README.md
CHANGED
@@ -8,6 +8,14 @@ library_name: transformers
 
 HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) without external VAEs or disjoint text encoders, which natively encodes raw pixels, text, and task-specific conditions in a single shared token space — supporting text-to-image, image editing, and subject-driven personalization at up to 2,048 × 2,048.
 
+
+> **HiDream-O1-Image (codename: Peanut) debuts at #8 in the Artificial Analysis Text to Image Arena, which is positioned to be the new leading open weights Text to Image model (2026-5-5).**
+
+<p align="center">
+<img src="assets/leaderboard.png" alt="Artificial Analysis Text to Image Arena" width="100%"/>
+<br><sub><b>Artificial Analysis Text to Image Arena</b> at up to 2,048 × 2,048.</sub>
+</p>
+
 <p align="center">
 <img src="assets/general.webp" alt="General text-to-image generation" width="100%"/>
 <br><sub><b>General text-to-image generation</b> at up to 2,048 × 2,048.</sub>