Instructions to use HiDream-ai/HiDream-O1-Image-Dev with libraries, inference providers, notebooks, and local apps.

How to use HiDream-ai/HiDream-O1-Image-Dev with Transformers:

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("HiDream-ai/HiDream-O1-Image-Dev")
model = AutoModelForImageTextToText.from_pretrained("HiDream-ai/HiDream-O1-Image-Dev")
```
Update README.md
README.md
CHANGED
@@ -3,19 +3,22 @@ license: mit
pipeline_tag: image-text-to-image
library_name: transformers
---
-
# HiDream-O1-Image

HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT). Without external VAEs or disjoint text encoders, it encodes raw pixels, text, and task-specific conditions in a single shared token space, supporting text-to-image generation, image editing, and subject-driven personalization at resolutions up to 2,048 × 2,048.

## Project Updates
-
-
- 🤗 **May 10, 2026:** Try **HiDream-O1-Image** online on Hugging Face Spaces: [🤗 HiDream-O1-Image](https://huggingface.co/spaces/HiDream-ai/HiDream-O1-Image) and [🤗 HiDream-O1-Image-Dev](https://huggingface.co/spaces/HiDream-ai/HiDream-O1-Image-Dev).
-
- **May 10, 2026:** Our **technical report** is now available: [HiDream-O1-Image.pdf](
- **May 8, 2026:** We've open-sourced **HiDream-O1-Image (8B)**, including both the undistilled and distilled Dev variants, together with the Reasoning-Driven Prompt Agent.

-
<p align="center">
<img src="assets/leaderboard.png" alt="Artificial Analysis Text to Image Arena" width="100%"/>
<br><sub><b>Artificial Analysis Text to Image Arena</b> at up to 2,048 × 2,048.</sub>
@@ -36,7 +39,6 @@ HiDream-O1-Image is a natively unified image generative foundation model built o
<br><sub><b>Subject-driven personalization</b>: preserve identity / IP across new scenes.</sub>
</p>

-
## Key Features

- 🧬 **Pixel-Level Unified Transformer**: one end-to-end model on raw pixels, with no VAE and no disjoint text encoder.
@@ -45,14 +47,18 @@ HiDream-O1-Image is a natively unified image generative foundation model built o
- 🖼️ **Native High Resolution**: direct synthesis up to 2,048 × 2,048 with sharp, fine-grained detail.
- ⚡ **Exceptional Efficiency and Versatility at 8B Scale**: with only 8B parameters, it matches or even surpasses larger open-source DiTs and leading closed-source models.

## Models

| Name | Script | Inference Steps | HuggingFace Repo |
| :--- | :--- | :---: | :--- |
-
| HiDream-O1-Image | `inference.py` | 50 | [🤗 HiDream-O1-Image](https://huggingface.co/HiDream-ai/HiDream-O1-Image) |
-
| HiDream-O1-Image-Dev | `inference.py` | 28 | [🤗 HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) |
-
| Prompt Agent | `prompt_agent.py` | - | [🤗 google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it) |
-
| Web Demo | `app.py` | - | - |

## Evaluation

@@ -189,7 +195,7 @@ cd HiDream-O1-Image
pip install -r requirements.txt
```

-
> **Note on `flash-attn`.** We highly recommend installing [`flash-attn`](https://github.com/Dao-AILab/flash-attention) for optimized attention computation. **If you do not (or cannot) install `flash-attn`, you must edit `models/pipeline.py` line

## Reasoning-Driven Prompt Agent

@@ -299,6 +305,8 @@ python inference.py \
--model_type dev
```

### Command Line Arguments

- `--model_path`: Path to the complete HuggingFace model directory (undistilled or distilled).
@@ -308,16 +316,17 @@ python inference.py \
- `--height` / `--width`: Output image dimensions (default: `2048` × `2048`; values snap to valid resolutions internally).
- `--model_type`: `full` or `dev` (default: `full`). Selects the inference recipe:
- `full`: 50 steps, guidance scale `5.0`, shift `3.0`, default scheduler.
-
- `dev`: 28 steps, guidance scale `0.0`, shift `1.0`, flash scheduler with predefined timesteps.
- `--seed`: Random seed (default: `32`).
- `--guidance_scale`: Guidance scale (default: `5.0`). Only effective when `--model_type` is `full`.
-
- `--noise_scale_start`, `--noise_scale_end`: Control the scale of the noise injected by the scheduler at each denoising step; the per-step scale linearly interpolates from `noise_scale_start` (first step) to `noise_scale_end` (last step). See `models/pipeline.py:
-
- `--noise_clip_std`: Per-step clipping threshold (in units of the injected noise's standard deviation) applied to the noise added during scheduler stepping. See `models/flash_scheduler.py:
- `--keep_original_aspect`: If `True` and exactly one reference image is provided, the reference is resized with `max_size=2048` and its dimensions are used for the target image, preserving the reference's aspect ratio.

## Web Demo

-
`app.py` is a

### Starting the server

@@ -339,11 +348,24 @@ Then open `http://localhost:7860` in your browser.
| `--host` | `0.0.0.0` | Bind address for the Flask server. |
| `--port` | `7860` | Port for the Flask server. |

-
All four arguments can also be set via environment variables (see `.env.example`): `HIDREAM_MODEL_PATH`, `HIDREAM_MODEL_TYPE`, `HIDREAM_HOST`, and `HIDREAM_PORT`.

### Prompt Agent in the UI

-
The sidebar contains a Prompt Agent panel that calls the same Reasoning-Driven Prompt Agent used by `prompt_agent.py`.

## License
The code in this repository and the HiDream-O1-Image models are licensed under the MIT License.

pipeline_tag: image-text-to-image
library_name: transformers
---
# HiDream-O1-Image

HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT). Without external VAEs or disjoint text encoders, it encodes raw pixels, text, and task-specific conditions in a single shared token space, supporting text-to-image generation, image editing, and subject-driven personalization at resolutions up to 2,048 × 2,048.

## Project Updates
+
- **May 14, 2026:** We open-sourced [**HiDream-O1-Image-Dev-2604**](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev-2604) together with its [prompt refiner](https://huggingface.co/HiDream-ai/Prompt-Refine), tailored for text-to-image generation.
+
- 🛠️ **May 13, 2026:** Inference and pipeline updates: accelerated IP inference; **layout** and **skeleton** conditioning in the IP pipeline; and an updated Dev editing scheduler. For editing tasks, we recommend the **full** model. PyTorch 2.9.x is not recommended because of [this issue](https://github.com/QwenLM/Qwen3-VL/issues/1811).
- 🤗 **May 10, 2026:** Try **HiDream-O1-Image** online on Hugging Face Spaces: [🤗 HiDream-O1-Image](https://huggingface.co/spaces/HiDream-ai/HiDream-O1-Image) and [🤗 HiDream-O1-Image-Dev](https://huggingface.co/spaces/HiDream-ai/HiDream-O1-Image-Dev).
+
- **May 10, 2026:** Our **technical report** is now available: [HiDream-O1-Image.pdf](assets/HiDream-O1-Image.pdf).
- **May 8, 2026:** We've open-sourced **HiDream-O1-Image (8B)**, including both the undistilled and distilled Dev variants, together with the Reasoning-Driven Prompt Agent.
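
The May 13 note advises against PyTorch 2.9.x. A small guard like the one below can flag that version before the model is loaded. This is a minimal sketch using only the standard library; `installed_torch_version` and `is_discouraged_torch` are hypothetical helpers, not part of the repository, and the check assumes any `2.9.*` release (including local builds such as `2.9.1+cu128`) should be avoided.

```python
import warnings
from importlib import metadata
from typing import Optional


def installed_torch_version() -> Optional[str]:
    """Return the installed PyTorch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def is_discouraged_torch(version: str) -> bool:
    """True for any PyTorch 2.9.x release, ignoring local build tags like '+cu128'."""
    release = version.split("+", 1)[0]
    return release.split(".")[:2] == ["2", "9"]


if __name__ == "__main__":
    version = installed_torch_version()
    if version is not None and is_discouraged_torch(version):
        warnings.warn(
            f"PyTorch {version} detected; the HiDream-O1-Image README advises against 2.9.x."
        )
```

Comparing only the major and minor components keeps `2.10.0` from being mistaken for a `2.9`-style prefix match.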

+
<div align="center">
+
<video src="https://github.com/user-attachments/assets/cbbdb816-f050-4685-aa51-4741479a0e5c" width="70%" poster=""> </video>
+
</div>

+
> **HiDream-O1-Image-Dev-2604 debuts at #8 in the Artificial Analysis Text to Image Arena, positioning it to be the new leading open-weights text-to-image model.**
<p align="center">
<img src="assets/leaderboard.png" alt="Artificial Analysis Text to Image Arena" width="100%"/>
<br><sub><b>Artificial Analysis Text to Image Arena</b> at up to 2,048 × 2,048.</sub>
<br><sub><b>Subject-driven personalization</b>: preserve identity / IP across new scenes.</sub>
</p>

## Key Features

- 🧬 **Pixel-Level Unified Transformer**: one end-to-end model on raw pixels, with no VAE and no disjoint text encoder.
- 🖼️ **Native High Resolution**: direct synthesis up to 2,048 × 2,048 with sharp, fine-grained detail.
- ⚡ **Exceptional Efficiency and Versatility at 8B Scale**: with only 8B parameters, it matches or even surpasses larger open-source DiTs and leading closed-source models.

+
+
## Models

| Name | Script | Inference Steps | HuggingFace Repo |
| :--- | :--- | :---: | :--- |
+
| HiDream-O1-Image | [`inference.py`](./inference.py) | 50 | [🤗 HiDream-O1-Image](https://huggingface.co/HiDream-ai/HiDream-O1-Image) |
+
| HiDream-O1-Image-Dev | [`inference.py`](./inference.py) | 28 | [🤗 HiDream-O1-Image-Dev](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev) |
+
| Prompt Agent | [`prompt_agent.py`](./prompt_agent.py) | - | [🤗 google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it) |
+
| Web Demo | [`app.py`](./app.py) | - | - |
+
| HiDream-O1-Image-Dev-2604 | [`inference.py` (dev branch)](https://github.com/HiDream-ai/HiDream-O1-Image/blob/dev/inference.py) | 28 | [🤗 HiDream-O1-Image-Dev-2604](https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev-2604) |
+
| Prompt Agent 2604 | [`prompt_agent_v2.py` (dev branch)](https://github.com/HiDream-ai/HiDream-O1-Image/blob/dev/prompt_agent_v2.py) | - | [🤗 HiDream-ai/Prompt-Refine](https://huggingface.co/HiDream-ai/Prompt-Refine) |

## Evaluation
pip install -r requirements.txt
```

+
> **Note on `flash-attn`.** We highly recommend installing [`flash-attn`](https://github.com/Dao-AILab/flash-attention) for optimized attention computation. **If you do not (or cannot) install `flash-attn`, you must edit `models/pipeline.py` line 341 and change `"use_flash_attn": True` to `"use_flash_attn": False`**; otherwise, inference will fail when it tries to import the kernel.
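
Before editing `models/pipeline.py`, you can check whether `flash-attn` actually imports in your environment. This is a minimal sketch; the helper name `flash_attn_available` is hypothetical and not part of the repository, and it only reports which value to use rather than editing the file for you.

```python
import importlib


def flash_attn_available() -> bool:
    """Return True if the `flash_attn` package imports cleanly.

    Any exception counts as "unavailable", because a broken CUDA build can
    raise errors other than ImportError at import time.
    """
    try:
        importlib.import_module("flash_attn")
    except Exception:
        return False
    return True


if __name__ == "__main__":
    # Value to put in the "use_flash_attn" entry of models/pipeline.py.
    print(f'"use_flash_attn": {flash_attn_available()}')
```

Attempting the real import, rather than only locating the package, also catches installs whose compiled kernels do not match the local CUDA or PyTorch build.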

## Reasoning-Driven Prompt Agent
--model_type dev
```

+
For **editing** tasks (exactly one reference image), the Dev model defaults to the `flow_match` scheduler, which is the recommended choice for editing. Pass `--editing_scheduler flash` to use the flash scheduler instead. This flag has no effect on the `full` model or on non-editing tasks.
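
The defaults described here and in the `--model_type` / `--editing_scheduler` arguments can be summarized as a small lookup. This sketch is illustrative only: `Recipe` and `resolve_recipe` are hypothetical names, the scheduler labels are plain strings rather than the repository's scheduler classes, and it assumes the Dev step count, guidance, and shift stay the same when the editing scheduler changes, which the README does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    steps: int
    guidance_scale: float
    shift: float
    scheduler: str


def resolve_recipe(model_type: str, num_reference_images: int,
                   editing_scheduler: str = "flow_match",
                   guidance_scale: float = 5.0) -> Recipe:
    """Return the documented inference recipe for a model type and task."""
    if editing_scheduler not in ("flow_match", "flash"):
        raise ValueError(f"unknown editing scheduler: {editing_scheduler!r}")
    if model_type == "full":
        # --guidance_scale only takes effect here; --editing_scheduler is ignored.
        return Recipe(steps=50, guidance_scale=guidance_scale, shift=3.0, scheduler="default")
    if model_type == "dev":
        # Editing means exactly one reference image; other tasks use the flash scheduler.
        is_editing = num_reference_images == 1
        scheduler = editing_scheduler if is_editing else "flash"
        # Guidance is fixed at 0.0 for the distilled Dev model.
        return Recipe(steps=28, guidance_scale=0.0, shift=1.0, scheduler=scheduler)
    raise ValueError(f"unknown model type: {model_type!r}")


if __name__ == "__main__":
    print(resolve_recipe("dev", num_reference_images=1))
    # prints Recipe(steps=28, guidance_scale=0.0, shift=1.0, scheduler='flow_match')
```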
+
### Command Line Arguments

- `--model_path`: Path to the complete HuggingFace model directory (undistilled or distilled).
- `--height` / `--width`: Output image dimensions (default: `2048` × `2048`; values snap to valid resolutions internally).
- `--model_type`: `full` or `dev` (default: `full`). Selects the inference recipe:
- `full`: 50 steps, guidance scale `5.0`, shift `3.0`, default scheduler.
+
- `dev`: 28 steps, guidance scale `0.0`, shift `1.0`, flash scheduler with predefined timesteps. For editing tasks (exactly one reference image), the default scheduler is `flow_match` instead (see `--editing_scheduler`).
- `--seed`: Random seed (default: `32`).
- `--guidance_scale`: Guidance scale (default: `5.0`). Only effective when `--model_type` is `full`.
+
- `--noise_scale_start`, `--noise_scale_end`: Control the scale of the noise injected by the scheduler at each denoising step; the per-step scale linearly interpolates from `noise_scale_start` (first step) to `noise_scale_end` (last step). See `models/pipeline.py:313` (initial noise) and `models/pipeline.py:323-326` (per-step linear interpolation). Defaults: `7.5`, `7.5`.
+
- `--noise_clip_std`: Per-step clipping threshold (in units of the injected noise's standard deviation) applied to the noise added during scheduler stepping. See `models/flash_scheduler.py:350-354`. Default: `2.5`.
+
- `--editing_scheduler`: Scheduler to use for editing tasks (exactly one reference image) when `--model_type dev`. Choices: `flow_match` (default) or `flash`. Ignored for the `full` model and for non-editing tasks.
- `--keep_original_aspect`: If `True` and exactly one reference image is provided, the reference is resized with `max_size=2048` and its dimensions are used for the target image, preserving the reference's aspect ratio.
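
The noise arguments and `--keep_original_aspect` can be illustrated with a few plain-Python helpers. This is a sketch under stated assumptions, not the code in `models/pipeline.py` or `models/flash_scheduler.py`: the helper names are hypothetical, the noise scale is treated as the standard deviation of Gaussian noise, the clip threshold is taken relative to that nominal std (the repository may use the empirical std of each noise tensor instead), and `target_size` assumes the longer side is scaled to exactly `max_size`. Snapping to valid resolutions, which the README says happens internally, is not modeled.

```python
import random
from typing import List, Optional, Tuple


def noise_scales(num_steps: int, start: float = 7.5, end: float = 7.5) -> List[float]:
    """Per-step noise scale, linearly interpolated from `start` (first step) to `end` (last step)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [start]
    return [start + (end - start) * i / (num_steps - 1) for i in range(num_steps)]


def clipped_noise(n: int, scale: float, clip_std: float = 2.5,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Draw `n` Gaussian samples with std `scale`, clipped to +/- clip_std * scale."""
    rng = rng or random.Random()
    bound = clip_std * scale
    return [max(-bound, min(bound, rng.gauss(0.0, scale))) for _ in range(n)]


def target_size(width: int, height: int, max_size: int = 2048) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals `max_size`, keeping the aspect ratio."""
    longest = max(width, height)
    return round(width * max_size / longest), round(height * max_size / longest)


if __name__ == "__main__":
    print(noise_scales(4, start=7.5, end=1.5))  # [7.5, 5.5, 3.5, 1.5]
    print(target_size(3000, 2000))              # (2048, 1365)
```

With the documented defaults (`7.5` and `7.5`), the interpolation is constant, so the schedule only varies once the two arguments differ.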

## Web Demo

+
`app.py` is a single-file Flask web UI (with HTML / CSS / JS embedded inline) that exposes all generation modes. It also integrates the Reasoning-Driven Prompt Agent.

### Starting the server
| `--host` | `0.0.0.0` | Bind address for the Flask server. |
| `--port` | `7860` | Port for the Flask server. |

+
All four CLI arguments above can also be set via environment variables (see `.env.example`): `HIDREAM_MODEL_PATH`, `HIDREAM_MODEL_TYPE`, `HIDREAM_HOST`, and `HIDREAM_PORT`.
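
The README does not say which value wins when both a CLI flag and its environment variable are set. A common pattern, sketched below under the assumption that an explicit flag overrides the environment and the environment overrides the built-in default, is to feed the environment into argparse defaults. `build_parser` is a hypothetical name; the `--host` / `--port` fallbacks come from the table above, while the `full` fallback for `--model_type` and the `None` fallback for `--model_path` are placeholders, since those rows are not shown here.

```python
import argparse
import os
from typing import Mapping


def build_parser(env: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Web-demo CLI whose defaults come from HIDREAM_* environment variables.

    Assumed precedence: explicit CLI flag > environment variable > built-in default.
    """
    parser = argparse.ArgumentParser(description="HiDream-O1-Image web demo (sketch)")
    parser.add_argument("--model_path", default=env.get("HIDREAM_MODEL_PATH"))
    parser.add_argument("--model_type", choices=["full", "dev"],
                        default=env.get("HIDREAM_MODEL_TYPE", "full"))
    parser.add_argument("--host", default=env.get("HIDREAM_HOST", "0.0.0.0"))
    # argparse applies `type` to string defaults, so an env value like "9000" becomes 9000.
    parser.add_argument("--port", type=int, default=env.get("HIDREAM_PORT", "7860"))
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

Note that argparse does not check string defaults against `choices`, so an invalid `HIDREAM_MODEL_TYPE` would pass through unvalidated in this sketch.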
+
+
The Prompt Agent panel in the Web Demo reads additional environment variables from `.env`:
+
+
| Env Var | Used by | Description |
+
| :--- | :--- | :--- |
+
| `HIDREAM_AGENT_MODEL` | Local · Gemma backend | Path or HF repo id of the local Gemma weights. |
+
| `OPENAI_BASE_URL` | OpenAI-compatible API backend | Default base URL pre-filled in the UI. |
+
| `OPENAI_API_KEY` | OpenAI-compatible API backend | Default API key pre-filled in the UI. |
+
| `OPENAI_MODEL` | OpenAI-compatible API backend | Default model name pre-filled in the UI. |
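
A `.env` file is typically loaded with python-dotenv's `load_dotenv`, which by default does not override variables already set in the environment; whether `app.py` uses that library is not stated here. The dependency-free sketch below illustrates the same behavior for simple `KEY=VALUE` files. `parse_dotenv` and `apply_dotenv` are hypothetical helpers, and the parser deliberately ignores features such as multi-line values and variable interpolation.

```python
import os
from typing import Dict, MutableMapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def apply_dotenv(values: Dict[str, str],
                 env: MutableMapping[str, str] = os.environ) -> None:
    """Copy parsed values into `env` without overriding variables that are already set."""
    for key, value in values.items():
        env.setdefault(key, value)
```

Keeping already-set variables lets a value exported in the shell take priority over the file, which is convenient when switching backends without editing `.env`.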

### Prompt Agent in the UI

+
The sidebar contains a Prompt Agent panel that calls the same Reasoning-Driven Prompt Agent used by `prompt_agent.py`. Select either the *OpenAI-compatible API* backend (any endpoint, key, and model name) or the *Local · Gemma* backend (set `HIDREAM_AGENT_MODEL` in `.env` or the environment to point to your local Gemma-4-31B-it weights).
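
The two backends map onto the environment variables listed in the table above. The sketch below shows one way to turn those variables into backend settings; `AgentBackend`, `backend_from_env`, and the `"openai"` / `"local_gemma"` labels are hypothetical, and in the actual UI these values are only pre-filled defaults that can still be changed in the panel.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentBackend:
    kind: str                       # "openai" or "local_gemma" (hypothetical labels)
    model: str                      # API model name, or path / HF repo id of local weights
    base_url: Optional[str] = None  # only used by the OpenAI-compatible backend
    api_key: Optional[str] = None   # only used by the OpenAI-compatible backend


def backend_from_env(kind: str, env: Mapping[str, str]) -> AgentBackend:
    """Build Prompt Agent backend settings from environment variables."""
    if kind == "openai":
        return AgentBackend(kind, env.get("OPENAI_MODEL", ""),
                            env.get("OPENAI_BASE_URL"), env.get("OPENAI_API_KEY"))
    if kind == "local_gemma":
        model = env.get("HIDREAM_AGENT_MODEL")
        if not model:
            raise ValueError("set HIDREAM_AGENT_MODEL to your local Gemma-4-31B-it weights")
        return AgentBackend(kind, model)
    raise ValueError(f"unknown backend: {kind!r}")
```

Failing fast when `HIDREAM_AGENT_MODEL` is missing surfaces the configuration error before any attempt to load the 31B weights.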
+
+
### Editing Scheduler (Dev model only)
+
+
When the server is launched with `--model_type dev`, the **Edit** tab exposes a *Scheduler* dropdown with two options: `flow_match` (default) and `flash`. The selector is hidden for the `full` model and for the Text → Image / Subject tabs, where the scheduler is fixed.

## License
The code in this repository and the HiDream-O1-Image models are licensed under the MIT License.