Instructions to use HiDream-ai/HiDream-O1-Image-Dev-2604 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HiDream-ai/HiDream-O1-Image-Dev-2604 with Transformers:
```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("HiDream-ai/HiDream-O1-Image-Dev-2604")
model = AutoModelForImageTextToText.from_pretrained("HiDream-ai/HiDream-O1-Image-Dev-2604")
```

- Notebooks
- Google Colab
- Kaggle
Are you giving settings and sharing advice to the community to make sure your models are shown in the best light?
I'm trying hard not to force my opinions on other people and to withhold judgment on HiDream-O1-Image, but the image results and quality are poor compared to current and past models. The devs need a better connection with the community to make sure the model is implemented correctly and runs well, so it's shown in the best possible light and no one runs it incorrectly.
Thanks for your interest in our models, and we appreciate your suggestions. We are currently busy releasing the series of HiDream-O1-Image models (including the Full, Dev, and Dev-2604 versions). We have also noticed the feedback from the community and are continuously improving the models and inference pipelines accordingly.
To be clear, HiDream-O1-Image and HiDream-O1-Image-Dev support various tasks (e.g., text-to-image, image editing, and subject-driven personalization). In contrast, HiDream-O1-Image-Dev-2604 (the whole pipeline, with both the prompt refiner and the t2i model) is tailored for the text-to-image task, with higher image quality.
Thanks again!
Speaking on this subject, I've been trying to get per-step sigma modulation working with the HiDream-O1 dev model. Since the model relies on a vendored pipeline.py with custom flow-matching schedulers rather than ComfyUI's native KSampler, standard hooks completely miss it. We've tried directly modifying the denoising loop and monkey-patching the sigma schedule, but it consistently causes stability issues and tensor blowouts because it breaks the flow-matching shift math. Is it possible to natively implement support for sigma modulation directly within your custom denoising loop, or is there a recommended, safe way to hook into the pipeline for this?
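For context on the "shift math" mentioned above: SD3/Flux-style flow-matching schedulers (including diffusers' FlowMatchEulerDiscreteScheduler) apply a timestep shift to the raw sigma schedule, and a monkey-patched schedule that skips this transform will feed the model noise levels it never saw in training. The sketch below shows that shift in plain Python; whether HiDream's vendored pipeline uses exactly this transform is an assumption, so treat it as an illustration rather than the pipeline's actual code.

```python
def shift_sigmas(sigmas, shift=3.0):
    """Apply the SD3/Flux-style flow-matching timestep shift to a
    descending sigma schedule in [0, 1].

    A custom per-step sigma schedule hooked into the pipeline must pass
    through the same transform the pipeline uses, or the noise level at
    each step no longer matches training and the denoise can diverge.
    """
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]

# A naive linear schedule from 1.0 down to 0.0 over 8 steps...
steps = 8
linear = [1.0 - i / steps for i in range(steps + 1)]
shifted = shift_sigmas(linear, shift=3.0)
# ...keeps its endpoints (1.0 and 0.0) but redistributes the
# intermediate steps toward higher noise levels.
```

Note the endpoints are fixed points of the transform (sigma = 1 maps to 1, sigma = 0 maps to 0), which is why injecting unshifted intermediate sigmas silently corrupts only the middle of the trajectory.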
Thanks. Our code base is mainly based on diffusers, and I am not very familiar with ComfyUI's schedulers.
For HiDream-O1-Image, either FlowUniPCMultistepScheduler or FlowMatchEulerDiscreteScheduler is fine, and we find FlowUniPCMultistepScheduler to be slightly better.
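For readers unfamiliar with these schedulers: a flow-matching Euler step is a very small update, where the model predicts a velocity and the sample moves along it as sigma decreases. This is a minimal sketch of the generic rectified-flow update (the same form diffusers' FlowMatchEulerDiscreteScheduler uses), not the vendored pipeline's actual code:

```python
def flow_match_euler_step(x, velocity, sigma, sigma_next):
    """One Euler step of flow matching: the model predicts the velocity
    v ~ (noise - data), and the sample moves along it as sigma shrinks."""
    return x + (sigma_next - sigma) * velocity

# Toy scalar example: with a perfect velocity prediction, stepping sigma
# from 1.0 straight to 0.0 recovers the clean sample exactly.
x0, noise = 2.0, 0.5            # clean "image" and its noise, as scalars
x = 1.0 * noise + 0.0 * x0      # at sigma = 1 the sample is pure noise
v = noise - x0                  # ideal rectified-flow velocity
x = flow_match_euler_step(x, v, sigma=1.0, sigma_next=0.0)
```

In practice the velocity prediction is imperfect, which is why the schedule is walked in many small steps rather than one.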
For HiDream-O1-Image-Dev, we use FlashFlowMatchEulerDiscreteScheduler for text-to-image, which is basically LCM for flow matching. The only change is that when adding noise, we need to multiply the noise by a noise scale, since during training the noise scale is 8.0 rather than a standard Gaussian. Currently the noise_scale_start and noise_scale_end logic is a little complex, and keeping a constant noise scale like 7.5 or 8.0 at inference is fine. The noise_clip_std clips extreme values in the added noise, since the model operates in pixel space. In short, it is fine to just use a constant noise scale in LCM and clip extreme noise values.
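The simplified recipe above (constant scale, clipped Gaussian) could be sketched as follows. This is an illustration in NumPy, not the repository's code; in particular, the order of operations (clip the standard Gaussian first, then scale) and the clip threshold of 3.0 are assumptions of this sketch:

```python
import numpy as np

def add_scaled_noise(x0, sigma, noise_scale=8.0, noise_clip_std=3.0, rng=None):
    """Add flow-matching noise with a constant noise scale, per the
    recipe above: sample standard Gaussian noise, clip extreme values
    (the model works in pixel space), then scale before mixing.
    The clip-then-scale ordering is an assumption of this sketch."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(x0.shape)
    noise = np.clip(noise, -noise_clip_std, noise_clip_std)  # drop outliers
    noise = noise * noise_scale                              # training used scale 8.0
    return (1.0 - sigma) * x0 + sigma * noise

x0 = np.zeros((4, 4))
xt = add_scaled_noise(x0, sigma=0.5)
# Every noisy value is bounded by sigma * noise_clip_std * noise_scale.
```

Scaling after clipping keeps the bound on extreme pixel values exact, which matters when the model operates in pixel space rather than a VAE latent space.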
For HiDream-O1-Image-Dev editing task, we use the default FlowMatchEulerDiscreteScheduler.
For HiDream-O1-Image-Dev-2509, we use the FlashFlowMatchEulerDiscreteScheduler with constant noise scale 8.0 and no noise clipping in practice.
Is there specific code or a repo I can refer to so that I can improve the code to better support it?
@realrebelai there is some discussion in the Comfy-Org page here:
https://huggingface.co/Comfy-Org/HiDream-O1-Image/discussions/5
They note that you want to use ComfyUI nightly, as it isn't in the release yet.
@cai-qi You may also want to check that thread and see if there is anything HiDream can do to help support it. ComfyUI is one of the most widely used tools, so first-class support for it would go a long way toward spreading your models.
Thanks for releasing your model under a real open source license too, by the way. :)