Hi!

#1
by gabbo1995 - opened

Hello, thanks for the model. Do you know if there is any way to run this with diffusers? Something like the following code, maybe? The logic seems similar to the standard pipeline.

```python
import torch
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel
from transformers import Qwen2_5_VLForConditionalGeneration

torch_dtype = torch.bfloat16

# Load the pre-quantized components on CPU first
transformer = QwenImageTransformer2DModel.from_pretrained(
    "OzzyGT/Qwen-Image-2512-bnb-4bit-transformer", torch_dtype=torch_dtype, device_map="cpu"
)
text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "OzzyGT/Qwen-Image-2512-bnb-4bit-text-encoder", torch_dtype=torch_dtype, device_map="cpu"
)

# Build the pipeline around them, taking everything else from the base repo
pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image-2512", transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype
)
pipe.enable_model_cpu_offload()

prompt = """A photograph that captures a young woman on a city rooftop, with a hazy city skyline in the background. She has long, dark hair that naturally drapes over her shoulders and is wearing a simple tank top. Her posture is relaxed, with her hands resting on the railing in front of her, leaning slightly forward as she looks directly into the camera. The sunlight, coming from behind her at an angle, creates a soft backlight effect that casts a warm golden halo around the edges of her hair and shoulders. This light also produces a slight lens flare, adding a dreamy quality to the image. The city buildings in the background are blurred by the backlight, emphasizing the main subject. The overall tone is warm, evoking a sense of tranquility and a hint of melancholy."""
# Translation: "Low resolution, low quality, deformed limbs, deformed fingers,
# oversaturated, waxy look, faces without detail, overly smooth, AI-generated feel.
# Chaotic composition. Blurry, distorted text."
negative_prompt = "低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1664,
    height=928,
    num_inference_steps=28,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("qwen-image_output.png")
```

This won't work. ComfyUI checkpoints use different module names than the corresponding Diffusers models, and a BNB-quantized Diffusers model additionally needs a matching quantization section in its config.json before Diffusers can load it.
In general, if what you want is a BNB-quantized Diffusers model, you can simply load the original model, let the bitsandbytes integration quantize it, and then save the result. That part is easy to do yourself.
These ComfyUI checkpoints, however, cannot be loaded with stock Diffusers at all, which is why a special quantization method is needed and why sharing them is worthwhile.

Thank you!
