can you please provide an example?
Could you please provide a running example code with diffusers to use this model? Thank you so much.
Sure, I'm working on a diffusers recipes repo here, but I still haven't had the time to update it with this model, so here's a code example in the meantime:
import torch
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel
from transformers import Qwen2_5_VLForConditionalGeneration

torch_dtype = torch.bfloat16

# Load the quantized models one at a time and park them on the CPU to free VRAM
transformer = QwenImageTransformer2DModel.from_pretrained(
    "OzzyGT/Qwen-Image-2512-bnb-4bit-transformer", torch_dtype=torch_dtype, device_map="cpu"
)
text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "OzzyGT/Qwen-Image-2512-bnb-4bit-text-encoder", torch_dtype=torch_dtype, device_map="cpu"
)

pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image-2512", transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype
)
# Offloading moves each model to the GPU only while it is needed during inference
pipe.enable_model_cpu_offload()

prompt = """A photograph that captures a young woman on a city rooftop, with a hazy city skyline in the background. She has long, dark hair that naturally drapes over her shoulders and is wearing a simple tank top. Her posture is relaxed, with her hands resting on the railing in front of her, leaning slightly forward as she looks directly into the camera. The sunlight, coming from behind her at an angle, creates a soft backlight effect that casts a warm golden halo around the edges of her hair and shoulders. This light also produces a slight lens flare, adding a dreamy quality to the image. The city buildings in the background are blurred by the backlight, emphasizing the main subject. The overall tone is warm, evoking a sense of tranquility and a hint of melancholy."""

# Chinese negative prompt; roughly: "low resolution, low quality, deformed limbs,
# deformed fingers, oversaturated, waxy look, faces without detail, overly smooth,
# AI look. Chaotic composition. Blurry, distorted text."
negative_prompt = "低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1664,
    height=928,
    num_inference_steps=28,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("qwen-image_output.png")
Does it really work in CPU mode?
You're assuming something wrong: that's the loading, not the inference. Using pipe.enable_model_cpu_offload() moves each model to the GPU only when it is needed at inference.
No, what I meant is that bitsandbytes needs a GPU, so the whole code base won't run on CPU only (without CUDA). All good.
Not sure I understand your issue, but glad it's all good.
Just in case: bnb loads the models to the GPU, but here I'm assuming that if you're using quantization it's because you don't have enough VRAM, so the way to overcome this is to load one model (which will load to the GPU) and move it to the "cpu" to free the VRAM, and then load the other in the same way. After you do this, the CPU offloading works and is in charge of moving the respective models between "cpu" and "gpu" when needed.
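As a generic sketch of that pattern (toy nn.Linear modules standing in for the real models; free_vram_after_load is a hypothetical helper for illustration, not a diffusers API):

```python
import torch

def free_vram_after_load(model: torch.nn.Module) -> torch.nn.Module:
    """Park a freshly loaded model on the CPU so the next load has free VRAM."""
    model.to("cpu")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return the cached VRAM to the driver
    return model

# Toy stand-ins for the quantized transformer and text encoder
transformer = free_vram_after_load(torch.nn.Linear(16, 16))
text_encoder = free_vram_after_load(torch.nn.Linear(16, 16))
print(next(transformer.parameters()).device)  # cpu
```

Once both models sit on the CPU, enable_model_cpu_offload() takes over and shuttles each one to the GPU only for its step of the pipeline.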
Thank you so much!!
OzzyGT/Qwen-Image-2512-bnb-4bit-transformer: how did you quantize that? I don't see the AI Toolkit watermark; did you just use bnb to get the quant, or was a specific framework used?
I used diffusers and bnb, no additional framework.