This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
Pipeline Type: StableDiffusionXLAutoBlocks
Description: Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks using Stable Diffusion XL.
This pipeline uses a 5-block architecture that can be customized and extended.
Example Usage
[TODO]
Pipeline Architecture
This modular pipeline is composed of the following blocks:
- text_encoder (StableDiffusionXLTextEncoderStep) - Text encoder step that generates the text embeddings used to guide image generation.
- ip_adapter (StableDiffusionXLAutoIPAdapterStep) - Runs the IP-Adapter step if `ip_adapter_image` is provided. This step should be placed before the 'input' step.
- vae_encoder (StableDiffusionXLAutoVaeEncoderStep) - VAE encoder step that encodes the image inputs into their latent representations.
- denoise (StableDiffusionXLCoreDenoiseStep) - Core step that performs the denoising process.
- decode (StableDiffusionXLAutoDecodeStep) - Decode step that decodes the denoised latents into output images.
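The block list above is essentially a sequential program over a shared pipeline state, where the "auto" blocks run only when their trigger inputs are present. The following is a toy sketch of that composition pattern; the function names mirror this card's five blocks, but the `state`-dict protocol and the string placeholders are purely illustrative, not the actual Diffusers block API.

```python
# Toy sketch of sequential block composition with "auto" (conditional) blocks.
# Each block reads from and writes to a shared state dict.

def text_encoder(state):
    # Always runs: turn the prompt into embeddings that guide generation.
    state["text_embeddings"] = f"emb({state['prompt']})"
    return state

def ip_adapter(state):
    # "Auto" block: runs only when its trigger input is present.
    if state.get("ip_adapter_image") is not None:
        state["ip_adapter_embeds"] = f"ip_emb({state['ip_adapter_image']})"
    return state

def vae_encoder(state):
    # "Auto" block: encodes a reference image into latents
    # (image-to-image / inpainting workflows only).
    if state.get("image") is not None:
        state["image_latents"] = f"lat({state['image']})"
    return state

def denoise(state):
    # Core step: iterative denoising (collapsed to a placeholder here).
    state["latents"] = "denoised"
    return state

def decode(state):
    # Decode the denoised latents into output images.
    state["images"] = ["decoded image"]
    return state

BLOCKS = [text_encoder, ip_adapter, vae_encoder, denoise, decode]

def run_pipeline(**inputs):
    state = dict(inputs)
    for block in BLOCKS:
        state = block(state)
    return state

text2image_out = run_pipeline(prompt="a cat")                 # no image latents
image2image_out = run_pipeline(prompt="a cat", image="ref.png")  # latents present
```

Because the conditional blocks are no-ops when their inputs are absent, one block sequence serves every workflow listed below.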
Model Components
- text_encoder (CLIPTextModel)
- text_encoder_2 (CLIPTextModelWithProjection)
- tokenizer (CLIPTokenizer)
- tokenizer_2 (CLIPTokenizer)
- guider (ClassifierFreeGuidance)
- image_encoder (CLIPVisionModelWithProjection)
- feature_extractor (CLIPImageProcessor)
- unet (UNet2DConditionModel)
- vae (AutoencoderKL)
- image_processor (VaeImageProcessor)
- mask_processor (VaeImageProcessor)
- scheduler (EulerDiscreteScheduler)
- controlnet (ControlNetUnionModel)
- control_image_processor (VaeImageProcessor)
Configuration Parameters
- force_zeros_for_empty_prompt (default: True)
- requires_aesthetics_score (default: False)
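`force_zeros_for_empty_prompt` mirrors the behavior of the standard SDXL pipelines: when enabled, an empty (negative) prompt is represented by an all-zeros embedding rather than by encoding the empty string. A minimal sketch of that logic, using a hypothetical `fake_encode` stand-in rather than the real CLIP text encoders:

```python
EMBED_DIM = 4  # toy size; real SDXL text embeddings are much larger

def fake_encode(text):
    # Hypothetical stand-in for the CLIP text encoders: any non-degenerate
    # deterministic mapping works for illustration.
    return [float(len(text) + i + 1) for i in range(EMBED_DIM)]

def encode_negative_prompt(negative_prompt, force_zeros_for_empty_prompt=True):
    # With the flag set, an empty/None negative prompt becomes all zeros
    # instead of the embedding of the empty string "".
    if not negative_prompt and force_zeros_for_empty_prompt:
        return [0.0] * EMBED_DIM
    return fake_encode(negative_prompt or "")
```

With the flag disabled, the pipeline would instead feed the encoder the literal empty string, which produces a nonzero embedding.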
Workflow Input Specification
text2image
- prompt (None, optional): No description provided

image2image
- prompt (None, optional): No description provided
- image (None): No description provided

inpainting
- prompt (None, optional): No description provided
- image (None): No description provided
- mask_image (None): No description provided

controlnet_text2image
- prompt (None, optional): No description provided
- control_image (None): No description provided

controlnet_image2image
- prompt (None, optional): No description provided
- image (None): No description provided
- control_image (None): No description provided

controlnet_inpainting
- prompt (None, optional): No description provided
- image (None): No description provided
- mask_image (None): No description provided
- control_image (None): No description provided

controlnet_union_text2image
- prompt (None, optional): No description provided
- control_image (None): No description provided
- control_mode (None): No description provided

controlnet_union_image2image
- prompt (None, optional): No description provided
- image (None): No description provided
- control_image (None): No description provided
- control_mode (None): No description provided

controlnet_union_inpainting
- prompt (None, optional): No description provided
- image (None): No description provided
- mask_image (None): No description provided
- control_image (None): No description provided
- control_mode (None): No description provided

ip_adapter_text2image
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input

ip_adapter_image2image
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- image (None): No description provided

ip_adapter_inpainting
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- image (None): No description provided
- mask_image (None): No description provided

ip_adapter_controlnet_text2image
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- control_image (None): No description provided

ip_adapter_controlnet_image2image
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- image (None): No description provided
- control_image (None): No description provided

ip_adapter_controlnet_inpainting
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- image (None): No description provided
- mask_image (None): No description provided
- control_image (None): No description provided

ip_adapter_controlnet_union_text2image
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- control_image (None): No description provided
- control_mode (None): No description provided

ip_adapter_controlnet_union_image2image
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- image (None): No description provided
- control_image (None): No description provided
- control_mode (None): No description provided

ip_adapter_controlnet_union_inpainting
- prompt (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these): The image(s) to be used as the IP-Adapter input
- image (None): No description provided
- mask_image (None): No description provided
- control_image (None): No description provided
- control_mode (None): No description provided
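The workflow names above are compositional: an `ip_adapter` prefix when `ip_adapter_image` is given, a `controlnet`/`controlnet_union` segment when `control_image` (and `control_mode`) are given, and a base task of text2image, image2image, or inpainting depending on whether `image` and `mask_image` are present. A toy dispatcher illustrating that naming scheme (this is a reading aid for the table, not the library's internal selection logic):

```python
def infer_workflow(inputs):
    # Reconstruct the workflow name from which inputs are provided,
    # following the naming pattern of the table above.
    parts = []
    if "ip_adapter_image" in inputs:
        parts.append("ip_adapter")
    if "control_image" in inputs:
        # control_mode distinguishes the ControlNet-Union variants.
        parts.append("controlnet_union" if "control_mode" in inputs else "controlnet")
    if "mask_image" in inputs:
        parts.append("inpainting")
    elif "image" in inputs:
        parts.append("image2image")
    else:
        parts.append("text2image")
    return "_".join(parts)
```

For example, providing `prompt`, `image`, `mask_image`, `control_image`, and `control_mode` selects `controlnet_union_inpainting`.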
Input/Output Specification
Inputs:
- prompt (None, optional): No description provided
- prompt_2 (None, optional): No description provided
- negative_prompt (None, optional): No description provided
- negative_prompt_2 (None, optional): No description provided
- cross_attention_kwargs (None, optional): No description provided
- clip_skip (None, optional): No description provided
- ip_adapter_image (Image | ndarray | Tensor, or a list of these; optional): The image(s) to be used as the IP-Adapter input
- height (None, optional): No description provided
- width (None, optional): No description provided
- image (None, optional): No description provided
- mask_image (None, optional): No description provided
- padding_mask_crop (None, optional): No description provided
- dtype (dtype, optional): The dtype of the model inputs
- generator (None, optional): No description provided
- preprocess_kwargs (dict | None, optional): A kwargs dictionary that, if specified, is passed along to the image processor defined under self.image_processor in [diffusers.image_processor.VaeImageProcessor]
- num_images_per_prompt (None, optional, defaults to 1): No description provided
- ip_adapter_embeds (list, optional): Pre-generated image embeddings for IP-Adapter. Can be generated in the ip_adapter step.
- negative_ip_adapter_embeds (list, optional): Pre-generated negative image embeddings for IP-Adapter. Can be generated in the ip_adapter step.
- num_inference_steps (None, optional, defaults to 50): No description provided
- timesteps (None, optional): No description provided
- sigmas (None, optional): No description provided
- denoising_end (None, optional): No description provided
- strength (None, optional, defaults to 0.3): No description provided
- denoising_start (None, optional): No description provided
- latents (None): No description provided
- image_latents (Tensor, optional): The latents representing the reference image for image-to-image/inpainting generation. Can be generated in the vae_encode step.
- mask (Tensor, optional): The mask for the inpainting generation. Can be generated in the vae_encode step.
- masked_image_latents (Tensor, optional): The masked image latents for the inpainting generation (only for an inpainting-specific unet). Can be generated in the vae_encode step.
- original_size (None, optional): No description provided
- target_size (None, optional): No description provided
- negative_original_size (None, optional): No description provided
- negative_target_size (None, optional): No description provided
- crops_coords_top_left (None, optional, defaults to (0, 0)): No description provided
- negative_crops_coords_top_left (None, optional, defaults to (0, 0)): No description provided
- aesthetic_score (None, optional, defaults to 6.0): No description provided
- negative_aesthetic_score (None, optional, defaults to 2.0): No description provided
- control_image (None, optional): No description provided
- control_mode (None, optional): No description provided
- control_guidance_start (None, optional, defaults to 0.0): No description provided
- control_guidance_end (None, optional, defaults to 1.0): No description provided
- controlnet_conditioning_scale (None, optional, defaults to 1.0): No description provided
- guess_mode (None, optional, defaults to False): No description provided
- crops_coords (tuple | None, optional): The crop coordinates used to preprocess/postprocess the image and mask, for the inpainting task only. Can be generated in the vae_encode step.
- controlnet_cond (Tensor, optional): The control image to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- conditioning_scale (float, optional): The controlnet conditioning scale value to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- controlnet_keep (list, optional): The controlnet keep values to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `**denoiser_input_fields` (None, optional): All conditional model inputs that need to be prepared with the guider. Should contain prompt_embeds/negative_prompt_embeds, add_time_ids/negative_add_time_ids, pooled_prompt_embeds/negative_pooled_prompt_embeds, and ip_adapter_embeds/negative_ip_adapter_embeds (optional). Please add kwargs_type=denoiser_input_fields to their parameter spec (OutputParam) when they are created and added to the pipeline state.
- eta (None, optional, defaults to 0.0): No description provided
- output_type (None, optional, defaults to pil): No description provided
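One input worth highlighting is `strength` (default 0.3), which in image-to-image workflows controls how much of the denoising schedule is actually run. The sketch below mirrors the standard Diffusers img2img timestep truncation; the exact helper name here is hypothetical, but the arithmetic matches the usual behavior, where strength 1.0 denoises from pure noise and smaller values preserve more of the reference image:

```python
def get_img2img_timesteps(num_inference_steps, strength):
    # Skip the first part of the schedule: only the last
    # `num_inference_steps * strength` steps are actually run.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    timesteps = list(range(num_inference_steps))[t_start:]
    return timesteps, len(timesteps)
```

With the card's defaults (num_inference_steps=50, strength=0.3), only 15 denoising steps run, which is why img2img with low strength is both faster and closer to the input image.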
Outputs:
- images (list): Generated images.