This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: StableDiffusionXLAutoBlocks

Description: Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks using Stable Diffusion XL.

This pipeline uses a 5-block architecture that can be customized and extended.

Example Usage


Pipeline Architecture

This modular pipeline is composed of the following blocks:

  1. text_encoder (StableDiffusionXLTextEncoderStep)
    • Text encoder step that generates the text embeddings used to guide image generation.
  2. ip_adapter (StableDiffusionXLAutoIPAdapterStep)
    • Runs the IP-Adapter step if ip_adapter_image is provided. This step should be placed before the 'input' step.
  3. vae_encoder (StableDiffusionXLAutoVaeEncoderStep)
    • VAE encoder step that encodes the image inputs into their latent representations.
  4. denoise (StableDiffusionXLCoreDenoiseStep)
    • Core step that performs the denoising process.
  5. decode (StableDiffusionXLAutoDecodeStep)
    • Decode step that decodes the denoised latents into image outputs.

Model Components

  1. text_encoder (CLIPTextModel)
  2. text_encoder_2 (CLIPTextModelWithProjection)
  3. tokenizer (CLIPTokenizer)
  4. tokenizer_2 (CLIPTokenizer)
  5. guider (ClassifierFreeGuidance)
  6. image_encoder (CLIPVisionModelWithProjection)
  7. feature_extractor (CLIPImageProcessor)
  8. unet (UNet2DConditionModel)
  9. vae (AutoencoderKL)
  10. image_processor (VaeImageProcessor)
  11. mask_processor (VaeImageProcessor)
  12. scheduler (EulerDiscreteScheduler)
  13. controlnet (ControlNetUnionModel)
  14. control_image_processor (VaeImageProcessor)

Configuration Parameters

  • force_zeros_for_empty_prompt (default: True)
  • requires_aesthetics_score (default: False)

Workflow Input Specification

text2image
  • prompt (None, optional): No description provided
image2image
  • prompt (None, optional): No description provided
  • image (None): No description provided
inpainting
  • prompt (None, optional): No description provided
  • image (None): No description provided
  • mask_image (None): No description provided
controlnet_text2image
  • prompt (None, optional): No description provided
  • control_image (None): No description provided
controlnet_image2image
  • prompt (None, optional): No description provided
  • image (None): No description provided
  • control_image (None): No description provided
controlnet_inpainting
  • prompt (None, optional): No description provided
  • image (None): No description provided
  • mask_image (None): No description provided
  • control_image (None): No description provided
controlnet_union_text2image
  • prompt (None, optional): No description provided
  • control_image (None): No description provided
  • control_mode (None): No description provided
controlnet_union_image2image
  • prompt (None, optional): No description provided
  • image (None): No description provided
  • control_image (None): No description provided
  • control_mode (None): No description provided
controlnet_union_inpainting
  • prompt (None, optional): No description provided
  • image (None): No description provided
  • mask_image (None): No description provided
  • control_image (None): No description provided
  • control_mode (None): No description provided
ip_adapter_text2image
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
ip_adapter_image2image
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • image (None): No description provided
ip_adapter_inpainting
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • image (None): No description provided
  • mask_image (None): No description provided
ip_adapter_controlnet_text2image
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • control_image (None): No description provided
ip_adapter_controlnet_image2image
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • image (None): No description provided
  • control_image (None): No description provided
ip_adapter_controlnet_inpainting
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • image (None): No description provided
  • mask_image (None): No description provided
  • control_image (None): No description provided
ip_adapter_controlnet_union_text2image
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • control_image (None): No description provided
  • control_mode (None): No description provided
ip_adapter_controlnet_union_image2image
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • image (None): No description provided
  • control_image (None): No description provided
  • control_mode (None): No description provided
ip_adapter_controlnet_union_inpainting
  • prompt (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor]): The image(s) to be used as IP-Adapter input
  • image (None): No description provided
  • mask_image (None): No description provided
  • control_image (None): No description provided
  • control_mode (None): No description provided

Input/Output Specification

Inputs:

  • prompt (None, optional): No description provided
  • prompt_2 (None, optional): No description provided
  • negative_prompt (None, optional): No description provided
  • negative_prompt_2 (None, optional): No description provided
  • cross_attention_kwargs (None, optional): No description provided
  • clip_skip (None, optional): No description provided
  • ip_adapter_image (Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor], optional): The image(s) to be used as IP-Adapter input
  • height (None, optional): No description provided
  • width (None, optional): No description provided
  • image (None, optional): No description provided
  • mask_image (None, optional): No description provided
  • padding_mask_crop (None, optional): No description provided
  • dtype (dtype, optional): The dtype of the model inputs
  • generator (None, optional): No description provided
  • preprocess_kwargs (dict | NoneType, optional): A kwargs dictionary that, if specified, is passed along to the image processor defined under self.image_processor (diffusers.image_processor.VaeImageProcessor)
  • num_images_per_prompt (None, optional, defaults to 1): No description provided
  • ip_adapter_embeds (list, optional): Pre-generated image embeddings for IP-Adapter. Can be generated from ip_adapter step.
  • negative_ip_adapter_embeds (list, optional): Pre-generated negative image embeddings for IP-Adapter. Can be generated from ip_adapter step.
  • num_inference_steps (None, optional, defaults to 50): No description provided
  • timesteps (None, optional): No description provided
  • sigmas (None, optional): No description provided
  • denoising_end (None, optional): No description provided
  • strength (None, optional, defaults to 0.3): No description provided
  • denoising_start (None, optional): No description provided
  • latents (None): No description provided
  • image_latents (Tensor, optional): The latents representing the reference image for image-to-image/inpainting generation. Can be generated in vae_encode step.
  • mask (Tensor, optional): The mask for the inpainting generation. Can be generated in vae_encode step.
  • masked_image_latents (Tensor, optional): The masked image latents for the inpainting generation (only for inpainting-specific unet). Can be generated in vae_encode step.
  • original_size (None, optional): No description provided
  • target_size (None, optional): No description provided
  • negative_original_size (None, optional): No description provided
  • negative_target_size (None, optional): No description provided
  • crops_coords_top_left (None, optional, defaults to (0, 0)): No description provided
  • negative_crops_coords_top_left (None, optional, defaults to (0, 0)): No description provided
  • aesthetic_score (None, optional, defaults to 6.0): No description provided
  • negative_aesthetic_score (None, optional, defaults to 2.0): No description provided
  • control_image (None, optional): No description provided
  • control_mode (None, optional): No description provided
  • control_guidance_start (None, optional, defaults to 0.0): No description provided
  • control_guidance_end (None, optional, defaults to 1.0): No description provided
  • controlnet_conditioning_scale (None, optional, defaults to 1.0): No description provided
  • guess_mode (None, optional, defaults to False): No description provided
  • crops_coords (tuple | NoneType, optional): The crop coordinates to use for preprocess/postprocess the image and mask, for inpainting task only. Can be generated in vae_encode step.
  • controlnet_cond (Tensor, optional): The control image to use for the denoising process. Can be generated in prepare_controlnet_inputs step.
  • conditioning_scale (float, optional): The controlnet conditioning scale value to use for the denoising process. Can be generated in prepare_controlnet_inputs step.
  • controlnet_keep (list, optional): The controlnet keep values to use for the denoising process. Can be generated in prepare_controlnet_inputs step.
  • **denoiser_input_fields (None, optional): All conditional model inputs that need to be prepared with the guider. It should contain prompt_embeds/negative_prompt_embeds, add_time_ids/negative_add_time_ids, pooled_prompt_embeds/negative_pooled_prompt_embeds, and optionally ip_adapter_embeds/negative_ip_adapter_embeds. Please add kwargs_type=denoiser_input_fields to their parameter spec (OutputParam) when they are created and added to the pipeline state
  • eta (None, optional, defaults to 0.0): No description provided
  • output_type (None, optional, defaults to pil): No description provided

Outputs:

  • images (list): Generated images.