
SDXL T2I Adapter - Brightness Control

Metadata

license: apache-2.0
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
  - stable-diffusion-xl
  - stable-diffusion-xl-diffusers
  - text-to-image
  - diffusers
  - t2i-adapter
  - brightness
  - sdxl
library_name: diffusers
pipeline_tag: image-to-image
datasets:
  - latentcat/grayscale_image_aesthetic_3M

Overview

A T2I Adapter model trained on Stable Diffusion XL to control image generation through brightness/grayscale information. T2I Adapters are a lightweight alternative to ControlNet: roughly 77M parameters (~300MB) versus roughly 700M parameters (~4.7GB), enabling much faster training and inference.
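As a rough sanity check on the adapter size figure (a sketch only, assuming fp32 storage at 4 bytes per parameter):

```python
# Approximate checkpoint size from parameter count, assuming fp32
# storage (4 bytes per parameter); fp16 storage would halve this.
def checkpoint_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    return num_params * bytes_per_param / 1e6

adapter_mb = checkpoint_size_mb(77_000_000)  # T2I Adapter parameter count
print(f"{adapter_mb:.0f} MB")                # ~308 MB, matching the ~300MB figure
```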

🎉 NEW: Now includes a 1024×1024 native-SDXL-resolution model!

Key Features:

  • 🎨 Control brightness and lighting in generated images
  • 🚀 2.7x faster training than ControlNet (~18 minutes vs ~49 minutes @ 1024×1024)
  • 💾 15x smaller model size (~300MB vs ~4.7GB)
  • 🖼️ Two versions: 512×512 and 1024×1024 (native SDXL)
  • 🔄 Compatible with standard SDXL pipelines
  • 💡 Trained on high-quality aesthetic images
  • 🔥 Strong pattern preservation at high conditioning scales

Intended Uses:

  • Artistic QR code generation (especially with the 1024×1024 model at scale 2.0+)
  • Image recoloring and colorization
  • Lighting control in text-to-image generation
  • Brightness-based image manipulation
  • Photo enhancement and stylization
  • Watermark and pattern integration

Available Models

This repository contains two model versions:

512×512 Model (Original)

  • File: diffusion_pytorch_model.safetensors + config.json
  • Resolution: 512×512
  • Training: A100 40GB, 10k samples, ~11 minutes
  • Best for: General purpose, faster inference on limited hardware

1024×1024 Model (Native SDXL - Recommended)

  • File: diffusion_pytorch_model_1024.safetensors + config_1024.json
  • Resolution: 1024×1024 (native SDXL!)
  • Training: H100 80GB, 10k samples, ~18 minutes
  • Best for: Maximum quality, strong pattern preservation, artistic QR codes
  • Discovery: Shows superior brightness preservation at conditioning scales 1.5-2.5

Training Details

512×512 Model

| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Training Resolution | 512×512 |
| Training Steps | 157 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA A100 40GB |
| Training Time | ~11 minutes |
| Dataset | 10,000 samples from grayscale_image_aesthetic_3M |

1024×1024 Model (NEW!)

| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Training Resolution | 1024×1024 (native SDXL) |
| Training Steps | 157 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA H100 80GB |
| Training Time | ~18 minutes |
| Final Loss | 0.0796 |
| Dataset | 10,000 samples from grayscale_image_aesthetic_3M |
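The step count in both configurations follows directly from the effective batch size; a quick arithmetic check:

```python
import math

samples = 10_000          # training samples
per_device_batch = 8
grad_accum = 8
effective_batch = per_device_batch * grad_accum   # 8 * 8 = 64

steps_per_epoch = math.ceil(samples / effective_batch)
print(effective_batch, steps_per_epoch)  # 64 157
```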

Installation

pip install diffusers transformers accelerate torch

Usage

Option 1: 1024×1024 Model (Recommended for Best Quality)

from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter (1024×1024 model)
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl-10k",
    subfolder=".",
    weight_name="diffusion_pytorch_model_1024.safetensors",
    torch_dtype=torch.float16
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((1024, 1024))  # Resize to 1024×1024

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=2.0,  # Higher scales work great at 1024Γ—1024!
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]

image.save("output.png")

Option 2: 512×512 Model (Original)

from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter (512×512 model - default)
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl-10k",
    torch_dtype=torch.float16
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((512, 512))

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
    height=512,
    width=512,
).images[0]

image.save("output.png")

Adapter Conditioning Scale Recommendations

The adapter_conditioning_scale parameter controls how strongly the adapter influences the generation:

For the 512×512 Model:

  • 0.3-0.5: Weak control, more creative freedom
  • 0.6-0.8: Balanced control (recommended)
  • 0.9-1.2: Strong control, closely follows brightness structure
  • 1.3-2.0: Very strong control, minimal deviation

For the 1024×1024 Model (NEW Findings!):

  • 0.7-1.0: Artistic integration with subtle pattern hints
  • 1.0-1.5: Balanced - visible structure with artistic elements
  • 1.5-2.0: 🔥 EXCELLENT pattern preservation (recommended for QR codes!)
  • 2.0-2.5: Maximum control - strong brightness patterns with artistic overlay

Discovery: The 1024×1024 model shows superior brightness preservation at scales 1.5-2.5, making it ideal for:

  • Artistic QR codes (scale 2.0 recommended)
  • Watermark integration
  • Pattern-based generation
  • Strong lighting control
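The ranges above can be captured in a small lookup helper. This is purely illustrative; the names and structure below are our own, not part of diffusers or any library:

```python
# Recommended adapter_conditioning_scale ranges from the lists above,
# keyed by model resolution. Illustrative only.
SCALE_RANGES = {
    512: {
        "weak": (0.3, 0.5),
        "balanced": (0.6, 0.8),   # recommended default
        "strong": (0.9, 1.2),
        "very_strong": (1.3, 2.0),
    },
    1024: {
        "artistic": (0.7, 1.0),
        "balanced": (1.0, 1.5),
        "pattern": (1.5, 2.0),    # recommended for QR codes
        "maximum": (2.0, 2.5),
    },
}

def recommended_scale(resolution: int, mode: str) -> float:
    """Midpoint of the recommended range for a resolution/mode pair."""
    lo, hi = SCALE_RANGES[resolution][mode]
    return (lo + hi) / 2

print(recommended_scale(1024, "pattern"))  # 1.75
```

The midpoint of a range is a reasonable starting point; adjust within the range based on how strongly the output should follow the brightness structure.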

Performance Comparison

1024×1024 vs 512×512 Model

| Metric | 512×512 Model | 1024×1024 Model |
|---|---|---|
| Resolution | 512×512 | 1024×1024 (native SDXL) |
| Training Time | ~11 min (A100) | ~18 min (H100) |
| Inference Speed | ~12 it/s @ 512 | ~12 it/s @ 1024 |
| Pattern Preservation | Good | Excellent at high scales |
| Best Scale Range | 0.6-1.2 | 1.5-2.5 |
| Use Case | General purpose | High quality, patterns, QR codes |

T2I Adapter vs ControlNet (1024×1024)

| Feature | T2I Adapter (1024) | ControlNet (512) |
|---|---|---|
| Parameters | ~77M | ~700M |
| Model Size | 302MB | 4.7GB |
| Training Samples | 10,000 @ 1024 | 100,000 @ 512 |
| Training Time | ~18 minutes | ~49 minutes |
| Resolution | 1024×1024 (native) | 512×512 (upscaled) |
| Inference Speed @ 1024 | ~12 it/s | ~8 it/s |
| Time per Image | ~2.5 seconds | ~4 seconds |
| Pattern Preservation @ Scale 2.0 | Excellent | Good |

Winner for patterns: πŸ† T2I Adapter 1024Γ—1024 at scale 2.0 shows stronger brightness pattern preservation than ControlNet!

When to Use Each Model

Use the 512×512 Model when:

  • ✅ Limited VRAM (works on 8GB+ GPUs)
  • ✅ Faster inference needed
  • ✅ General brightness control
  • ✅ Natural image generation

Use the 1024×1024 Model when:

  • ✅ Maximum quality required
  • ✅ Artistic QR code generation
  • ✅ Strong pattern preservation needed (scale 1.5-2.5)
  • ✅ Native SDXL resolution preferred
  • ✅ 16GB+ VRAM available

Use ControlNet when:

  • ✅ Precise control at sub-1.5 conditioning scales
  • ✅ Complex geometric precision at low scales
  • ✅ Production applications with strict requirements

Key Findings

🎉 Major Discovery: The T2I Adapter trained at 1024×1024 resolution demonstrates superior brightness pattern preservation compared to the 512×512 version, and even to ControlNet, at higher conditioning scales (1.5-2.5).

Evidence:

  • At scale 2.0, the 1024×1024 model preserves strong black/white patterns from the input
  • Successfully maintains QR code structure while adding artistic elements
  • Shows chaotic but controlled pattern integration
  • Training at native SDXL resolution (1024×1024) provides better feature learning

Implications:

  • ✅ Ideal for artistic QR codes at scale 2.0
  • ✅ Excellent for brightness-based pattern control
  • ✅ Faster and smaller than ControlNet with comparable or better pattern preservation
  • ✅ Native SDXL resolution avoids upscaling artifacts

Checkpoints

This repository includes multiple checkpoints:

512×512 Model Checkpoints:

  1. checkpoint-78/ - Middle checkpoint (~5,000 samples)
  2. checkpoint-156/ - Near-final checkpoint (~10,000 samples)
  3. Final model (diffusion_pytorch_model.safetensors) - Complete training

1024×1024 Model:

  1. Final model (diffusion_pytorch_model_1024.safetensors) - Complete training at native SDXL resolution

Compare checkpoint quality to choose the best for your use case.

Example Use Cases

Artistic QR Code Generation (1024×1024 @ scale 2.0)

import qrcode
from PIL import Image

# Generate QR code
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
qr.add_data("https://your-url.com")
qr_image = qr.make_image().resize((1024, 1024)).convert("RGB")

# Generate artistic QR with T2I Adapter
# (pipe is the StableDiffusionXLAdapterPipeline set up in Option 1 above)
image = pipe(
    prompt="beautiful garden with flowers and butterflies",
    image=qr_image,
    adapter_conditioning_scale=2.0,  # Strong pattern preservation!
    height=1024,
    width=1024,
).images[0]

Brightness-Based Image Manipulation

# Convert photo to grayscale for brightness control
control = Image.open("photo.jpg").convert("L").convert("RGB").resize((1024, 1024))

# Recolor with a new style (pipe as set up in Option 1 above)
image = pipe(
    prompt="vibrant sunset colors, golden hour lighting",
    image=control,
    adapter_conditioning_scale=1.5,
    height=1024,
    width=1024,
).images[0]
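The `convert("L")` call above reduces each pixel to its luminance via the standard ITU-R 601 luma transform (L = 299R/1000 + 587G/1000 + 114B/1000, per the Pillow documentation), which is exactly the brightness signal the adapter conditions on. A minimal stdlib-only sketch of that transform (integer arithmetic here; Pillow's exact rounding may differ by ±1):

```python
# ITU-R 601 luma: the weighted sum Pillow's Image.convert("L")
# applies per pixel to collapse RGB into a single brightness value.
def luma(r: int, g: int, b: int) -> int:
    return (299 * r + 587 * g + 114 * b) // 1000

print(luma(255, 255, 255))  # 255 (white stays white)
print(luma(0, 0, 0))        # 0   (black stays black)
print(luma(255, 0, 0))      # 76  (pure red maps to a fairly dark gray)
```

This is why strongly saturated colors in a source photo can still read as dark or light regions in the control image.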

Limitations

⚠️ Training Efficiency Trade-offs:

  • Trained on only 10k samples (vs 100k for full ControlNet)
  • Single epoch training for rapid deployment
  • Best results at higher scales (1.5-2.5) for the 1024×1024 model

For production use, consider:

  • Fine-tuning on domain-specific data
  • Training for multiple epochs for even better quality
  • Combining with other conditioning methods

Citation

@misc{t2i-adapter-brightness-sdxl-10k,
  author = {Oysiyl},
  title = {SDXL T2I Adapter - Brightness Control (10k samples, 512 \& 1024)},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Oysiyl/t2i-adapter-brightness-sdxl-10k}}
}

License

Apache 2.0 License. The base SDXL model is distributed under its own license terms; see stabilityai/stable-diffusion-xl-base-1.0.

Training Code

This model was trained using the official Diffusers T2I Adapter training script with the following key configurations:

  • FP16 mixed precision training
  • xFormers memory-efficient attention
  • 8-bit Adam optimizer
  • Gradient checkpointing for memory efficiency
  • MinSNR loss weighting (gamma=5.0)
  • FP16-fixed VAE for numerical stability

The 1024×1024 model was trained on an H100 80GB without OOM errors, demonstrating excellent memory efficiency.

See the training script for implementation details.
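For reference, a launch command in the spirit of the configuration above. This is a sketch only: flag names follow the Diffusers `train_t2i_adapter_sdxl.py` example script, some options listed in the bullets (such as Min-SNR weighting) may require a modified script, and every flag should be verified against the script version actually used:

```shell
accelerate launch train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --resolution=1024 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=8 \
  --learning_rate=1e-5 \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --gradient_checkpointing \
  --output_dir="t2i-adapter-brightness-sdxl"
```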
