# SDXL T2I Adapter - Brightness Control
## Metadata

```yaml
license: apache-2.0
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- t2i-adapter
- brightness
- sdxl
library_name: diffusers
pipeline_tag: image-to-image
datasets:
- latentcat/grayscale_image_aesthetic_3M
```
## Overview
A T2I Adapter model trained for Stable Diffusion XL that controls image generation through brightness/grayscale information. T2I Adapters are a lightweight alternative to ControlNet: ~77M parameters (~300MB) versus ~700M parameters (~4.7GB), enabling much faster training and inference.

**NEW:** Now includes a model trained at SDXL's native 1024×1024 resolution!
**Key Features:**
- Control brightness and lighting in generated images
- 2.7x faster training than ControlNet (~18 minutes vs ~49 minutes @ 1024×1024)
- 15x smaller model size (~300MB vs ~4.7GB)
- Two versions: 512×512 and 1024×1024 (native SDXL)
- Compatible with standard SDXL pipelines
- Trained on high-quality aesthetic images
- Strong pattern preservation at high conditioning scales
**Intended Uses:**
- Artistic QR code generation (especially with the 1024×1024 model at scale 2.0+)
- Image recoloring and colorization
- Lighting control in text-to-image generation
- Brightness-based image manipulation
- Photo enhancement and stylization
- Watermark and pattern integration
## Available Models

This repository contains two model versions:

### 512×512 Model (Original)
- Files: `diffusion_pytorch_model.safetensors` + `config.json`
- Resolution: 512×512
- Training: A100 40GB, 10k samples, ~11 minutes
- Best for: general purpose, faster inference on limited hardware

### 1024×1024 Model (Native SDXL - Recommended)
- Files: `diffusion_pytorch_model_1024.safetensors` + `config_1024.json`
- Resolution: 1024×1024 (native SDXL)
- Training: H100 80GB, 10k samples, ~18 minutes
- Best for: maximum quality, strong pattern preservation, artistic QR codes
- Discovery: shows superior brightness preservation at conditioning scales 1.5-2.5
## Training Details

### 512×512 Model
| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Training Resolution | 512×512 |
| Training Steps | 157 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA A100 40GB |
| Training Time | ~11 minutes |
| Dataset | 10,000 samples from grayscale_image_aesthetic_3M |
### 1024×1024 Model (NEW!)
| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Training Resolution | 1024×1024 (native SDXL) |
| Training Steps | 157 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA H100 80GB |
| Training Time | ~18 minutes |
| Final Loss | 0.0796 |
| Dataset | 10,000 samples from grayscale_image_aesthetic_3M |
## Installation

```bash
pip install diffusers transformers accelerate torch
```
## Usage

### Option 1: 1024×1024 Model (Recommended for Best Quality)
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter (1024×1024 model)
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl-10k",
    subfolder=".",
    weight_name="diffusion_pytorch_model_1024.safetensors",
    torch_dtype=torch.float16,
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((1024, 1024))  # resize to 1024×1024

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=2.0,  # higher scales work well at 1024×1024
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
### Option 2: 512×512 Model (Original)
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter (512×512 model - default weights)
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl-10k",
    torch_dtype=torch.float16,
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((512, 512))

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
    height=512,
    width=512,
).images[0]
image.save("output.png")
```
## Adapter Conditioning Scale Recommendations

The `adapter_conditioning_scale` parameter controls how strongly the adapter influences generation.

**For the 512×512 model:**
- 0.3-0.5: weak control, more creative freedom
- 0.6-0.8: balanced control (recommended)
- 0.9-1.2: strong control, closely follows brightness structure
- 1.3-2.0: very strong control, minimal deviation

**For the 1024×1024 model (new findings):**
- 0.7-1.0: artistic integration with subtle pattern hints
- 1.0-1.5: balanced - visible structure with artistic elements
- 1.5-2.0: excellent pattern preservation (recommended for QR codes)
- 2.0-2.5: maximum control - strong brightness patterns with artistic overlay

**Discovery:** the 1024×1024 model shows superior brightness preservation at scales 1.5-2.5, making it ideal for:
- Artistic QR codes (scale 2.0 recommended)
- Watermark integration
- Pattern-based generation
- Strong lighting control
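These recommendations can be captured in a small helper. This is purely an illustrative sketch (the function name and the idea of keying on resolution are ours, not part of this repo); it just transcribes the bands listed above:

```python
def recommended_scale_range(resolution: int, qr_mode: bool = False) -> tuple:
    """Return a (low, high) adapter_conditioning_scale band for a model resolution.

    Transcribes the recommendations above: the 1024 model favors high scales
    (1.5-2.0 for QR codes), the 512 model a balanced 0.6-0.8 band.
    """
    if resolution >= 1024:
        return (1.5, 2.0) if qr_mode else (1.0, 1.5)
    return (0.6, 0.8)

# Example: pick the midpoint of the band as a starting point for a sweep
low, high = recommended_scale_range(1024, qr_mode=True)
print((low + high) / 2)  # -> 1.75
```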
## Performance Comparison

### 1024×1024 vs 512×512 Model

| Metric | 512×512 Model | 1024×1024 Model |
|---|---|---|
| Resolution | 512×512 | 1024×1024 (native SDXL) |
| Training Time | ~11 min (A100) | ~18 min (H100) |
| Inference Speed | ~12 it/s @ 512 | ~12 it/s @ 1024 |
| Pattern Preservation | Good | Excellent at high scales |
| Best Scale Range | 0.6-1.2 | 1.5-2.5 |
| Use Case | General purpose | High quality, patterns, QR codes |

### T2I Adapter vs ControlNet (1024×1024)

| Feature | T2I Adapter (1024) | ControlNet (512) |
|---|---|---|
| Parameters | ~77M | ~700M |
| Model Size | ~302MB | ~4.7GB |
| Training Samples | 10,000 @ 1024 | 100,000 @ 512 |
| Training Time | ~18 minutes | ~49 minutes |
| Resolution | 1024×1024 (native) | 512×512 (upscaled) |
| Inference Speed @ 1024 | ~12 it/s | ~8 it/s |
| Time per Image | ~2.5 seconds | ~4 seconds |
| Pattern Preservation @ Scale 2.0 | Excellent | Good |

**Winner for patterns:** the T2I Adapter at 1024×1024 and scale 2.0 shows stronger brightness pattern preservation than ControlNet.
## When to Use Each Model

**Use the 512×512 model when:**
- ✅ VRAM is limited (works on 8GB+ GPUs)
- ✅ Faster inference is needed
- ✅ General brightness control is sufficient
- ✅ Generating natural images

**Use the 1024×1024 model when:**
- ✅ Maximum quality is required
- ✅ Generating artistic QR codes
- ✅ Strong pattern preservation is needed (scale 1.5-2.5)
- ✅ Native SDXL resolution is preferred
- ✅ 16GB+ VRAM is available

**Use ControlNet when:**
- ✅ Using sub-1.5 conditioning scales with precise control
- ✅ Complex geometric precision is needed at low scales
- ✅ Building production applications with strict requirements
## Key Findings

**Major discovery:** the T2I Adapter trained at 1024×1024 resolution demonstrates superior brightness pattern preservation compared to the 512×512 version, and even to ControlNet, at higher conditioning scales (1.5-2.5).

Evidence:
- At scale 2.0, the 1024×1024 model preserves strong black/white patterns from the input
- Successfully maintains QR code structure while adding artistic elements
- Shows chaotic but controlled pattern integration
- Training at native SDXL resolution (1024×1024) provides better feature learning

Implications:
- ✅ Ideal for artistic QR codes at scale 2.0
- ✅ Excellent for brightness-based pattern control
- ✅ Faster and smaller than ControlNet with comparable or better pattern preservation
- ✅ Native SDXL resolution avoids upscaling artifacts
## Checkpoints

This repository includes multiple checkpoints:

**512×512 model checkpoints:**
- `checkpoint-78/` - middle checkpoint (~5,000 samples)
- `checkpoint-156/` - near-final checkpoint (~10,000 samples)
- Final model (`diffusion_pytorch_model.safetensors`) - complete training

**1024×1024 model:**
- Final model (`diffusion_pytorch_model_1024.safetensors`) - complete training at native SDXL resolution

Compare checkpoint quality to choose the best one for your use case.
## Example Use Cases

### Artistic QR Code Generation (1024×1024 @ scale 2.0)
```python
import qrcode
from PIL import Image

# Generate a QR code (high error correction tolerates artistic distortion)
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
qr.add_data("https://your-url.com")
qr_image = qr.make_image().resize((1024, 1024)).convert("RGB")

# Generate an artistic QR with the T2I Adapter (reuses `pipe` from the Usage section)
image = pipe(
    prompt="beautiful garden with flowers and butterflies",
    image=qr_image,
    adapter_conditioning_scale=2.0,  # strong pattern preservation
    height=1024,
    width=1024,
).images[0]
```
### Brightness-Based Image Manipulation
```python
from PIL import Image

# Convert a photo to grayscale to use as the brightness control image
control = Image.open("photo.jpg").convert("L").convert("RGB").resize((1024, 1024))

# Recolor with a new style (reuses `pipe` from the Usage section)
image = pipe(
    prompt="vibrant sunset colors, golden hour lighting",
    image=control,
    adapter_conditioning_scale=1.5,
    height=1024,
    width=1024,
).images[0]
```
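For reference, `convert("L")` computes a weighted luminance from RGB; Pillow documents this as the ITU-R 601-2 transform `L = R*299/1000 + G*587/1000 + B*114/1000`. A minimal pure-Python sketch of that transform (the function name is ours, and rounding may differ slightly from Pillow's internal implementation):

```python
def luminance(r: int, g: int, b: int) -> int:
    """Approximate Pillow's RGB -> "L" conversion (ITU-R 601-2 weights)."""
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)

print(luminance(255, 255, 255))  # -> 255 (white stays white)
print(luminance(0, 0, 0))        # -> 0 (black stays black)
print(luminance(255, 0, 0))      # -> 76 (pure red maps to dark gray)
```

This is why green-heavy regions of a photo read as brighter in the control image than equally saturated red or blue regions.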
## Limitations

⚠️ **Training efficiency trade-offs:**
- Trained on only 10k samples (vs 100k for a full ControlNet)
- Single-epoch training for rapid deployment
- Best results at higher scales (1.5-2.5) for the 1024×1024 model

For production use, consider:
- Fine-tuning on domain-specific data
- Training for multiple epochs for even better quality
- Combining with other conditioning methods
## Citation

```bibtex
@misc{t2i-adapter-brightness-sdxl-10k,
  author = {Oysiyl},
  title = {SDXL T2I Adapter - Brightness Control (10k samples, 512 \& 1024)},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/Oysiyl/t2i-adapter-brightness-sdxl-10k}}
}
```
## License

Apache 2.0. The base SDXL model is distributed under its own license terms; see stabilityai/stable-diffusion-xl-base-1.0.
## Training Code

This model was trained with the official Diffusers T2I Adapter training script using the following key configurations:
- FP16 mixed-precision training
- xFormers memory-efficient attention
- 8-bit Adam optimizer
- Gradient checkpointing for memory efficiency
- Min-SNR loss weighting (gamma = 5.0)
- FP16-fixed VAE for numerical stability

The 1024×1024 model was trained on an H100 80GB without OOM errors, demonstrating good memory efficiency.

See the training script for implementation details.
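As an illustration only, the configuration listed above roughly corresponds to an invocation like the following. Flag names follow the Diffusers `train_t2i_adapter_sdxl.py` example script, but this command is not copied from this repo's actual run; treat dataset paths and any omitted flags as placeholders:

```bash
accelerate launch train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --resolution=1024 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=8 \
  --learning_rate=1e-5 \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --gradient_checkpointing
```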
## Acknowledgments

- Trained on the latentcat/grayscale_image_aesthetic_3M dataset
- Built with 🤗 Diffusers
- Base model: Stable Diffusion XL