# SDXL T2I Adapter - Brightness Control
## Metadata

```yaml
license: apache-2.0
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- t2i-adapter
- brightness
- sdxl
library_name: diffusers
pipeline_tag: image-to-image
datasets:
- latentcat/grayscale_image_aesthetic_3M
```
## Overview
A T2I Adapter model trained for Stable Diffusion XL that controls image generation through brightness/grayscale information. T2I Adapters are a lightweight alternative to ControlNet: ~77M parameters (~300MB) versus ~700M parameters (~4.7GB), enabling much faster training and inference.

**NEW:** Now includes a model trained at SDXL's native 1024×1024 resolution!
**Key Features:**
- Control brightness and lighting in generated images
- 2.7x faster training than ControlNet (~18 minutes vs ~49 minutes @ 1024×1024)
- 15x smaller model size (~300MB vs ~4.7GB)
- Two versions: 512×512 and 1024×1024 (native SDXL)
- Compatible with standard SDXL pipelines
- Trained on high-quality aesthetic images
- Strong pattern preservation at high conditioning scales
**Intended Uses:**
- Artistic QR code generation (especially with the 1024×1024 model at scale 2.0+)
- Image recoloring and colorization
- Lighting control in text-to-image generation
- Brightness-based image manipulation
- Photo enhancement and stylization
- Watermark and pattern integration
## Available Models

This repository contains two model versions:

### 512×512 Model (Original)
- Files: `diffusion_pytorch_model.safetensors` + `config.json`
- Resolution: 512×512
- Training: A100 40GB, 10k samples, ~11 minutes
- Best for: general purpose, faster inference on limited hardware

### 1024×1024 Model (Native SDXL - Recommended)
- Files: `diffusion_pytorch_model_1024.safetensors` + `config_1024.json`
- Resolution: 1024×1024 (native SDXL)
- Training: H100 80GB, 10k samples, ~18 minutes
- Best for: maximum quality, strong pattern preservation, artistic QR codes
- Discovery: shows superior brightness preservation at conditioning scales 1.5-2.5
## Training Details

### 512×512 Model
| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Training Resolution | 512×512 |
| Training Steps | 157 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA A100 40GB |
| Training Time | ~11 minutes |
| Dataset | 10,000 samples from grayscale_image_aesthetic_3M |
### 1024×1024 Model (NEW!)
| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Training Resolution | 1024×1024 (native SDXL) |
| Training Steps | 157 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA H100 80GB |
| Training Time | ~18 minutes |
| Final Loss | 0.0796 |
| Dataset | 10,000 samples from grayscale_image_aesthetic_3M |
## Installation

```bash
pip install diffusers transformers accelerate torch
```
## Usage

### Option 1: 1024×1024 Model (Recommended for Best Quality)
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter (1024×1024 model)
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl-10k",
    subfolder=".",
    weight_name="diffusion_pytorch_model_1024.safetensors",
    torch_dtype=torch.float16,
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((1024, 1024))  # resize to 1024×1024

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=2.0,  # higher scales work well at 1024×1024
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
### Option 2: 512×512 Model (Original)
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter (512×512 model - default weights)
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl-10k",
    torch_dtype=torch.float16,
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((512, 512))

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
    height=512,
    width=512,
).images[0]
image.save("output.png")
```
## Adapter Conditioning Scale Recommendations

The `adapter_conditioning_scale` parameter controls how strongly the adapter influences generation.

**For the 512×512 model:**
- 0.3-0.5: weak control, more creative freedom
- 0.6-0.8: balanced control (recommended)
- 0.9-1.2: strong control, closely follows brightness structure
- 1.3-2.0: very strong control, minimal deviation

**For the 1024×1024 model (new findings):**
- 0.7-1.0: artistic integration with subtle pattern hints
- 1.0-1.5: balanced - visible structure with artistic elements
- 1.5-2.0: excellent pattern preservation (recommended for QR codes)
- 2.0-2.5: maximum control - strong brightness patterns with artistic overlay

**Discovery:** the 1024×1024 model shows superior brightness preservation at scales 1.5-2.5, making it ideal for:
- Artistic QR codes (scale 2.0 recommended)
- Watermark integration
- Pattern-based generation
- Strong lighting control
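These recommendations can be captured in a small helper. This is purely an illustrative sketch (the function name and the idea of keying on resolution are ours, not part of this repo); it just transcribes the bands listed above:

```python
def recommended_scale_range(resolution: int, qr_mode: bool = False) -> tuple:
    """Return a (low, high) adapter_conditioning_scale band for a model resolution.

    Transcribes the recommendations above: the 1024 model favors high scales
    (1.5-2.0 for QR codes), the 512 model a balanced 0.6-0.8 band.
    """
    if resolution >= 1024:
        return (1.5, 2.0) if qr_mode else (1.0, 1.5)
    return (0.6, 0.8)

# Example: pick the midpoint of the band as a starting point for a sweep
low, high = recommended_scale_range(1024, qr_mode=True)
print((low + high) / 2)  # -> 1.75
```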
## Performance Comparison

### 1024×1024 vs 512×512 Model

| Metric | 512×512 Model | 1024×1024 Model |
|---|---|---|
| Resolution | 512×512 | 1024×1024 (native SDXL) |
| Training Time | ~11 min (A100) | ~18 min (H100) |
| Inference Speed | ~12 it/s @ 512 | ~12 it/s @ 1024 |
| Pattern Preservation | Good | Excellent at high scales |
| Best Scale Range | 0.6-1.2 | 1.5-2.5 |
| Use Case | General purpose | High quality, patterns, QR codes |

### T2I Adapter vs ControlNet (1024×1024)

| Feature | T2I Adapter (1024) | ControlNet (512) |
|---|---|---|
| Parameters | ~77M | ~700M |
| Model Size | ~302MB | ~4.7GB |
| Training Samples | 10,000 @ 1024 | 100,000 @ 512 |
| Training Time | ~18 minutes | ~49 minutes |
| Resolution | 1024×1024 (native) | 512×512 (upscaled) |
| Inference Speed @ 1024 | ~12 it/s | ~8 it/s |
| Time per Image | ~2.5 seconds | ~4 seconds |
| Pattern Preservation @ Scale 2.0 | Excellent | Good |

**Winner for patterns:** the T2I Adapter at 1024×1024 and scale 2.0 shows stronger brightness pattern preservation than ControlNet.
## When to Use Each Model

**Use the 512×512 model when:**
- ✅ VRAM is limited (works on 8GB+ GPUs)
- ✅ Faster inference is needed
- ✅ General brightness control is sufficient
- ✅ Generating natural images

**Use the 1024×1024 model when:**
- ✅ Maximum quality is required
- ✅ Generating artistic QR codes
- ✅ Strong pattern preservation is needed (scale 1.5-2.5)
- ✅ Native SDXL resolution is preferred
- ✅ 16GB+ VRAM is available

**Use ControlNet when:**
- ✅ Using sub-1.5 conditioning scales with precise control
- ✅ Complex geometric precision is needed at low scales
- ✅ Building production applications with strict requirements
## Key Findings

**Major discovery:** the T2I Adapter trained at 1024×1024 resolution demonstrates superior brightness pattern preservation compared to the 512×512 version, and even to ControlNet, at higher conditioning scales (1.5-2.5).

Evidence:
- At scale 2.0, the 1024×1024 model preserves strong black/white patterns from the input
- Successfully maintains QR code structure while adding artistic elements
- Shows chaotic but controlled pattern integration
- Training at native SDXL resolution (1024×1024) provides better feature learning

Implications:
- ✅ Ideal for artistic QR codes at scale 2.0
- ✅ Excellent for brightness-based pattern control
- ✅ Faster and smaller than ControlNet with comparable or better pattern preservation
- ✅ Native SDXL resolution avoids upscaling artifacts
## Checkpoints

This repository includes multiple checkpoints:

**512×512 model checkpoints:**
- `checkpoint-78/` - middle checkpoint (~5,000 samples)
- `checkpoint-156/` - near-final checkpoint (~10,000 samples)
- Final model (`diffusion_pytorch_model.safetensors`) - complete training

**1024×1024 model:**
- Final model (`diffusion_pytorch_model_1024.safetensors`) - complete training at native SDXL resolution

Compare checkpoint quality to choose the best one for your use case.
## Example Use Cases

### Artistic QR Code Generation (1024×1024 @ scale 2.0)
```python
import qrcode
from PIL import Image

# Generate a QR code (high error correction tolerates artistic distortion)
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
qr.add_data("https://your-url.com")
qr_image = qr.make_image().resize((1024, 1024)).convert("RGB")

# Generate an artistic QR with the T2I Adapter (reuses `pipe` from the Usage section)
image = pipe(
    prompt="beautiful garden with flowers and butterflies",
    image=qr_image,
    adapter_conditioning_scale=2.0,  # strong pattern preservation
    height=1024,
    width=1024,
).images[0]
```
### Brightness-Based Image Manipulation
```python
from PIL import Image

# Convert a photo to grayscale to use as the brightness control image
control = Image.open("photo.jpg").convert("L").convert("RGB").resize((1024, 1024))

# Recolor with a new style (reuses `pipe` from the Usage section)
image = pipe(
    prompt="vibrant sunset colors, golden hour lighting",
    image=control,
    adapter_conditioning_scale=1.5,
    height=1024,
    width=1024,
).images[0]
```
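For reference, `convert("L")` computes a weighted luminance from RGB; Pillow documents this as the ITU-R 601-2 transform `L = R*299/1000 + G*587/1000 + B*114/1000`. A minimal pure-Python sketch of that transform (the function name is ours, and rounding may differ slightly from Pillow's internal implementation):

```python
def luminance(r: int, g: int, b: int) -> int:
    """Approximate Pillow's RGB -> "L" conversion (ITU-R 601-2 weights)."""
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)

print(luminance(255, 255, 255))  # -> 255 (white stays white)
print(luminance(0, 0, 0))        # -> 0 (black stays black)
print(luminance(255, 0, 0))      # -> 76 (pure red maps to dark gray)
```

This is why green-heavy regions of a photo read as brighter in the control image than equally saturated red or blue regions.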
## Limitations

⚠️ **Training efficiency trade-offs:**
- Trained on only 10k samples (vs 100k for a full ControlNet)
- Single-epoch training for rapid deployment
- Best results at higher scales (1.5-2.5) for the 1024×1024 model

For production use, consider:
- Fine-tuning on domain-specific data
- Training for multiple epochs for even better quality
- Combining with other conditioning methods
## Citation

```bibtex
@misc{t2i-adapter-brightness-sdxl-10k,
  author = {Oysiyl},
  title = {SDXL T2I Adapter - Brightness Control (10k samples, 512 \& 1024)},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/Oysiyl/t2i-adapter-brightness-sdxl-10k}}
}
```
## License

Apache 2.0. The base SDXL model is distributed under its own license terms; see stabilityai/stable-diffusion-xl-base-1.0.
## Training Code

This model was trained with the official Diffusers T2I Adapter training script using the following key configurations:
- FP16 mixed-precision training
- xFormers memory-efficient attention
- 8-bit Adam optimizer
- Gradient checkpointing for memory efficiency
- Min-SNR loss weighting (gamma = 5.0)
- FP16-fixed VAE for numerical stability

The 1024×1024 model was trained on an H100 80GB without OOM errors, demonstrating good memory efficiency.

See the training script for implementation details.
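As an illustration only, the configuration listed above roughly corresponds to an invocation like the following. Flag names follow the Diffusers `train_t2i_adapter_sdxl.py` example script, but this command is not copied from this repo's actual run; treat dataset paths and any omitted flags as placeholders:

```bash
accelerate launch train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --resolution=1024 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=8 \
  --learning_rate=1e-5 \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --gradient_checkpointing
```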
## Acknowledgments

- Trained on the latentcat/grayscale_image_aesthetic_3M dataset
- Built with 🤗 Diffusers
- Base model: Stable Diffusion XL