DDPO LoRA checkpoints: RMS contrast reward run
This repository contains LoRA checkpoints from a completed DDPO (Denoising Diffusion Policy Optimization) training run that optimized an RMS contrast reward, defined as the standard deviation of image luminance.
Snapshot Scope
- Checkpoints included: epochs 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 99 (training ran for 100 epochs in total; the run was split across two job submissions on torralba-3090-4).
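To sweep over the saved epochs, for example to compare contrast across training, a small helper can map each listed epoch to its checkpoint folder. This is a minimal sketch that assumes every checkpoint follows the same `checkpoints/checkpoint_<epoch>` layout as `checkpoints/checkpoint_99` in the inference example; `CHECKPOINT_EPOCHS` and `checkpoint_subfolder` are hypothetical names, not part of this repository.

```python
# Epochs with a saved checkpoint, as listed above.
CHECKPOINT_EPOCHS = list(range(0, 100, 10)) + [99]


def checkpoint_subfolder(epoch: int) -> str:
    """Return the (assumed) repo-relative folder for one checkpoint epoch."""
    # Assumption: all checkpoints mirror the layout of checkpoints/checkpoint_99.
    if epoch not in CHECKPOINT_EPOCHS:
        raise ValueError(f"no checkpoint saved for epoch {epoch}")
    return f"checkpoints/checkpoint_{epoch}"
```

The returned path could be substituted for `checkpoints/checkpoint_99` in the inference example to generate samples from earlier epochs.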
Reward Definition (exact)
Reward function name: rms_contrast in ddpo_pytorch/rewards.py.
Implementation details:
- Convert generated images to uint8 NumPy arrays (NHWC, values in [0, 255]).
- Compute grayscale luminance: gray = 0.299*R + 0.587*G + 0.114*B.
- Compute the reward as gray.std(axis=(1,2)) per image (higher = more contrast).
- Return the scores as a float32 array (higher is better).
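The steps above can be sketched in NumPy as follows. This is an illustrative reimplementation, not the code in `ddpo_pytorch/rewards.py`: the name `rms_contrast_sketch` is hypothetical, and it assumes the input is either a float array in [0, 1] with NCHW layout (as a decoded diffusion batch typically is) or a uint8 array already in NHWC layout. It returns only the score array.

```python
import numpy as np


def rms_contrast_sketch(images: np.ndarray) -> np.ndarray:
    """Per-image RMS contrast: standard deviation of luminance over H and W.

    Assumption: `images` is float NCHW in [0, 1] or uint8 NHWC in [0, 255].
    """
    if images.dtype != np.uint8:
        # Float NCHW in [0, 1] -> uint8 NHWC in [0, 255]
        images = (np.clip(images, 0.0, 1.0) * 255).round().astype(np.uint8)
        images = images.transpose(0, 2, 3, 1)
    rgb = images.astype(np.float32)
    # Luminance weights from the reward definition
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Standard deviation over height and width gives one score per image
    return gray.std(axis=(1, 2)).astype(np.float32)
```

Because the three weights sum to 1, a pure black-and-white image with equal numbers of each pixel scores about 127.5, while any flat-colored image scores about 0.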
Inference Example
import torch
from diffusers import StableDiffusionPipeline

# Load the base model in half precision on the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Load the final (epoch 99) LoRA checkpoint; the repo id and the
# folder inside the repo are passed separately
pipe.load_lora_weights(
    "giannisdaras/ddpo-rms-contrast-checkpoints",
    subfolder="checkpoints/checkpoint_99",
    weight_name="pytorch_lora_weights.bin",
)

image = pipe(
    "a photo of a sea turtle underwater",
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
image.save("sample.png")
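To check a generated sample against the training objective, the same luminance-based score can be computed directly on a PIL image. This is a minimal sketch assuming Pillow and NumPy are installed (Pillow is already a diffusers dependency); `rms_contrast_of_pil` is a hypothetical helper that mirrors the reward definition above for a single image rather than the repository's batched function.

```python
import numpy as np
from PIL import Image


def rms_contrast_of_pil(image: Image.Image) -> float:
    """RMS contrast of a single PIL image, using the reward's luminance weights."""
    # Force 3-channel RGB, then view as an (H, W, 3) float array in [0, 255]
    rgb = np.asarray(image.convert("RGB"), dtype=np.float32)
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Standard deviation over all pixels of the luminance channel
    return float(gray.std())
```

For example, calling `rms_contrast_of_pil(image)` on the `image` returned by the pipeline gives a score on the same scale as the training reward, which can be compared across checkpoints or against the base model.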
Base model
CompVis/stable-diffusion-v1-4