DDPO LoRA checkpoints: RMS contrast reward run

This repository contains checkpoints from a completed DDPO training run optimized for RMS contrast (luminance standard deviation).

Snapshot Scope

Checkpoints included: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99
Training length: 100 epochs in total (epochs 0 to 99), with the run split across two job submissions on torralba-3090-4.
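For scripting over the snapshots, the checkpoint list above can be turned into repository subfolders. This is a minimal sketch that assumes every checkpoint is stored under the same `checkpoints/checkpoint_<epoch>` pattern as the `checkpoint_99` path used in the inference example; the helper name `checkpoint_subfolder` is hypothetical and not part of this repository.

```python
# Assumption: all checkpoints follow the "checkpoints/checkpoint_<epoch>" layout
# seen for checkpoint_99; verify against the repository file listing.
REPO_ID = "giannisdaras/ddpo-rms-contrast-checkpoints"
CHECKPOINT_EPOCHS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]


def checkpoint_subfolder(epoch: int) -> str:
    """Return the assumed subfolder for a saved epoch (hypothetical helper)."""
    if epoch not in CHECKPOINT_EPOCHS:
        raise ValueError(f"no checkpoint saved for epoch {epoch}")
    return f"checkpoints/checkpoint_{epoch}"


# One subfolder per saved checkpoint, in training order.
subfolders = [checkpoint_subfolder(e) for e in CHECKPOINT_EPOCHS]
```

Iterating over `subfolders` is one way to load each snapshot in turn and compare samples across training.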

Reward Definition (exact)

Reward function name: rms_contrast in ddpo_pytorch/rewards.py.

Implementation details:

Convert generated images to uint8 NumPy arrays in NHWC layout, with values in [0, 255].
Compute grayscale luminance per pixel: gray = 0.299*R + 0.587*G + 0.114*B.
Compute the reward as gray.std(axis=(1, 2)), the standard deviation over height and width, giving one score per image.
Return the scores as a float32 array (higher means more contrast).
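The steps above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the code in ddpo_pytorch/rewards.py: the function name `rms_contrast_scores` and its signature are assumptions, and the input is assumed to already be a uint8 NHWC batch.

```python
import numpy as np


def rms_contrast_scores(images: np.ndarray) -> np.ndarray:
    """Per-image luminance standard deviation (hypothetical helper).

    images: uint8 array of shape (N, H, W, 3) with values in [0, 255].
    """
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError("expected an NHWC batch with 3 colour channels")
    rgb = images.astype(np.float32)
    # Luminance with the weights quoted above.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Standard deviation over height and width: one score per image.
    return gray.std(axis=(1, 2)).astype(np.float32)


# Toy batch: a flat grey image and a black/white striped image.
flat = np.full((1, 8, 8, 3), 128, dtype=np.uint8)
striped = np.zeros((1, 8, 8, 3), dtype=np.uint8)
striped[:, ::2, :, :] = 255  # every other row is white
scores = rms_contrast_scores(np.concatenate([flat, striped]))
```

The flat image scores approximately 0 and the striped one approximately 127.5, which matches the intent that higher scores mean more contrast.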

Inference Example

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Hub repo IDs have the form "user/repo"; the checkpoint folder goes in `subfolder`.
pipe.load_lora_weights(
    "giannisdaras/ddpo-rms-contrast-checkpoints",
    subfolder="checkpoints/checkpoint_99",
    weight_name="pytorch_lora_weights.bin",
)

image = pipe(
    "a photo of a sea turtle underwater",
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
image.save("sample.png")


Base model: CompVis/stable-diffusion-v1-4 (this repository is a LoRA adapter for it).