---
license: cc-by-nc-sa-4.0
language:
- en
tags:
- diffusion
- diffusers
- stable-diffusion
- instructpix2pix
- image-to-image
- image-editing
library_name: diffusers
pipeline_tag: image-to-image
base_model: runwayml/stable-diffusion-v1-5
datasets:
- nishitanand/image-relighting-diffusion-data
---

# Learning Illumination Control in Diffusion Models — Model weights (HF)

Weights for [*Learning Illumination Control in Diffusion Models*](https://arxiv.org/abs/2604.24877) (ReALM-GEN @ ICLR 2026).

| | |
|--|--|
| **Code** | [github.com/nishitanand/image-relighting-diffusion](https://github.com/nishitanand/image-relighting-diffusion) |
| **Dataset** | [huggingface.co/datasets/nishitanand/image-relighting-diffusion-data](https://huggingface.co/datasets/nishitanand/image-relighting-diffusion-data) |
| **Paper** | [arxiv.org/abs/2604.24877](https://arxiv.org/abs/2604.24877) |
| **Project site** | [nishitanand.github.io/relighting-diffusion-website](https://nishitanand.github.io/relighting-diffusion-website) |

- **Architecture:** `StableDiffusionInstructPix2PixPipeline` (SD 1.5)
- **Inputs:** RGB image (512×512 domain) + instruction string
- **Output:** RGB relit image

---

## Download weights (CLI)

```bash
pip install -U "huggingface_hub[cli]"

huggingface-cli download nishitanand/sd-image-relighting-model \
  --repo-type model \
  --local-dir ./sd-image-relighting-model
```

Alternatively, in Python / Diffusers you can load directly by repo id (no local directory required), as the next section shows.

---

## Inference (Python)

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "nishitanand/sd-image-relighting-model"

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to("cuda")
pipe.safety_checker = None  # optional: disable the SD safety checker

image = Image.open("degraded_input.png").convert("RGB")
instruction = (
    "Soft window light from the left with warm highlights and gentle shadows "
    "across the face."
)

out = pipe(
    instruction,
    image=image,
    num_inference_steps=50,
    image_guidance_scale=1.5,  # fidelity to the input image
    guidance_scale=7.5,        # fidelity to the instruction
).images[0]
out.save("relit.png")
```

**Tips**

- Use prompts in the same **long descriptive** style as training (see dataset / paper).
- For best identity preservation, the input should resemble the **synthetic degraded** domain the model was trained on.

---

## Inference (project script)

After cloning the code repository:

```bash
cd inference
pip install -r requirements.txt

python inference.py \
  --model_path nishitanand/sd-image-relighting-model \
  --input_image /path/to/degraded.png \
  --instruction "Your full-sentence lighting description here." \
  --output_path ./relit.png
```

For the full training dependencies, use `pip install -r training/sd1_5/requirements.txt` instead.

## Repository layout (this HF model repo)

At the **root** of the snapshot you should see a standard Diffusers tree, for example:

- `model_index.json`
- `unet/`, `vae/`, `text_encoder/`, `tokenizer/`
- `scheduler/`, `feature_extractor/`, `safety_checker/`

**Optional:** `checkpoint-*` (e.g. `checkpoint-13000/`) — a raw **Accelerate** training snapshot.

**`from_pretrained` / inference / `evaluate_models.py`** load the **repository root** where `model_index.json` lives — they do **not** automatically use `checkpoint-*`. Point `--model_path` / `--trained_model_path` at **`./hf_model`** (or the Hub id), not at `checkpoint-*`, unless you have re-exported that step as a standalone Diffusers layout, as sketched below.
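One way to do that re-export is to load the pipeline from the repository root, swap in the checkpoint's UNet, and save a fresh Diffusers tree. A minimal sketch, assuming the Accelerate snapshot keeps a Diffusers-format UNet under `checkpoint-13000/unet` (paths and the output directory are illustrative):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, UNet2DConditionModel

root = "./sd-image-relighting-model"  # local snapshot root (contains model_index.json)

# Load the exported pipeline from the repository root.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    root, torch_dtype=torch.float16
)

# Swap in the UNet weights from the training checkpoint
# (assumes a Diffusers-format `unet/` subfolder in the snapshot).
pipe.unet = UNet2DConditionModel.from_pretrained(
    f"{root}/checkpoint-13000", subfolder="unet", torch_dtype=torch.float16
)

# Save a standalone Diffusers layout that --model_path can point at.
pipe.save_pretrained("./hf_model")
```

If the snapshot holds only raw Accelerate state (no `unet/` subfolder), it is intended for resuming training rather than for loading this way.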
---

## Limitations

- Best on **face** imagery similar to training; strongly out-of-domain poses or resolutions may produce poor results.
- Not a physically-based renderer; extreme lighting prompts can cause the output to drift from the input.

---

## Citation (BibTeX)

```bibtex
@article{anand2026learning,
  title={Learning Illumination Control in Diffusion Models},
  author={Anand, Nishit and Suri, Manan and Metzler, Christopher and Manocha, Dinesh and Duraiswami, Ramani},
  journal={arXiv preprint arXiv:2604.24877},
  year={2026},
  note={ReALM-GEN @ ICLR 2026}
}
```

---

## License

This model repository is released under [**CC BY-NC-SA 4.0**](https://creativecommons.org/licenses/by-nc-sa/4.0/) (see [`LICENSE`](LICENSE) at the repo root when mirrored from GitHub). Base **Stable Diffusion** components remain subject to their original licenses.

**FFHQ.** These weights were trained on data derived from **Flickr-Faces-HQ (FFHQ)**. The FFHQ dataset package is distributed by NVIDIA under **CC BY-NC-SA 4.0**; individual source images carry their own Flickr licenses. Comply with [NVlabs/ffhq-dataset](https://github.com/NVlabs/ffhq-dataset) and cite *A Style-Based Generator Architecture for Generative Adversarial Networks* (Karras, Laine, Aila) and FFHQ/NVIDIA as required.

When you publish results using these weights, cite [**arXiv:2604.24877**](https://arxiv.org/abs/2604.24877) and link the **code**, **dataset**, and **this model** repository.