---
license: cc-by-nc-sa-4.0
language:
- en
tags:
- diffusion
- diffusers
- stable-diffusion
- instructpix2pix
- image-to-image
- image-editing
library_name: diffusers
pipeline_tag: image-to-image
base_model: runwayml/stable-diffusion-v1-5
datasets:
- nishitanand/image-relighting-diffusion-data
---
# Learning Illumination Control in Diffusion Models — Model weights (HF)
Weights for [*Learning Illumination Control in Diffusion Models*](https://arxiv.org/abs/2604.24877) (ReALM-GEN @ ICLR 2026).
| | |
|--|--|
| **Code** | [github.com/nishitanand/image-relighting-diffusion](https://github.com/nishitanand/image-relighting-diffusion) |
| **Dataset** | [huggingface.co/datasets/nishitanand/image-relighting-diffusion-data](https://huggingface.co/datasets/nishitanand/image-relighting-diffusion-data) |
| **Paper** | [arxiv.org/abs/2604.24877](https://arxiv.org/abs/2604.24877) |
| **Project site** | [nishitanand.github.io/relighting-diffusion-website](https://nishitanand.github.io/relighting-diffusion-website) |
- **Architecture:** `StableDiffusionInstructPix2PixPipeline` (SD 1.5)
- **Inputs:** RGB image (512×512, the training resolution) + natural-language lighting instruction
- **Output:** relit RGB image
---
## Download weights (CLI)
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download nishitanand/sd-image-relighting-model \
--repo-type model \
--local-dir ./sd-image-relighting-model
```
Or download in Python / load directly by repo id in Diffusers (no local dir required), as sketched below.
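A minimal download sketch using `huggingface_hub.snapshot_download` (the local directory name is just an example):
```python
from huggingface_hub import snapshot_download

# Mirror the full Diffusers tree locally; equivalent to the CLI command above.
snapshot_download(
    repo_id="nishitanand/sd-image-relighting-model",
    local_dir="./sd-image-relighting-model",
)
```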
---
## Inference (Python)
```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "nishitanand/sd-image-relighting-model"

# Load the fine-tuned InstructPix2Pix pipeline in fp16 on GPU.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to("cuda")
pipe.safety_checker = None  # optional: disable the stock SD safety checker

image = Image.open("degraded_input.png").convert("RGB")
instruction = (
    "Soft window light from the left with warm highlights and gentle shadows "
    "across the face."
)

out = pipe(
    instruction,
    image=image,
    num_inference_steps=50,
    image_guidance_scale=1.5,  # fidelity to the input image
    guidance_scale=7.5,        # adherence to the text instruction
).images[0]
out.save("relit.png")
```
**Tips**
- Use prompts in the same **long, descriptive** style as the training instructions (see the dataset / paper).
- For best identity preservation, the input should resemble the **synthetic degraded** domain the model was trained on; a resizing sketch follows this list.
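A minimal preprocessing sketch, assuming inputs should match the 512×512 training resolution noted above (the resampling filter is an arbitrary choice):
```python
from PIL import Image

# Resize arbitrary inputs to the assumed 512x512 training resolution
# before passing them to the pipeline.
image = Image.open("degraded_input.png").convert("RGB")
image = image.resize((512, 512), Image.LANCZOS)
```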
---
## Inference (project script)
From the root of the cloned code repository:
```bash
cd inference
pip install -r requirements.txt
python inference.py \
--model_path nishitanand/sd-image-relighting-model \
--input_image /path/to/degraded.png \
--instruction "Your full-sentence lighting description here." \
--output_path ./relit.png
```
For full training dependencies, you can instead use `pip install -r training/sd1_5/requirements.txt`.
## Repository layout (this HF model repo)
At the **root** of the snapshot you should see a standard Diffusers tree, for example:
- `model_index.json`
- `unet/`, `vae/`, `text_encoder/`, `tokenizer/`
- `scheduler/`, `feature_extractor/`, `safety_checker/`
**Optional:** `checkpoint-*` directories (e.g. `checkpoint-13000/`) are raw **Accelerate** training snapshots. `from_pretrained`, the inference scripts, and `evaluate_models.py` all load the **repository root**, where `model_index.json` lives; they do **not** automatically use `checkpoint-*`. Point `--model_path` / `--trained_model_path` at **`./hf_model`** (or the Hub id), not at a `checkpoint-*` directory, unless you have re-exported that step as a standalone Diffusers layout.
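A quick sanity check (a sketch, assuming a local snapshot at `./sd-image-relighting-model`): the pipeline loads from whichever directory contains `model_index.json`, never from a `checkpoint-*` subfolder.
```python
from diffusers import StableDiffusionInstructPix2PixPipeline

# Loads the repository root (the directory containing model_index.json);
# checkpoint-*/ subfolders hold Accelerate state and are ignored here.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "./sd-image-relighting-model"
)
```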
---
## Limitations
- Works best on **face** imagery similar to the training data; strongly out-of-domain poses or resolutions may fail.
- Not a physically based renderer; extreme prompts can cause the output to drift from the input.
---
## Citation (BibTeX)
```bibtex
@article{anand2026learning,
title={Learning Illumination Control in Diffusion Models},
author={Anand, Nishit and Suri, Manan and Metzler, Christopher and Manocha, Dinesh and Duraiswami, Ramani},
journal={arXiv preprint arXiv:2604.24877},
year={2026},
note={ReALM-GEN @ ICLR 2026}
}
```
---
## License
This model repository is released under [**CC BY-NC-SA 4.0**](https://creativecommons.org/licenses/by-nc-sa/4.0/) (see [`LICENSE`](LICENSE) at the repo root when mirrored from GitHub). Base **Stable Diffusion** components remain subject to their original licenses.
**FFHQ.** Weights were trained on data derived from **Flickr-Faces-HQ (FFHQ)**. The FFHQ dataset package is distributed by NVIDIA under **CC BY-NC-SA 4.0**; individual source images carry their own Flickr licenses. Comply with [NVlabs/ffhq-dataset](https://github.com/NVlabs/ffhq-dataset) and cite *A Style-Based Generator Architecture for Generative Adversarial Networks* (Karras, Laine, Aila) and FFHQ/NVIDIA as required.
When you publish results using these weights, cite [**arXiv:2604.24877**](https://arxiv.org/abs/2604.24877) and link the **code**, **dataset**, and **this model** repository.