---
license: apache-2.0
language:
- en
base_model:
- CompVis/stable-diffusion-v1-4
---

# FG-DM

[**Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis**](https://github.com/DeepakSridhar/fgdm)
[Deepak Sridhar](https://deepaksridhar.github.io/), [Abhishek Peri](https://github.com/abhishek-peri), [Rohith Rachala](https://github.com/rohithreddy0087), [Nuno Vasconcelos](http://www.svcl.ucsd.edu/~nuno/)
_[NeurIPS '24](https://deepaksridhar.github.io/factorgraphdiffusion.github.io/static/images/FG_DM_NeurIPS_2024_final.pdf) | [GitHub](https://github.com/DeepakSridhar/fgdm) | [arXiv](https://arxiv.org/abs/2410.21638) | [Project page](https://deepaksridhar.github.io/factorgraphdiffusion.github.io)_

![fg-dm](arch.jpg)

## Cloning

Use `--recursive` to also clone the segmentation editor app:

```
git clone --recursive https://github.com/DeepakSridhar/fgdm.git
```

## Requirements

A suitable [conda](https://conda.io/) environment named `ldm` can be created and activated with:

```
conda env create -f fgdm.yaml
conda activate ldm
```

### Dataset

We used the COCO 2017 dataset for training FG-DMs.

1. Download the COCO 2017 dataset from the official [COCO Dataset Website](https://cocodataset.org/#download). You will need the following components:
   - Annotations: caption and instance annotations.
   - Images: `train2017`, `val2017`, and `test2017`.
2. Extract all downloaded files into the `/data/coco` directory, or to your desired location. Place the annotation files in the `annotations/` folder and the image folders in the `images/` folder.
3. Verify that your directory structure matches the layout below (an optional sanity-check sketch is included at the end of this card):

```
coco/
|---- annotations/
|------- captions_train2017.json
|------- captions_val2017.json
|------- instances_train2017.json
|------- instances_val2017.json
|------- train2017/
|------- val2017/
|---- images/
|------- train2017/
|------- val2017/
```

## FG-DM Pretrained Weights

The segmentation FG-DM weights are available on [Google Drive](https://drive.google.com/drive/folders/1eIJxYE3eX5zReosGN1SQdnEDLatZuEp1?usp=sharing). Place them under the `models` directory (a quick checkpoint-inspection sketch also appears at the end of this card).

## Inference: Text-to-Image with FG-DM

```
bash run_inference.sh
```

## Training: FG-DM Seg from scratch

- We used SD v1.4 weights for training the FG-DM conditions, but SD v1.5 is also compatible.
- The original SD weights are available via [the CompVis organization at Hugging Face](https://huggingface.co/CompVis). The license terms are identical to the original weights.
  - `sd-v1-4.ckpt`: resumed from `sd-v1-2.ckpt`; 225k steps at resolution `512x512` on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- Download the condition weights from [ControlNet](https://huggingface.co/lllyasviel/ControlNet/tree/main/annotator/ckpts) and place them in the `models` folder to train depth and normal FG-DMs.
- Alternatively, download all of these models by running the [download_models.sh](scripts/download_models.sh) file under the `scripts` directory.

```
python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,
```

## Acknowledgements

Our codebase for the diffusion models builds heavily on the [LDM codebase](https://github.com/CompVis/latent-diffusion) and [ControlNet](https://github.com/lllyasviel/ControlNet). Thanks for open-sourcing!

## BibTeX

```
@inproceedings{neuripssridhar24,
  author    = {Sridhar, Deepak and Peri, Abhishek and Rachala, Rohith and Vasconcelos, Nuno},
  title     = {Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
}
```
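
## Optional: Sanity Checks

The snippet below is a minimal sketch (not part of the official codebase) for verifying the COCO 2017 layout described in the Dataset section. `COCO_ROOT` assumes the default `/data/coco` location; adjust it if you extracted the dataset elsewhere.

```
import os

# Root of the extracted COCO 2017 dataset; change if you used another location.
COCO_ROOT = "/data/coco"

# Paths expected by the directory layout shown in the Dataset section.
EXPECTED = [
    "annotations/captions_train2017.json",
    "annotations/captions_val2017.json",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "images/train2017",
    "images/val2017",
]

missing = [p for p in EXPECTED if not os.path.exists(os.path.join(COCO_ROOT, p))]
if missing:
    print("Missing entries:")
    for p in missing:
        print("  -", p)
else:
    print("COCO 2017 layout looks complete.")
```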
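Similarly, this sketch checks that a downloaded FG-DM checkpoint loads and lists a few of its tensors. The filename `models/fgdm_seg.ckpt` is hypothetical; substitute the actual file you downloaded from Google Drive and placed under `models/`.

```
import torch

# Hypothetical filename -- substitute the checkpoint you actually downloaded.
CKPT_PATH = "models/fgdm_seg.ckpt"

# Load on CPU so this check runs without a GPU.
ckpt = torch.load(CKPT_PATH, map_location="cpu")

# LDM-style checkpoints typically keep the weights under a "state_dict" key;
# fall back to the raw object if that key is absent.
state_dict = ckpt.get("state_dict", ckpt)
print(f"Loaded {len(state_dict)} tensors. First few:")
for name in list(state_dict)[:5]:
    print(f"  {name}: {tuple(state_dict[name].shape)}")
```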