	---
	license: cc-by-nc-sa-4.0
	---
	# ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

	<div align="center">

	[![arXiv Paper](https://img.shields.io/badge/arXiv-2401.01456%20(base)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2401.01456)
	[![WACV 2025](https://img.shields.io/badge/WACV%202025-v1-0CA4A5?style=flat&logo=Semantic%20Web)](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html)
	[![arXiv v1.5 Paper](https://img.shields.io/badge/arXiv-2502.19937%20(v1.5)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2502.19937)
	[![arXiv v2 Paper](https://img.shields.io/badge/arXiv-2504.06895%20(v2)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2504.06895)
	[![Model Weights](https://img.shields.io/badge/Hugging%20Face-Model%20Weights-FF9D00?style=flat&logo=Hugging%20Face)](https://huggingface.co/tellurion/ColorizeDiffusion/tree/main)
	[![License](https://img.shields.io/badge/License-CC--BY--NC--SA%204.0-4CAF50?style=flat&logo=Creative%20Commons)](https://github.com/tellurion-kanata/colorizeDiffusion/blob/master/LICENSE)

	</div>

	![img](assets/teaser.png)

	(April 2025)
	Official implementation of ColorizeDiffusion.

	ColorizeDiffusion is an SD-based colorization framework that produces high-quality colorization results for arbitrary pairs of sketch and reference inputs.

	Foundational paper for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).

	- *Version 1* - Base training, 512px. Released; checkpoint names start with `mult`.
	- *Version 1.5* - Solving spatial entanglement, 512px. Released; checkpoint names start with `switch`.
	- *Version 2* - Enhancing background and style transfer, 768px. Released; checkpoint names start with `v2`.
	- *Version XL* - Enhancing embedding guidance for character colorization, geometry disentanglement, 1024px. Available soon.
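The prefix convention in the version list above can be used to tell which release a downloaded checkpoint belongs to. The following is a minimal sketch, not part of the repository: the function name `guess_version` and the example filenames are hypothetical, and only the three prefixes (`mult`, `switch`, `v2`) come from the list above.

```python
from pathlib import Path
from typing import Optional

# Prefixes come from the version list in this README.
# Everything else here (names, labels, example paths) is illustrative.
PREFIX_TO_VERSION = {
    "mult": "Version 1 (512px)",
    "switch": "Version 1.5 (512px)",
    "v2": "Version 2 (768px)",
}


def guess_version(ckpt_path: str) -> Optional[str]:
    """Return the version label whose prefix matches the checkpoint filename."""
    name = Path(ckpt_path).name
    for prefix, version in PREFIX_TO_VERSION.items():
        if name.startswith(prefix):
            return version
    return None  # unknown prefix, e.g. a future XL checkpoint


# Hypothetical filenames, used only to show the lookup.
print(guess_version("checkpoints/switch_example.safetensors"))
```

An unrecognized prefix returns `None` rather than raising, so the caller decides how to handle checkpoints released after this list was written.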


	## Getting Started

	-------------------------------------------------------------------------------------------
	```shell
	conda env create -f environment.yaml
	conda activate hf
	```

	## User Interface

	-------------------------------------------------------------------------------------------
	A full-featured UI is included. To launch it, run:
	```shell
	python -u app.py
	```
	The default server address is http://localhost:7860.

	#### Important inference options
	| Option                | Description                                                                                               |
	|:----------------------|:----------------------------------------------------------------------------------------------------------|
	| BG enhance            | Low-level feature injection for v2 models.                                                                |
	| FG enhance            | Has no effect on the currently open-sourced models.                                                       |
	| Reference strength    | Decrease it to increase semantic fidelity to the sketch input.                                            |
	| Foreground strength   | Similar to reference strength, but applied only to the foreground region. Requires FG or BG enhance.      |
	| Preprocessor          | Sketch preprocessing. Extract is recommended if the sketch input is a complicated pencil drawing.         |
	| Line extractor        | Line extractor used when the preprocessor is Extract.                                                     |
	| Sketch guidance scale | Classifier-free guidance scale for the sketch image; 1 is recommended.                                    |
	| Attention injection   | Noised low-level feature injection; doubles inference time.                                               |


	### 768-level Cross-content colorization results (from v2)
	![img](assets/cross-1.png)
	![img](assets/cross-2.png)
	### 1536-level Character colorization results (from XL)
	![img](assets/disentanglement2.png)
	![img](assets/demon.png)

	## Manipulation

	-------------------------------------------------------------------------------------------
	The colorization results can be manipulated using text prompts, see [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).

	Text-based manipulation is currently disabled by default. To enable it, run:
	```shell
	python -u app.py -manipulate
	```

	For local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.


	The manipulation result and correlation visualization below use the following settings:

	- Target prompt: the girl's blonde hair
	- Anchor prompt: the girl's brown hair
	- Control prompt: the girl's brown hair
	- Target scale: 8
	- Enhanced: false
	- Thresholds: 0.5, 0.55, 0.65, 0.95

	![img](assets/preview1.png)
	![img](assets/preview2.png)
	As the results show, the manipulation unavoidably changes some unrelated regions, because it is applied to the reference embeddings.

	#### Manipulation options
	| Option                | Description |
	|:----------------------|:------------|
	| Group index           | The index of the parameter group within the selected manipulation sequence. |
	| Target prompt         | The prompt that specifies the desired visual attribute of the image after manipulation. |
	| Anchor prompt         | The prompt that specifies the anchored visual attribute of the image before manipulation. |
	| Control prompt        | Used for local manipulation (crossattn-based models). The prompt that specifies the target regions. |
	| Enhance               | Whether this manipulation should be enhanced. Enhanced manipulations are more likely to influence unrelated attributes. |
	| Target scale          | The scale used to progressively control the manipulation. |
	| Thresholds            | Used for local manipulation (crossattn-based models). Four hyperparameters used to reduce the influence on irrelevant visual attributes, where 0.0 < threshold 0 < threshold 1 < threshold 2 < threshold 3 < 1.0. |
	| <Threshold0           | Select regions most related to the control prompt. Indicated by deep blue. |
	| Threshold0-Threshold1 | Select regions related to the control prompt. Indicated by blue. |
	| Threshold1-Threshold2 | Select neighbouring but unrelated regions. Indicated by green. |
	| Threshold2-Threshold3 | Select unrelated regions. Indicated by orange. |
	| >Threshold3           | Select most unrelated regions. Indicated by brown. |
	| Add                   | Click Add to save the current manipulation in the sequence. |
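The threshold rows above describe five bands separated by four ordered thresholds. The sketch below illustrates only that binning and the ordering constraint; it is not the repository's implementation. The names `validate_thresholds` and `region_color` are hypothetical, what the thresholded value actually measures is not specified here, and treating a value equal to a threshold as belonging to the upper band is an assumption.

```python
from bisect import bisect_right
from typing import Sequence

# Colors in band order, as listed in the table above.
REGION_COLORS = ["deep blue", "blue", "green", "orange", "brown"]


def validate_thresholds(thresholds: Sequence[float]) -> None:
    """Check 0.0 < t0 < t1 < t2 < t3 < 1.0, as required by the table."""
    if len(thresholds) != 4:
        raise ValueError("exactly four thresholds are expected")
    bounds = [0.0, *thresholds, 1.0]
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("thresholds must satisfy 0.0 < t0 < t1 < t2 < t3 < 1.0")


def region_color(value: float, thresholds: Sequence[float]) -> str:
    """Map a value to the color of the band it falls into.

    Assumption: a value exactly equal to a threshold goes to the upper band.
    """
    validate_thresholds(thresholds)
    return REGION_COLORS[bisect_right(list(thresholds), value)]


# Thresholds from the example settings in this README.
example = (0.5, 0.55, 0.65, 0.95)
for v in (0.3, 0.52, 0.6, 0.8, 0.97):
    print(v, region_color(v, example))
```

Validating the ordering up front makes a mistyped threshold list fail loudly instead of silently producing empty bands.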


	## Training
	Our implementation is based on Accelerate and DeepSpeed.
	Before training, collect your data and organize the training dataset as follows:

	```
	[dataset_path]
	├── image_list.json    # Optional, for image indexing
	├── color/             # Color images
	│   ├── 0001.zip
	│   │   ├── 10001.png
	│   │   ├── 100001.jpg
	│   │   └── ...
	│   ├── 0002.zip
	│   └── ...
	├── sketch/            # Sketch images
	│   ├── 0001.zip
	│   │   ├── 10001.png
	│   │   ├── 100001.jpg
	│   │   └── ...
	│   ├── 0002.zip
	│   └── ...
	└── mask/              # Mask images (required for fg-bg training)
	    ├── 0001.zip
	    │   ├── 10001.png
	    │   ├── 100001.jpg
	    │   └── ...
	    ├── 0002.zip
	    └── ...
	```
	For details of dataset organization, check `data/dataloader.py`.
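Before launching a long run, it can help to sanity-check that the archives line up. The sketch below is an illustration under stated assumptions, not the repository's loader: it assumes that paired images share the same archive name and the same file stem across `color/`, `sketch/`, and `mask/`, which should be confirmed against `data/dataloader.py`. The function names are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List, Set


def archive_stems(zip_path: Path) -> Set[str]:
    """Return the file stems (names without extension) stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return {Path(n).stem for n in zf.namelist() if not n.endswith("/")}


def check_pairs(dataset_path: str, other: str = "sketch") -> List[str]:
    """List color archives whose counterpart in `other` is missing or incomplete.

    Assumption: pairing is by identical archive name and identical file stem.
    """
    root = Path(dataset_path)
    problems = []
    for color_zip in sorted((root / "color").glob("*.zip")):
        other_zip = root / other / color_zip.name
        if not other_zip.exists():
            problems.append(f"missing {other}/{color_zip.name}")
            continue
        missing = archive_stems(color_zip) - archive_stems(other_zip)
        if missing:
            problems.append(f"{other}/{color_zip.name} lacks {len(missing)} image(s)")
    return problems
```

Calling `check_pairs(dataset_path)` checks the sketches, and `check_pairs(dataset_path, other="mask")` checks the masks needed for fg-bg training; an empty list means no mismatch was found under the assumptions above.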
	Training command example:
	```shell
	accelerate launch --config_file [accelerate_config_file] \
	    train.py \
	    --name base \
	    --dataroot [dataset_path] \
	    --batch_size 64 \
	    --num_threads 8 \
	    -cfg configs/train/sd2.1/mult.yaml \
	    -pt [pretrained_model_path]
	```
	Refer to `options.py` for training/inference/validation arguments.
	Note that `--batch_size` is the micro-batch size per GPU. If you run the command above on 8 GPUs, the total batch size is 64 × 8 = 512.
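The batch-size arithmetic can be written out as a small helper. This is only an illustration: the function name is hypothetical, and the `grad_accum_steps` parameter is an assumption added for completeness (this README does not mention gradient accumulation), so it defaults to 1.

```python
def total_batch_size(micro_batch_size: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step.

    micro_batch_size: the per-GPU value passed as --batch_size.
    grad_accum_steps: assumption, not taken from this README; 1 means no accumulation.
    """
    return micro_batch_size * num_gpus * grad_accum_steps


# The example from this README: --batch_size 64 on 8 GPUs.
print(total_batch_size(64, 8))  # 512
```

To keep the total batch size fixed on fewer GPUs, scale `--batch_size` up accordingly, memory permitting.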


	## Code reference
	1. [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)
	2. [Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
	3. [SD-webui-ControlNet](https://github.com/Mikubill/sd-webui-controlnet)
	4. [Stable-Diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
	5. [K-diffusion](https://github.com/crowsonkb/k-diffusion)
	6. [DeepSpeed](https://github.com/microsoft/DeepSpeed)
	7. [sketchKeras-PyTorch](https://github.com/higumax/sketchKeras-pytorch)

	## Citation
	```
	@article{2024arXiv240101456Y,
	author = {{Yan}, Dingkun and {Yuan}, Liang and {Wu}, Erwin and {Nishioka}, Yuma and {Fujishiro}, Issei and {Saito}, Suguru},
	title = "{ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text}",
	journal = {arXiv e-prints},
	year = {2024},
	doi = {10.48550/arXiv.2401.01456},
	}

	@InProceedings{Yan_2025_WACV,
	author = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
	title = {ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model},
	booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
	year = {2025},
	pages = {5092-5102}
	}

	@article{2025arXiv250219937Y,
	author = {{Yan}, Dingkun and {Wang}, Xinrui and {Li}, Zhuoru and {Saito}, Suguru and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Guo}, Jiaxian},
	title = "{Image Referenced Sketch Colorization Based on Animation Creation Workflow}",
	journal = {arXiv e-prints},
	year = {2025},
	doi = {10.48550/arXiv.2502.19937},
	}

	@article{yan2025colorizediffusionv2enhancingreferencebased,
	title={ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities},
	author={Dingkun Yan and Xinrui Wang and Yusuke Iwasawa and Yutaka Matsuo and Suguru Saito and Jiaxian Guo},
	year={2025},
	journal = {arXiv e-prints},
	doi = {10.48550/arXiv.2504.06895},
	}
	```