<!--Copyright 2024 Custom Diffusion authors and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
|
|
# Custom Diffusion training example
|
|
[Custom Diffusion](https://arxiv.org/abs/2212.04488) is a method of customizing text-to-image models like Stable Diffusion given just a few (4~5) images of a subject.
The `train_custom_diffusion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.
|
|
This training example was contributed by [Nupur Kumari](https://nupurkmr9.github.io/) (one of the authors of Custom Diffusion).
|
|
## Running locally with PyTorch
|
|
### Installing the dependencies
|
|
Before running the scripts, make sure to install the library's training dependencies:
|
|
**Important**
|
|
To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date, since we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
|
|
|
|
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
|
|
Then cd into the [example folder](https://github.com/huggingface/diffusers/tree/main/examples/custom_diffusion):
|
|
```bash
cd examples/custom_diffusion
```
|
|
Now run:
|
|
```bash
pip install -r requirements.txt
pip install clip-retrieval
```
|
|
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment:
|
|
```bash
accelerate config
```
|
|
Or for a default accelerate configuration without answering questions about your environment:
|
|
```bash
accelerate config default
```
|
|
Or if your environment doesn't support an interactive shell (e.g., a Jupyter notebook):
|
|
```python
from accelerate.utils import write_basic_config

write_basic_config()
```

### Cat example 😺
|
|
Now let's get our dataset. Download the dataset from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip) and unzip it. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
|
|
We also use `clip-retrieval` to collect 200 real images and combine them with the target images in the training dataset as a regularization. This prevents overfitting to the given target images. The following flags enable `prior_preservation` and `real_prior` regularization with `prior_loss_weight=1.`.
The `class_prompt` should be the category name, the same as for the target images. The collected real images come with text captions similar to the `class_prompt`. The retrieved images are saved in `class_data_dir`. You can disable `real_prior` to use generated images as the regularization instead. To collect the real images, run this command first before training.

```bash
pip install clip-retrieval
python retrieve.py --class_prompt cat --class_data_dir real_reg/samples_cat --num_class_images 200
```
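With prior preservation enabled, the training objective is the usual denoising loss on the target images plus `prior_loss_weight` times the same loss on the class (regularization) images. A framework-free sketch of how the two terms combine (the function names and list-based `mse` are illustrative, not part of the training script):

```python
def mse(pred, target):
    """Mean squared error over two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def custom_diffusion_loss(inst_pred, inst_target, prior_pred, prior_target, prior_loss_weight=1.0):
    """Denoising loss on target images plus weighted loss on class images."""
    instance_loss = mse(inst_pred, inst_target)  # loss on the <new1> cat images
    prior_loss = mse(prior_pred, prior_target)   # loss on the retrieved "cat" images
    return instance_loss + prior_loss_weight * prior_loss

# Perfect prediction on the instance images, unit error on the prior images:
print(custom_diffusion_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0]))  # 1.0
```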

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**

The script creates and saves model checkpoints and a `pytorch_custom_diffusion_weights.bin` file in your repository.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"
export INSTANCE_DIR="./data/cat"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_cat/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="cat" --num_class_images=200 \
  --instance_prompt="photo of a <new1> cat" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=250 \
  --scale_lr --hflip \
  --modifier_token "<new1>" \
  --push_to_hub
```

**Use `--enable_xformers_memory_efficient_attention` for faster training with lower VRAM requirements (16GB per GPU). Follow [this guide](https://github.com/facebookresearch/xformers) for installation instructions.**

To track your experiments and save intermediate results using Weights and Biases (`wandb`), which we highly recommend, follow these steps:

* Install `wandb`: `pip install wandb`.
* Log in: `wandb login`.
* Then, while launching training, specify a `validation_prompt` and set `report_to` to `wandb`. You can also configure the following related arguments:
    * `num_validation_images`
    * `validation_steps`
|
|
```bash
accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_cat/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="cat" --num_class_images=200 \
  --instance_prompt="photo of a <new1> cat" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=250 \
  --scale_lr --hflip \
  --modifier_token "<new1>" \
  --validation_prompt="<new1> cat sitting in a bucket" \
  --report_to="wandb" \
  --push_to_hub
```
|
|
Here is an example [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/26ghrcau) where you can check out the intermediate results along with several other training details.
|
|
If you specify `--push_to_hub`, the learned parameters will be pushed to a repository on the Hugging Face Hub. Here is an [example repository](https://huggingface.co/sayakpaul/custom-diffusion-cat).
|
|
### Training on multiple concepts 🐱🪵
|
|
Provide a [json](https://github.com/adobe-research/custom-diffusion/blob/main/assets/concept_list.json) file with the info about each concept, similar to [this](https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_dreambooth.py).
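As an illustration, such a file can be generated with a few lines of Python. The field names below follow the linked `concept_list.json`; the prompts and directory paths are placeholders you would replace with your own:

```python
import json

# One entry per concept; field names follow the linked concept_list.json.
concepts_list = [
    {
        "instance_prompt": "photo of a <new1> cat",
        "class_prompt": "cat",
        "instance_data_dir": "./data/cat",
        "class_data_dir": "./real_reg/samples_cat",
    },
    {
        "instance_prompt": "photo of a <new2> wooden pot",
        "class_prompt": "wooden pot",
        "instance_data_dir": "./data/wooden_pot",
        "class_data_dir": "./real_reg/samples_wooden_pot",
    },
]

with open("concept_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
```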
|
|
To collect the real images, run this command for each concept in the json file.
|
|
```bash
pip install clip-retrieval
python retrieve.py --class_prompt {} --class_data_dir {} --num_class_images 200
```
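Filling in the `{}` placeholders by hand is error-prone when there are several concepts; a small helper can print one `retrieve.py` command per entry. This sketch assumes the `class_prompt` and `class_data_dir` field names from the linked `concept_list.json`:

```python
import json

def retrieve_commands(concept_list_path, num_class_images=200):
    """Return one retrieve.py invocation per concept in the json file."""
    with open(concept_list_path) as f:
        concepts = json.load(f)
    return [
        'python retrieve.py --class_prompt "{}" --class_data_dir {} --num_class_images {}'.format(
            c["class_prompt"], c["class_data_dir"], num_class_images
        )
        for c in concepts
    ]

# Example with a minimal one-concept file written to disk first:
with open("concept_list.json", "w") as f:
    json.dump([{"class_prompt": "cat", "class_data_dir": "real_reg/samples_cat"}], f)

for cmd in retrieve_commands("concept_list.json"):
    print(cmd)
# prints: python retrieve.py --class_prompt "cat" --class_data_dir real_reg/samples_cat --num_class_images 200
```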
|
|
And then we're ready to start training!
|
|
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --output_dir=$OUTPUT_DIR \
  --concepts_list=./concept_list.json \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --num_class_images=200 \
  --scale_lr --hflip \
  --modifier_token "<new1>+<new2>" \
  --push_to_hub
```
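Note the `--modifier_token "<new1>+<new2>"` form in the command above: the script treats `+` as a separator, so each concept gets its own new token. The parsing is just a string split:

```python
# "+" separates one new modifier token per concept, as in the command above.
modifier_token = "<new1>+<new2>"
tokens = modifier_token.split("+")
print(tokens)  # ['<new1>', '<new2>']
```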
|
|
Here is an example [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/3990tzkg) where you can check out the intermediate results along with other training details.
|
|
### Training on human faces
|
|
For fine-tuning on human faces, we found the following configuration to work better: `learning_rate=5e-6`, `max_train_steps=1000 to 2000`, and `freeze_model=crossattn`, with at least 15-20 images.
|
|
To collect the real images, run this command first before training.
|
|
```bash
pip install clip-retrieval
python retrieve.py --class_prompt person --class_data_dir real_reg/samples_person --num_class_images 200
```
|
|
Now launch the training!
|
|
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"
export INSTANCE_DIR="path-to-images"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_person/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="person" --num_class_images=200 \
  --instance_prompt="photo of a <new1> person" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=5e-6 \
  --lr_warmup_steps=0 \
  --max_train_steps=1000 \
  --scale_lr --hflip --noaug \
  --freeze_model crossattn \
  --modifier_token "<new1>" \
  --enable_xformers_memory_efficient_attention \
  --push_to_hub
```
|
|
## Inference
|
|
Now that you have trained the model with the above command, you can run inference with the prompt below. Make sure to include the modifier token (e.g., \<new1\> in the above example) in your prompt.
|
|
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("cat.png")
```
|
|
You can directly load these parameters from a Hub repository:
|
|
```python
import torch
from huggingface_hub.repocard import RepoCard
from diffusers import DiffusionPipeline

model_id = "sayakpaul/custom-diffusion-cat"
card = RepoCard.load(model_id)
base_model_id = card.data.to_dict()["base_model"]

pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(model_id, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(model_id, weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("cat.png")
```
|
|
Here is an example of performing inference with multiple concepts:
|
|
```python
import torch
from huggingface_hub.repocard import RepoCard
from diffusers import DiffusionPipeline

model_id = "sayakpaul/custom-diffusion-cat-wooden-pot"
card = RepoCard.load(model_id)
base_model_id = card.data.to_dict()["base_model"]

pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(model_id, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(model_id, weight_name="<new1>.bin")
pipe.load_textual_inversion(model_id, weight_name="<new2>.bin")

image = pipe(
    "the <new1> cat sculpture in the style of a <new2> wooden pot",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("multi-subject.png")
```
|
|
Here, "cat" and "wooden pot" refer to the multiple concepts.
|
|
### Inference from a training checkpoint
|
|
You can also perform inference from one of the complete checkpoints saved during the training process, if you used the `--checkpointing_steps` argument.
|
|
## Set grads to None
|
|
To save even more memory, pass the `--set_grads_to_none` argument to the script. This will set grads to None instead of zero. However, be aware that it changes certain behaviors, so if you start experiencing any problems, remove this argument.
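The difference can be illustrated without PyTorch: zeroing keeps every gradient buffer allocated and filled with zeros, while setting it to `None` releases the buffer until the next backward pass recreates it. The `Param` class below is a stand-in for a torch parameter, not a real `torch` object:

```python
class Param:
    """Stand-in for a torch parameter with a .grad buffer."""
    def __init__(self, grad):
        self.grad = grad

params = [Param([0.5, -0.2]), Param([0.1, 0.3])]

# zero_grad(set_to_none=False): buffers stay allocated, filled with zeros.
for p in params:
    p.grad = [0.0] * len(p.grad)
assert all(p.grad == [0.0, 0.0] for p in params)

# zero_grad(set_to_none=True): buffers are released entirely, saving memory;
# this is the behavior change to watch for, since code reading .grad now sees None.
for p in params:
    p.grad = None
assert all(p.grad is None for p in params)
```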
|
|
More info: https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html

## Experimental results

You can refer to [our webpage](https://www.cs.cmu.edu/~custom-diffusion/) for more details on our experiments.