Upload 1392 files

43b7e92 verified over 1 year ago

6 kB

	## [Deprecated] Multi Token Textual Inversion

	IMPORTART: This research project is deprecated. Multi Token Textual Inversion is now supported natively in [the official textual inversion example](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#running-locally-with-pytorch).

	The author of this project is [Isamu Isozaki](https://github.com/isamu-isozaki) - please make sure to tag the author for issue and PRs as well as @patrickvonplaten.

	We add multi token support to textual inversion. I added
	1. num_vec_per_token for the number of used to reference that token
	2. progressive_tokens for progressively training the token from 1 token to 2 token etc
	3. progressive_tokens_max_steps for the max number of steps until we start full training
	4. vector_shuffle to shuffle vectors

	Feel free to add these options to your training! In practice num_vec_per_token around 10+vector shuffle works great!

	## Textual Inversion fine-tuning example

	[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
	The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.

	## Running on Colab

	Colab for training
	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)

	Colab for inference
	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)

	## Running locally with PyTorch
	### Installing the dependencies

	Before running the scripts, make sure to install the library's training dependencies:

	Important

	To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
	```bash
	git clone https://github.com/huggingface/diffusers
	cd diffusers
	pip install .
	```

	Then cd in the example folder and run
	```bash
	pip install -r requirements.txt
	```

	And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

	```bash
	accelerate config
	```


	### Cat toy example

	You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree.

	You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

	Run the following command to authenticate your token

	```bash
	huggingface-cli login
	```

	If you have already cloned the repo, then you won't need to go through these steps.

	<br>

	Now let's get our dataset.Download 3-4 images from [here](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and save them in a directory. This will be our training data.

	And launch the training using

	___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___

	```bash
	export MODEL_NAME="runwayml/stable-diffusion-v1-5"
	export DATA_DIR="path-to-dir-containing-images"

	accelerate launch textual_inversion.py \
	--pretrained_model_name_or_path=$MODEL_NAME \
	--train_data_dir=$DATA_DIR \
	--learnable_property="object" \
	--placeholder_token="<cat-toy>" --initializer_token="toy" \
	--resolution=512 \
	--train_batch_size=1 \
	--gradient_accumulation_steps=4 \
	--max_train_steps=3000 \
	--learning_rate=5.0e-04 --scale_lr \
	--lr_scheduler="constant" \
	--lr_warmup_steps=0 \
	--output_dir="textual_inversion_cat"
	```

	A full training run takes ~1 hour on one V100 GPU.

	### Inference

	Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

	```python
	from diffusers import StableDiffusionPipeline

	model_id = "path-to-your-trained-model"
	pipe = StableDiffusionPipeline.from_pretrained(model_id,torch_dtype=torch.float16).to("cuda")

	prompt = "A <cat-toy> backpack"

	image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

	image.save("cat-backpack.png")
	```


	## Training with Flax/JAX

	For faster training on TPUs and GPUs you can leverage the flax training example. Follow the instructions above to get the model and dataset before running the script.

	Before running the scripts, make sure to install the library's training dependencies:

	```bash
	pip install -U -r requirements_flax.txt
	```

	```bash
	export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
	export DATA_DIR="path-to-dir-containing-images"

	python textual_inversion_flax.py \
	--pretrained_model_name_or_path=$MODEL_NAME \
	--train_data_dir=$DATA_DIR \
	--learnable_property="object" \
	--placeholder_token="<cat-toy>" --initializer_token="toy" \
	--resolution=512 \
	--train_batch_size=1 \
	--max_train_steps=3000 \
	--learning_rate=5.0e-04 --scale_lr \
	--output_dir="textual_inversion_cat"
	```
	It should be at least 70% faster than the PyTorch script with the same configuration.

	### Training with xformers:
	You can enable memory efficient attention by [installing xFormers](https://github.com/facebookresearch/xformers#installing-xformers) and padding the `--enable_xformers_memory_efficient_attention` argument to the script. This is not available with the Flax/JAX implementation.