Full Finetuning Questions
#118
by motimalu - opened
Hello, I've been enjoying finetuning your Anima preview models for a while now and have accumulated a bit of a bucket list of questions that I hope you'll humor me with...
- What is a sane scaling rule to use for the LR when increasing global batch size for this architecture?
  - i.e. taking the suggested rank 32 LoRA starting at 2e-5, how should the LR scale at a global batch size of 16, 32, 64, 128, etc.?
- Likewise for the LLM Adaptor, how should its LR be scaled with the global batch size?
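To make the question concrete, here is what I mean by the two usual heuristics, linear scaling (LR ∝ batch size) and square-root scaling (LR ∝ √batch size). The base batch size of 8 below is just a placeholder, since I don't know what batch size the suggested 2e-5 was tuned for:

```python
import math

def scale_lr(base_lr, base_bs, target_bs, rule="sqrt"):
    """Scale a learning rate for a larger global batch size.

    rule="linear": lr grows proportionally to batch size (common for SGD)
    rule="sqrt":   lr grows with sqrt(batch size) (often safer for Adam-style optimizers)
    """
    ratio = target_bs / base_bs
    if rule == "linear":
        return base_lr * ratio
    return base_lr * math.sqrt(ratio)

# Assumed base: 2e-5 at global batch size 8 (placeholder, not from the model card).
for bs in (16, 32, 64, 128):
    print(bs, scale_lr(2e-5, 8, bs, "sqrt"), scale_lr(2e-5, 8, bs, "linear"))
```

Which of the two (if either) is appropriate for this architecture is exactly what I'm asking.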
- How do you achieve training on a combination of Danbooru-style tags, natural language captions, and mixtures of the two?
  - Assuming with diffusion-pipe this is somehow done using the `captions.json` data format, is it something like:
```json
{
  "image_01.png": [
    "[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]",
    "[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags] [captions]",
    "[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [captions]",
    "[captions] [quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist]",
    "[captions] [quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]"
  ]
}
```
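In case it helps clarify what I'm after, this is how I'd imagine generating such per-image variant lists as a preprocessing step (the helper and the example tags are mine, not anything from your pipeline):

```python
import json

def caption_variants(tags: str, caption: str) -> list[str]:
    """Build tags-only, tags+caption, and caption-first variants for one image."""
    return [
        tags,                  # tags only
        f"{tags} {caption}",   # tags followed by the natural-language caption
        f"{caption} {tags}",   # caption first, tags after
    ]

# Hypothetical example entry; tag content is illustrative only.
entry = {
    "image_01.png": caption_variants(
        "masterpiece 2023 safe 1girl some_character some_series some_artist smile",
        "A girl smiling at the camera.",
    )
}
print(json.dumps(entry, indent=2))
```

Whether diffusion-pipe then samples one variant per image per epoch, or something else, is part of the question.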
- About tag order being arbitrary, if the individual groups are also shuffled during training, how is that achieved with diffusion-pipe?
  - `[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]`
  - Within each tag section, the tags can be in arbitrary order.
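What I have in mind is something like the following preprocessing sketch, which shuffles tags inside each section while keeping the section order fixed. I don't know whether diffusion-pipe supports this natively, which is why I'm asking:

```python
import random

def shuffle_within_sections(sections: list[list[str]], seed=None) -> str:
    """Shuffle tags inside each section; the section order itself stays fixed."""
    rng = random.Random(seed)
    out = []
    for tags in sections:
        tags = tags[:]      # copy, so the caller's lists aren't mutated
        rng.shuffle(tags)
        out.extend(tags)
    return ", ".join(out)

# Hypothetical sections, in the suggested fixed order.
sections = [
    ["masterpiece", "2023", "safe"],     # quality/meta/year/safety
    ["1girl"],                           # subject count
    ["some_character"],                  # character
    ["some_series"],                     # series
    ["some_artist"],                     # artist
    ["smile", "long_hair", "outdoors"],  # general tags
]
print(shuffle_within_sections(sections, seed=0))
```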
- For the planned 1536 resolution training, will this be 1536x1536, i.e. 2,359,296 total pixels?
- Will any intermediate resolutions be used, or will the model jump from 1024x1024 to 1536x1536 directly?
- Likewise for the pre-training steps, would you mind sharing the resolutions used between 512x512 and 1024x1024?
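Just for scale, here are the total pixel counts at each square resolution relative to the 1024x1024 budget (the intermediate 768 and 1280 steps are guesses on my part):

```python
# Pixel budget at each square resolution, relative to 1024x1024.
for side in (512, 768, 1024, 1280, 1536):
    pixels = side * side
    print(f"{side}x{side}: {pixels:,} pixels ({pixels / (1024 * 1024):.2f}x the 1024 budget)")
```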
- For finetuning ~40,000 images from the Preview 3 base, which resolution training regime would you recommend:
a. 512x512 -> 1024x512 -> 1024x1024 -> 1536x1024 -> 1536x1536
b. 512x512 -> 1024x1024 -> 1536x1536
c. 1024x1024 -> 1536x1024 -> 1536x1536
d. 1024x1024 -> 1536x1536
e. Everything everywhere all at once with multi-res training and per-resolution batch sizes
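For option (e), the per-resolution batch sizes I had in mind would hold pixels-per-batch roughly constant, something like the sketch below. This is my own heuristic, not a documented diffusion-pipe recipe, and the base batch size of 64 is a placeholder:

```python
def batch_for_resolution(w, h, base_bs=64, base_res=1024):
    """Scale batch size so pixels-per-batch stays roughly constant across buckets."""
    return max(1, round(base_bs * (base_res * base_res) / (w * h)))

for w, h in [(512, 512), (1024, 512), (1024, 1024), (1536, 1024), (1536, 1536)]:
    print(f"{w}x{h}: batch {batch_for_resolution(w, h)}")
```

Would that kind of constant-pixel-budget scaling be sensible here, or do you balance the buckets differently?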
　 ∧＿∧
　(　･ω･)
　(　つ┳⊃
ε (_)ヘ⌒ヽﾌ
(　(　･ω･)
◎―◎ ⊃ ⊃
The model seems to respond well to higher resolution training, so I am looking forward to seeing how that goes!