
Full Finetuning Questions

#118
by motimalu - opened

Hello, I've been enjoying finetuning your Anima preview models for a while now and have accumulated a bit of a bucket list of questions that I hope you'll humor me with...

  1. What is a sane scaling rule to use for LR when increasing global batch size for this architecture?
    • e.g. taking the suggested rank-32 LoRA starting at 2e-5, how should the LR scale at global batch sizes of 16, 32, 64, 128, etc.?
    • Likewise for the LLM Adaptor, how should its LR be scaled with the global batch size?
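To make the question concrete, the two heuristics I keep running into are linear scaling and square-root scaling; a minimal sketch (treating batch size 16 as the baseline for the 2e-5 starting LR is my own assumption, and neither rule is confirmed for this architecture):

```python
import math

def scaled_lr(base_lr: float, base_batch: int, new_batch: int,
              rule: str = "sqrt") -> float:
    """Scale a learning rate with global batch size.

    Two common heuristics (neither confirmed for this architecture):
      - "linear": LR grows proportionally to batch size
      - "sqrt":   LR grows with the square root of batch size,
                  often preferred for Adam-style optimizers
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")

# Hypothetical baseline: batch size 16 at the suggested 2e-5 LoRA LR.
for bs in (16, 32, 64, 128):
    print(bs, scaled_lr(2e-5, 16, bs, "sqrt"))
```

Which of these (if either) applies here, and whether the LLM Adaptor follows the same rule, is exactly what I'm unsure about.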
  2. How do you set up training on a mix of Danbooru-style tags, natural-language captions, and combined tags + captions?
    • Assuming this is done with diffusion-pipe's captions.json data format, is it something like:
{
  "image_01.png": [
    "[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]",
    "[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags] [captions]",
    "[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [captions]",
    "[captions] [quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist]",
    "[captions] [quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]"
  ]
}
  3. Regarding tag order being arbitrary: if the individual tag groups are also shuffled during training, how is that achieved with diffusion-pipe?
[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]
Within each tag section, the tags can be in arbitrary order.
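In case it helps clarify what I mean, here is a rough offline sketch of that shuffling: group order stays fixed, tags within each group are shuffled, and several caption variants per image are written into a captions.json-style dict. (The field names and metadata layout are made up for illustration; I don't know whether diffusion-pipe re-shuffles per epoch or expects precomputed variants like this.)

```python
import json
import random

# Illustrative tag-group order; the group names are placeholders.
TAG_GROUPS = ["quality", "subject", "character", "series", "artist", "general"]

def build_caption(meta: dict, rng: random.Random,
                  include_general: bool = True,
                  caption_mode: str = "none") -> str:
    """Join tag groups in fixed order, shuffling tags within each group.

    caption_mode: "none" (tags only), "append" (tags then natural-language
    caption), or "prepend" (caption first).
    """
    parts = []
    for group in TAG_GROUPS:
        if group == "general" and not include_general:
            continue
        tags = list(meta.get(group, []))
        rng.shuffle(tags)  # arbitrary order *within* a section
        if tags:
            parts.append(", ".join(tags))
    tag_str = ", ".join(parts)
    caption = meta.get("caption", "")
    if caption_mode == "none" or not caption:
        return tag_str
    if caption_mode == "prepend":
        return f"{caption} {tag_str}"
    return f"{tag_str} {caption}"

# Made-up metadata for one image.
meta = {
    "quality": ["masterpiece", "2024"],
    "subject": ["1girl"],
    "character": ["example_character"],
    "general": ["smile", "outdoors"],
    "caption": "A girl smiling in a park.",
}
rng = random.Random(0)
entry = {
    "image_01.png": [
        build_caption(meta, rng),                         # tags only
        build_caption(meta, rng, caption_mode="append"),  # tags + caption
        build_caption(meta, rng, include_general=False,
                      caption_mode="append"),             # drop general tags
        build_caption(meta, rng, caption_mode="prepend"), # caption first
    ]
}
print(json.dumps(entry, indent=2))
```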
  4. For the planned 1536 resolution training, will this be 1536x1536, i.e. 2,359,296 total pixels?
  5. Will any intermediate resolutions be used, or will the model jump from 1024x1024 to 1536x1536 directly?
  6. Likewise for the pre-training steps, would you mind sharing the resolutions used between 512x512 and 1024x1024?
  7. For finetuning ~40,000 images from the Preview 3 base, which resolution training regime would you recommend:
    a. 512x512 -> 1024x512 -> 1024x1024 -> 1536x1024 -> 1536x1536
    b. 512x512 -> 1024x1024 -> 1536x1536
    c. 1024x1024 -> 1536x1024 -> 1536x1536
    d. 1024x1024 -> 1536x1536
    e. Everything everywhere all at once with multi-res training and per-resolution batch sizes γ…€ ∧_∧
    γ€€(γ€€ο½₯βˆ€ο½₯)
    γ€€(γ€€γ€β”³βŠƒ
    Ξ΅ (_)γΈβŒ’γƒ½οΎŒ
    (γ€€γ€€(γ€€ο½₯Ο‰ο½₯)
    β—Žβ€•β—Ž βŠƒ βŠƒ
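For the multi-res option, what I have in mind is a fixed pixel budget with aspect-ratio buckets, something like the sketch below (the 64-pixel rounding step and the ratio list are my own assumptions, not anything from the Anima recipe):

```python
# Sketch: derive aspect-ratio buckets at a fixed ~1536x1536 pixel budget,
# rounding sides to multiples of 64 (a common latent-stride constraint;
# whether Anima buckets this way is an assumption on my part).

def buckets(budget_px: int, ratios: list, step: int = 64) -> list:
    out = []
    for r in ratios:  # r = width / height
        h = (budget_px / r) ** 0.5
        w = h * r
        out.append((round(w / step) * step, round(h / step) * step))
    return out

for w, h in buckets(1536 * 1536, [1.0, 4 / 3, 3 / 2, 16 / 9]):
    print(f"{w}x{h} = {w * h:,} px")
```

Each bucket then keeps roughly the same token/pixel count per sample, which is what would make per-resolution batch sizes comparable.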

The model seems to respond well to higher resolution training, so I am looking forward to seeing how that goes!
