will you consider releasing the training code and (maybe) the dataset?

#1
by levzalt - opened

I'm curious to compare it to my implementation of Anima distillation in diffusion-pipe.
I found that manually filtering the generated dataset is a bit exhausting.
Additionally, what sampling settings did you use when generating the RL set?

Owner
β€’
edited Mar 14

Hi Levzalt,

Manual selection was not as difficult as it might seem.
I vibe-coded a custom node for ComfyUI that takes prompts from a file and generates three outputs per prompt: the image, the prompt, and the latent. Then I went through the resulting dataset and removed about 1–2 bad pictures out of every hundred (I used the images only for a quick check). The prompt file was generated by an LLM: I gave it examples of good Anima prompts, mentioned the sites Anima was trained on, and asked it to maximize diversity. It did a good job, for my taste.
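For reference, the prompt-file side of such a node can be sketched like this (a hypothetical helper, not the actual node code):

```python
# Hypothetical sketch: load one prompt per line from a text file,
# skipping blank lines and duplicates, as a generation node might do.
from pathlib import Path

def load_prompts(path):
    """Return de-duplicated, non-empty prompts, one per input line."""
    seen, prompts = set(), []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        prompt = line.strip()
        if prompt and prompt not in seen:
            seen.add(prompt)
            prompts.append(prompt)
    return prompts
```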
Generating the dataset took several hours, but the selection process only took about 10 minutes. Then I ran two training sessions with similar parameters (lr: 0.00002 and lr: 0.00005) and merged the two distilled models to get an 8-step LoRA.
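Merging two same-rank LoRAs can be as simple as a weighted average of their state dicts. A minimal sketch (not the actual script used here; names are illustrative):

```python
def merge_loras(sd_a, sd_b, weight_a=0.5):
    """Weighted average of two LoRA state dicts with identical key sets.
    Values may be torch tensors or plain numbers."""
    assert sd_a.keys() == sd_b.keys(), "LoRAs must share the same key set"
    return {k: weight_a * sd_a[k] + (1.0 - weight_a) * sd_b[k] for k in sd_a}
```

With `weight_a=0.5` this is a plain 50/50 average of the two distilled runs.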
For training, I wrote an extension for ai-toolkit. But it's still very raw and needs polishing before publication :)

My training config:

```yaml
job: "extension"
config:
  name: "anima_turbo_distill_v1"
  process:
    - type: "anima_distill"
      training_folder: "output/anima_distill"
      device: "cuda:0"
      distill_guidance: 5.0
      model:
        arch: "anima_preview"
        name_or_path: "AI-Toolkit/models/Anima-Preview"
        low_vram: false
      network:
        type: "lora"
        linear: 64
        linear_alpha: 64
      train:
        optimizer: "adamw8bit"
        lr: 0.00002
        batch_size: 4
        steps: 6000
        gradient_checkpointing: true
        gradient_accumulation_steps: 1
        save_every: 500
        sample_every: 250
        dtype: "bf16"
        unload_text_encoder: true
        train_text_encoder: false
      datasets:
        - folder_path: "AI-Toolkit/datasets/Anima_HQ"
          caption_ext: "txt"
          resolution: 1024
          cache_latents_to_disk: true
      sample:
        sample_every: 250
        width: 1024
        height: 1024
        samples:
          - prompt: "score_9, score_8, masterpiece, best quality, 1girl, solo, detailed beautiful background, looking at viewer"
          - prompt: "score_9, score_8, best quality, 1girl in city, sharp and detailed, illustration"
        seed: 42
        guidance_scale: 1.0
        sample_steps: 8
```
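For context, `distill_guidance: 5.0` points to a CFG-distillation objective: the student is trained to match the teacher's classifier-free-guided prediction, so guidance can be dropped at inference (hence `guidance_scale: 1.0` in the sample block). A minimal, model-agnostic sketch of that loss, assuming simple `model(x_t, t, cond)` callables; this is not the actual ai-toolkit extension code:

```python
import torch
import torch.nn.functional as F

def cfg_distill_loss(student, teacher, x_t, t, cond, uncond, guidance=5.0):
    """Train the student to match the teacher's CFG-combined prediction."""
    with torch.no_grad():
        e_cond = teacher(x_t, t, cond)
        e_uncond = teacher(x_t, t, uncond)
        # classic classifier-free guidance combination
        target = e_uncond + guidance * (e_cond - e_uncond)
    return F.mse_loss(student(x_t, t, cond), target)
```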

Owner

upd. The dataset won't be made public due to its NSFW content. To be honest, there's not much point anyway - it was simply generated from prompts with anima-preview2. The more we filter it, the more we end up with a finetune rather than a distillation. However, I wanted to fix the "bad hands" and similar issues typical of Anima, so I opted for some light manual moderation. I did get a slight 3D trend in generations from my LoRAs; I think this is due to overusing quality tags like score_9 in my prompts.

Einhorn changed discussion status to closed
