Text-to-Image
Diffusers

Experimental Training on NoobAI Flux2VAE Rectified Flow v0.3 U-REPA to use different CLIP models

Trained on-top of NoobAI-Flux2VAE-RectifiedFlow-0.3 U-REPA

Why?

  • Part of the reason I trained NoobAI Flux2VAE RF with U-REPA in the first place was to see if using U-REPA as an auxiliary loss objective would speed up convergence when making architecture changes (like changing text-encoder).
  • I remembered this Reddit post by Anzhc, which described how the text encoders in NoobAI are flawed, and that Anzhc finetuned CLIP-L and Bluvoll finetuned CLIP-G to fix them. I hadn't heard of anyone doing the training required to actually use those models, so I decided to do it myself.
  • To learn more.

Update:

  • Did a bit more training, files are in ./V2

The main model file is NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP-V2.safetensors

NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP_V2-SMA-Merge.safetensors in ./V2 is an average of the checkpoint weights from the last 50% of that training run.
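An SMA merge like this is just an element-wise mean over saved checkpoints. A minimal sketch of the idea, using plain NumPy arrays in place of safetensors tensors (function name is mine):

```python
import numpy as np

def average_checkpoints(state_dicts):
    """Element-wise mean of several checkpoints' weights (simple moving average)."""
    return {
        key: np.mean([sd[key] for sd in state_dicts], axis=0)
        for key in state_dicts[0]
    }

# Two toy "checkpoints" with one weight tensor each.
ckpts = [{"w": np.array([1.0, 3.0])}, {"w": np.array([3.0, 5.0])}]
merged = average_checkpoints(ckpts)
print(merged["w"])  # [2. 4.]
```

In practice you would load each .safetensors file into a state dict first and average the tensors key by key.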

Note: Don't use the V1 LoKr on the V2 model.

Relevant Details for Usage

Same as NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA

Use the LoKR NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP-Booru-LOKR-000001.safetensors

Make sure to add , to the start and end of the prompt(s), or the first and last words might get ignored.

If you are getting duplicates at larger resolutions, add solo to the prompt.
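A tiny helper for the two tips above (the function name is mine, not part of any library):

```python
def pad_prompt(prompt, add_solo=False):
    """Wrap a prompt in leading/trailing commas so the first and last tags
    are not ignored; optionally add 'solo' to curb duplicate subjects at
    larger resolutions."""
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if add_solo and "solo" not in tags:
        tags.insert(0, "solo")
    return ", " + ", ".join(tags) + ","

print(pad_prompt("1girl, masterpiece", add_solo=True))
# → ", solo, 1girl, masterpiece,"
```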

Quality Tags

Positive:

masterpiece, best quality, good quality

Due to my own mistakes, these don't have a strong effect.

Negative:

worst quality, low quality, bad anatomy,

  • I also added white glow outline, which might help remove the white "pixel glow" around characters.

Both artistic error and bad hands can help too.

Additional Tagging Information:

  • Artists: Tagged with the prefix by, e.g., by someArtistHere.

    Note: some artists were accidentally tagged with the prefix art by instead.

  • Date Tags:
    • newest
    • new
    • old
    • oldest (this one has a noticeable effect)
  • Content Rating Tags:
    • Rating: explicit
    • Rating: questionable
    • Rating: sensitive
    • Rating: general
      • Note: You don't have to use them
  • Danbooru Pools & Specific Concepts:

| Pool | Tagged As |
| --- | --- |
| Expert Shading | expert shading |
| Badass | badass |
| Action Shots | action |
| Epic | epic |
| To the death | duel to the death |
| I am King | "I am King" |
| Impending Doom | impending doom |
| Deep Thought | thought-provoking |
| Woman in practical armor | (N/A - implicit) |
| Nightmare Fuel | nightmare fuel |
| Minorities | (N/A - implicit) |
| Serious Beauty | serious beauty |
| Handsome Ladies | handsome lady |

| Danbooru Tag | Tagged As |
| --- | --- |
| Dynamic Pose | dynamic pose |
| Slice of Life | slice of life |
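Putting the conventions above together, a sketch of prompt assembly for this model (helper name and defaults are mine):

```python
def build_prompt(tags, artist=None, date_tag=None, rating=None):
    # Assemble a prompt using this model's conventions: "by" prefix for
    # artists, an optional date tag (newest/new/old/oldest), an optional
    # "Rating: ..." tag, and leading/trailing commas per the usage note.
    parts = ["masterpiece", "best quality"]
    if artist:
        parts.append(f"by {artist}")
    if date_tag:
        parts.append(date_tag)
    if rating:
        parts.append(f"Rating: {rating}")
    parts += list(tags)
    return ", " + ", ".join(parts) + ","

print(build_prompt(["1girl", "dynamic pose"], artist="someArtistHere",
                   date_tag="newest", rating="general"))
```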

Training Details

Hardware: 1xA6000 48GB

Vision Encoder (for REPA): dinov3-vitl16-pretrain-lvd1689m

Dataset Details

From deepghs/danbooru2024:

  • 0306-0313.tar.
    • Images from Pools, specific tags, and some from Pixiv.
  • Note: I had to keep restarting the training due to a memory leak, which I now know (after much testing) was caused by the compute provider. So expect some concepts to be undertrained and the white glow to appear.

LoKR Dataset

From deepghs/danbooru2024:

  • 0000-0010.tar

Training Stages

1. Phase 1

Froze all layers except the key and value projections in the cross-attention layers, and un-froze the REPA projector.
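A sketch of that selective freezing, assuming diffusers-style parameter names for the cross-attention projections ("attn2.to_k"/"attn2.to_v") and a hypothetical "repa_projector" name; a stand-in Param class replaces real torch parameters so the idea is self-contained:

```python
class Param:
    """Stand-in for a torch parameter; only tracks requires_grad."""
    def __init__(self):
        self.requires_grad = True

def set_trainable(named_params):
    # Freeze everything except the cross-attention K/V projections and
    # the REPA projector (names are assumptions, see lead-in).
    for name, p in named_params.items():
        p.requires_grad = (
            "attn2.to_k" in name
            or "attn2.to_v" in name
            or name.startswith("repa_projector")
        )

params = {n: Param() for n in [
    "down.attn1.to_q", "down.attn2.to_k",
    "down.attn2.to_v", "repa_projector.0.weight",
]}
set_trainable(params)
print([n for n, p in params.items() if p.requires_grad])
```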

2. Phase 2?

  • Base: Phase 1
  • Steps: +? (Unbatched)
  • Model File: NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP.safetensors
  • Settings Changed:
    • Un-froze all the layers
    • --learning_rate: 3e-5 (from 1e-4)
    • --repa_lambda: 0.20 (from 0.50)

3. LoKR

  • Base: Phase 2
  • Steps: +~100,000 (Unbatched)
  • Model File: ?
  • Main Changes:
    • No REPA
```toml
[Network_setup]
network_dim = 100000
network_alpha = 1
network_dropout = 0
network_train_unet_only = true
resume = false

[LyCORIS]
network_module = "lycoris.kohya"
network_args = [ "preset=full", "algo=lokr", "factor=4", ]

[optimizer_arguments]
lr_scheduler   = "cosine"
optimizer_type = "AdamW8bit"
optimizer_args = ["weight_decay=0.01", "eps=1e-8", "betas=0.9,0.999"]
min_lr         = 0

[training_arguments]
unet_lr = 1e-4
text_encoder_lr = 0
max_grad_norm = 1.0
lr_warmup_steps = 30
```
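For reference, the algo=lokr setting factorizes each weight update as a Kronecker product. A rough NumPy sketch of the idea (shapes are assumptions, not the exact LyCORIS implementation):

```python
import numpy as np

def lokr_delta(out_dim, in_dim, factor=4, seed=0):
    # LoKr (sketch): approximate the weight update as dW = kron(A, B).
    # `factor` fixes A's size, so B needs only (out_dim/factor) x
    # (in_dim/factor) entries -- far fewer parameters than a full dW.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((factor, factor))
    B = rng.standard_normal((out_dim // factor, in_dim // factor))
    return np.kron(A, B)

dW = lokr_delta(64, 128, factor=4)
print(dW.shape)  # (64, 128)
```

Here a full 64x128 update (8,192 values) is represented by only 16 + 512 = 528 factor parameters.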

Run History & Configuration

Initial Configuration (Run 1)

Trained for ? Unbatched steps in this first run.

Key Parameters:

  • --manifold_weight 3.0: Controls the weighting of the manifold loss to the cosine loss in the overall REPA loss (as set in the U-REPA paper).
  • --repa_lambda 0.50: Controls the weighting of the REPA loss to the regular L2 loss. loss = loss + (repa_lambda * total_repa_loss).
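In code form, a sketch of how these two flags combine (variable names are mine):

```python
def combined_loss(l2_loss, cos_loss, manifold_loss,
                  repa_lambda=0.50, manifold_weight=3.0):
    # U-REPA auxiliary objective: weight the manifold term against the
    # cosine term, then add the REPA total to the ordinary L2 loss.
    total_repa_loss = cos_loss + manifold_weight * manifold_loss
    return l2_loss + repa_lambda * total_repa_loss

print(combined_loss(1.0, 0.2, 0.1))  # 1.0 + 0.5 * (0.2 + 0.3) = 1.25
```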

Training Code

Support

Support CabalResearch here, the creators of the NoobAI Flux2VAE RectifiedFlow model and of the CLIP-L and CLIP-G used in this model.

Thanks
