Text-to-Image
Diffusers

Experimental Training on NoobAI Flux2VAE Rectified Flow v0.3 U-REPA to use different CLIP models

Trained on-top of NoobAI-Flux2VAE-RectifiedFlow-0.3 U-REPA

Why?

  • Part of the reason I trained NoobAI Flux2VAE RF with U-REPA in the first place was to see if using U-REPA as an auxiliary loss objective would speed up convergence when making architecture changes (like changing text-encoder).
  • I remembered this Reddit post by Anzhc, which described how the text encoders in NoobAI are flawed, and that Anzhc finetuned CLIP-L and Bluvoll finetuned CLIP-G to fix them. I hadn't heard of anyone doing the training required to actually use those models, so I decided to do it myself.
  • To learn more.

Update:

  • Did a bit more training, files are in ./V2

The main model file is NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP-V2.safetensors

NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP_V2-SMA-Merge.safetensors in ./V2 is an average of the checkpoint weights from the last 50% of that training run.
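An SMA merge like this is just an element-wise mean over saved checkpoints. A minimal sketch of the idea, using plain NumPy arrays in place of safetensors tensors (function name is mine):

```python
import numpy as np

def average_checkpoints(state_dicts):
    """Element-wise mean of several checkpoints' weights (simple moving average)."""
    return {
        key: np.mean([sd[key] for sd in state_dicts], axis=0)
        for key in state_dicts[0]
    }

# Two toy "checkpoints" with one weight tensor each.
ckpts = [{"w": np.array([1.0, 3.0])}, {"w": np.array([3.0, 5.0])}]
merged = average_checkpoints(ckpts)
print(merged["w"])  # [2. 4.]
```

In practice you would load each .safetensors file into a state dict first and average the tensors key by key.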

Note: Don't use the V1 LoKr on the V2 model.

Relevant Details for Usage

Same as NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA

Use the LoKR NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP-Booru-LOKR-000001.safetensors

Make sure to add , to the start and end of the prompt(s), or the first and last words might get ignored.

If you are getting duplicates at larger resolutions, add solo to the prompt.
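A tiny helper for the two tips above (the function name is mine, not part of any library):

```python
def pad_prompt(prompt, add_solo=False):
    """Wrap a prompt in leading/trailing commas so the first and last tags
    are not ignored; optionally add 'solo' to curb duplicate subjects at
    larger resolutions."""
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if add_solo and "solo" not in tags:
        tags.insert(0, "solo")
    return ", " + ", ".join(tags) + ","

print(pad_prompt("1girl, masterpiece", add_solo=True))
# → ", solo, 1girl, masterpiece,"
```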

Quality Tags

Positive:

masterpiece, best quality, good quality

Due to my own mistakes, these don't have a strong effect.

Negative:

worst quality, low quality, bad anatomy,

  • I also added white glow outline, which might help remove the white "pixel glow" around characters.

Both artistic error and bad hands can help too.

Additional Tagging Information:

  • Artists: Tagged with the prefix by, e.g., by someArtistHere.

    Note: some artists were accidentally tagged with the prefix art by instead.

  • Date Tags:
    • newest
    • new
    • old
    • oldest (this one has a noticeable effect)
  • Content Rating Tags:
    • Rating: explicit
    • Rating: questionable
    • Rating: sensitive
    • Rating: general
      • Note: You don't have to use them
  • Danbooru Pools & Specific Concepts:

| Pool | Tagged As |
| --- | --- |
| Expert Shading | expert shading |
| Badass | badass |
| Action Shots | action |
| Epic | epic |
| To the death | duel to the death |
| I am King | "I am King" |
| Impending Doom | impending doom |
| Deep Thought | thought-provoking |
| Woman in practical armor | (N/A - implicit) |
| Nightmare Fuel | nightmare fuel |
| Minorities | (N/A - implicit) |
| Serious Beauty | serious beauty |
| Handsome Ladies | handsome lady |

| Danbooru Tag | Tagged As |
| --- | --- |
| Dynamic Pose | dynamic pose |
| Slice of Life | slice of life |
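Putting the conventions above together, a sketch of prompt assembly for this model (helper name and defaults are mine):

```python
def build_prompt(tags, artist=None, date_tag=None, rating=None):
    # Assemble a prompt using this model's conventions: "by" prefix for
    # artists, an optional date tag (newest/new/old/oldest), an optional
    # "Rating: ..." tag, and leading/trailing commas per the usage note.
    parts = ["masterpiece", "best quality"]
    if artist:
        parts.append(f"by {artist}")
    if date_tag:
        parts.append(date_tag)
    if rating:
        parts.append(f"Rating: {rating}")
    parts += list(tags)
    return ", " + ", ".join(parts) + ","

print(build_prompt(["1girl", "dynamic pose"], artist="someArtistHere",
                   date_tag="newest", rating="general"))
```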

Training Details

Hardware: 1xA6000 48GB

Vision Encoder (for REPA): dinov3-vitl16-pretrain-lvd1689m

Dataset Details

From deepghs/danbooru2024:

  • 0306-0313.tar.
    • Images from Pools, specific tags, and some from Pixiv.
  • Note: I had to keep restarting the training due to a memory leak, which I now know (after much testing) was caused by the compute provider. So expect some concepts to be undertrained and the white glow to appear.

LoKR Dataset

From deepghs/danbooru2024:

  • 0000-0010.tar

Training Stages

1. Phase 1

Froze all layers except the key and value projections in the cross-attention layers, and un-froze the REPA projector.
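A sketch of that selective freezing, assuming diffusers-style parameter names for the cross-attention projections ("attn2.to_k"/"attn2.to_v") and a hypothetical "repa_projector" name; a stand-in Param class replaces real torch parameters so the idea is self-contained:

```python
class Param:
    """Stand-in for a torch parameter; only tracks requires_grad."""
    def __init__(self):
        self.requires_grad = True

def set_trainable(named_params):
    # Freeze everything except the cross-attention K/V projections and
    # the REPA projector (names are assumptions, see lead-in).
    for name, p in named_params.items():
        p.requires_grad = (
            "attn2.to_k" in name
            or "attn2.to_v" in name
            or name.startswith("repa_projector")
        )

params = {n: Param() for n in [
    "down.attn1.to_q", "down.attn2.to_k",
    "down.attn2.to_v", "repa_projector.0.weight",
]}
set_trainable(params)
print([n for n, p in params.items() if p.requires_grad])
```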

2. Phase 2?

  • Base: Phase 1
  • Steps: +? (Unbatched)
  • Model File: NoobAI-Flux2VAE-RectifiedFlow-0.3-U-REPA-Experimental-new-CLIP.safetensors
  • Settings Changed:
    • Un-froze all the layers
    • --learning_rate: 3e-5 (from 1e-4)
    • --repa_lambda: 0.20 (from 0.50)

3. LoKR

  • Base: Phase 2
  • Steps: +~100,000 (Unbatched)
  • Model File: ?
  • Main Changes:
    • No REPA
```toml
[Network_setup]
network_dim = 100000
network_alpha = 1
network_dropout = 0
network_train_unet_only = true
resume = false

[LyCORIS]
network_module = "lycoris.kohya"
network_args = [ "preset=full", "algo=lokr", "factor=4", ]

[optimizer_arguments]
lr_scheduler   = "cosine"
optimizer_type = "AdamW8bit"
optimizer_args = ["weight_decay=0.01", "eps=1e-8", "betas=0.9,0.999"]
min_lr         = 0

[training_arguments]
unet_lr = 1e-4
text_encoder_lr = 0
max_grad_norm = 1.0
lr_warmup_steps = 30
```
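For reference, the algo=lokr setting factorizes each weight update as a Kronecker product. A rough NumPy sketch of the idea (shapes are assumptions, not the exact LyCORIS implementation):

```python
import numpy as np

def lokr_delta(out_dim, in_dim, factor=4, seed=0):
    # LoKr (sketch): approximate the weight update as dW = kron(A, B).
    # `factor` fixes A's size, so B needs only (out_dim/factor) x
    # (in_dim/factor) entries -- far fewer parameters than a full dW.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((factor, factor))
    B = rng.standard_normal((out_dim // factor, in_dim // factor))
    return np.kron(A, B)

dW = lokr_delta(64, 128, factor=4)
print(dW.shape)  # (64, 128)
```

Here a full 64x128 update (8,192 values) is represented by only 16 + 512 = 528 factor parameters.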

Run History & Configuration

Initial Configuration (Run 1)

Trained for ? Unbatched steps in this first run.

Key Parameters:

  • --manifold_weight 3.0: Controls the weighting of the manifold loss to the cosine loss in the overall REPA loss (as set in the U-REPA paper).
  • --repa_lambda 0.50: Controls the weighting of the REPA loss to the regular L2 loss. loss = loss + (repa_lambda * total_repa_loss).
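In code form, a sketch of how these two flags combine (variable names are mine):

```python
def combined_loss(l2_loss, cos_loss, manifold_loss,
                  repa_lambda=0.50, manifold_weight=3.0):
    # U-REPA auxiliary objective: weight the manifold term against the
    # cosine term, then add the REPA total to the ordinary L2 loss.
    total_repa_loss = cos_loss + manifold_weight * manifold_loss
    return l2_loss + repa_lambda * total_repa_loss

print(combined_loss(1.0, 0.2, 0.1))  # 1.0 + 0.5 * (0.2 + 0.3) = 1.25
```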

Training Code

Support

Support CabalResearch here, the creators of the NoobAI Flux2VAE RectifiedFlow model and of the CLIP-L and CLIP-G used in this model.

Thanks
