Fashionable Spatial Encoder
Please note: This is only compatible with Cozyberry
Second note: the trained model has not yet drifted significantly from the original.
This is a research repo for a diffusion model.
There are two straightforward ways to condition the diffusion model:
- by improving the text encoder, in this case a BERT model
- by modifying the cross-attention weights
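The cross-attention route above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repo's actual implementation: all names and shapes are hypothetical, and the text tokens stand in for the hidden states a BERT encoder would produce.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_tokens, txt_tokens, W_q, W_k, W_v):
    # Queries come from the image features; keys and values come from
    # the text encoder, so each image patch attends over prompt tokens.
    q = img_tokens @ W_q                              # (n_img, d)
    k = txt_tokens @ W_k                              # (n_txt, d)
    v = txt_tokens @ W_v                              # (n_txt, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (n_img, n_txt)
    return attn @ v                                   # (n_img, d)

rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=(16, d))   # 16 image patches
txt = rng.normal(size=(4, d))    # 4 text tokens (e.g. BERT outputs)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(img, txt, W_q, W_k, W_v)
print(out.shape)  # (16, 8)
```

Modifying the cross-attention weights means training (or fine-tuning) the `W_q`, `W_k`, `W_v` projections; improving the text encoder means changing how `txt` itself is produced.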
During inference, we know exactly which part of the prompt applies to which part of the image.
The fundamental issue is that millions of samples pass through the diffusion model during training, yet only the mean loss of text-driven image generation is computed. Meanwhile, both the text and the image contain a variety of colours, shapes, and other objects, and the positions of these features in the encoded data are simply discarded.
This is what changes here: failing to follow the exact spatial description in the prompt incurs an additional penalty.
Source data
- synthetic booru fashion
- horizontal scenes
