---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- text-to-image
- stable diffusion
- personalization
- msdiffusion
---

# Introduction

Our research introduces MS-Diffusion, a framework for layout-guided, zero-shot image personalization with multiple subjects. It integrates grounding tokens with a feature resampler to preserve the detail fidelity of each subject. Guided by the layout, MS-Diffusion also adapts the cross-attention to multi-subject inputs so that each subject condition acts only on its designated area. The proposed multi-subject cross-attention produces harmonious inter-subject compositions while preserving text control.

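To illustrate the idea that each subject condition acts only on its own area, the following minimal sketch builds one binary mask per subject from a layout box. It is not the authors' implementation: the function name `build_region_masks`, the normalized `(x0, y0, x1, y1)` box format, and the grid size are all assumptions made for illustration. In an attention layer, a mask like this would typically be used to suppress a subject's image tokens outside its box.

```python
from typing import List, Tuple

# Hypothetical box format: normalized (x0, y0, x1, y1) with values in [0, 1].
Box = Tuple[float, float, float, float]


def build_region_masks(boxes: List[Box], height: int, width: int) -> List[List[List[bool]]]:
    """Return one height x width boolean mask per subject box.

    A cell is True when its center lies inside the subject's box, i.e. where
    that subject's condition would be allowed to act.
    """
    masks = []
    for x0, y0, x1, y1 in boxes:
        mask = []
        for row in range(height):
            cy = (row + 0.5) / height  # normalized center of this row
            mask.append([
                x0 <= (col + 0.5) / width < x1 and y0 <= cy < y1
                for col in range(width)
            ])
        masks.append(mask)
    return masks


# Two subjects placed side by side on a 4x4 grid (illustrative values).
masks = build_region_masks([(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)], height=4, width=4)
for row in masks[0]:
    print("".join("#" if cell else "." for cell in row))
```

The printed grid marks with `#` the cells where the first subject would be active; the second subject's mask covers the complementary half.
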
|  | |
| - **Project Page:** [https://eclipse-t2i.github.io/Lambda-ECLIPSE/](https://eclipse-t2i.github.io/Lambda-ECLIPSE/) | |
| - **GitHub:** [https://github.com/Maitreyapatel/lambda-eclipse-inference](https://github.com/Maitreyapatel/lambda-eclipse-inference) | |
| - **Paper (arXiv):** [https://arxiv.org/abs/2402.05195](https://arxiv.org/abs/2402.05195) | |
# Model

Download the pretrained base models from [SDXL-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [CLIP-G]().

Please refer to our [GitHub repository]() to prepare the environment and for detailed instructions on running the model.

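As a starting point, the SDXL base model and this repo's checkpoint could be fetched with `huggingface_hub`. This is a sketch under assumptions: the local directory names and the helper names `plan_downloads` and `download_all` are hypothetical, `huggingface_hub` must be installed separately, and the CLIP-G image encoder is left out because this card does not specify its repository.

```python
from pathlib import Path

# Repos named on this card; the target folder names are assumptions.
REPOS = {
    "stabilityai/stable-diffusion-xl-base-1.0": "sdxl-base-1.0",
    "doge1516/MS-Diffusion": "ms-diffusion",
}


def plan_downloads(root: str = "./checkpoints") -> dict:
    """Map each repo id to the local directory it would be stored in."""
    return {repo_id: Path(root) / name for repo_id, name in REPOS.items()}


def download_all(root: str = "./checkpoints") -> None:
    """Download every repo in REPOS (requires network access)."""
    # Imported lazily so the planning helper works without the package.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in plan_downloads(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))


if __name__ == "__main__":
    for repo_id, local_dir in plan_downloads().items():
        print(f"{repo_id} -> {local_dir}")
    # Uncomment to actually fetch the files:
    # download_all()
```

Keeping the download step behind an explicit call avoids pulling several gigabytes by accident; the inference script in the GitHub repository remains the reference for where each file is expected.
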
# Important Notes

- This repo contains only the trained model checkpoint, without data, code, or base models. Please read the GitHub repository carefully for detailed instructions.
- The `scale` parameter controls the strength of image control. By default, `scale` is set to 0.6. In practice, a `scale` of 0.4 works better if your input contains subjects that need to affect the whole image, such as the background. **Feel free to adjust the `scale` in your applications.**
- The model generally expects layout inputs. You can use the default layouts in the inference script, though more accurate and realistic layouts produce better results.
- Although MS-Diffusion outperforms SOTA personalized diffusion methods in both single-subject and multi-subject generation, it is still affected by the backgrounds of subject images. The best practice is to use masked images, since they contain no irrelevant information.
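
The notes on `scale` and layouts can be summarized in a small helper. This is only a sketch: `pick_scale` and `validate_layout` are hypothetical names, not part of the MS-Diffusion code, and the normalized `(x0, y0, x1, y1)` box format is an assumption; check the inference script for the actual layout format.

```python
from typing import List, Sequence

DEFAULT_SCALE = 0.6      # default stated in the notes above
WHOLE_IMAGE_SCALE = 0.4  # suggested when a subject affects the whole image


def pick_scale(affects_whole_image: bool) -> float:
    """Choose a starting `scale` following the guidance in the notes."""
    return WHOLE_IMAGE_SCALE if affects_whole_image else DEFAULT_SCALE


def validate_layout(boxes: Sequence[Sequence[float]]) -> List[str]:
    """Return a list of problems found in the (assumed) normalized boxes."""
    problems = []
    for i, box in enumerate(boxes):
        if len(box) != 4:
            problems.append(f"box {i}: expected 4 values, got {len(box)}")
            continue
        x0, y0, x1, y1 = box
        if not all(0.0 <= v <= 1.0 for v in box):
            problems.append(f"box {i}: coordinates must lie in [0, 1]")
        if x0 >= x1 or y0 >= y1:
            problems.append(f"box {i}: empty or inverted box")
    return problems


# A background-like subject, plus one valid and one inverted box.
print(pick_scale(affects_whole_image=True))
print(validate_layout([(0.1, 0.2, 0.5, 0.9), (0.6, 0.8, 0.4, 0.9)]))
```

The values returned by `pick_scale` are starting points only; as the notes say, `scale` should be tuned per application.
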