Instructions to use Glanty/Capybara with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Diffusers

How to use Glanty/Capybara with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "Glanty/Capybara", dtype=torch.bfloat16, device_map="cuda"
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

- Notebooks
  - Google Colab
  - Kaggle
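The snippet above hard-codes `device_map="cuda"` and notes that Apple devices should use `"mps"`. A minimal sketch of choosing the device at runtime — the `pick_device` helper is our own illustration, not part of Diffusers or this model card:

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Return the best available backend name: CUDA first, then MPS, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

# In practice, feed it the availability checks from torch, e.g.:
#   pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
print(pick_device(False, True))  # → mps
```

The returned string can then be passed as `device_map` to `DiffusionPipeline.from_pretrained`.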
update README.md
README.md CHANGED

```diff
@@ -45,6 +45,27 @@ The framework leverages advanced diffusion models and transformer architectures
 - [ ] Release our unified creation model.
 - [ ] Release training code.
 
+## 🏞️ Show Cases
+**Results of generation tasks.** We show two generation tasks under our unified model. The top section presents text-to-image results, illustrating high-fidelity synthesis across diverse styles. The bottom rows show text-to-video results, demonstrating temporally coherent generation with natural motion for both realistic and stylized content.
+<p align="center">
+<img src="./assets/misc/gen_teaser.png" style="width: 100%; height: auto;"/>
+</p>
+
+**Results of image editing tasks.** We show the results of both instruction-based and in-context image editing. The examples cover local and global edits (e.g., time-of-day and style changes), background replacement, and expression control. We further demonstrate multi-turn editing, where edits are applied sequentially, and in-context editing guided by a reference image.
+<p align="center">
+<img src="./assets/misc/imageedit_teaser.png" style="width: 100%; height: auto;"/>
+</p>
+
+**Results of the instruction-based video editing task.** We showcase instruction-based editing (TV2V) under our unified creation interface, covering local edits, global edits, dense prediction, and dynamic edits. Each example presents input frames and the edited outputs, highlighting temporally coherent transformations that preserve identity and overall structure.
+<p align="center">
+<img src="./assets/misc/videoedit_teaser5.png" style="width: 100%; height: auto;"/>
+</p>
+
+**Results of in-context visual creation.** We show in-context generation and in-context editing results, including subject-conditioned generation (S2V/S2I), conditional generation (C2V), image-to-video (I2V), and reference-driven editing (II2I/IV2V).
+<p align="center">
+<img src="./assets/misc/incontext_teaser2.png" style="width: 100%; height: auto;"/>
+</p>
+
 ## 🛠️ Installation
 
 We recommend using Anaconda to create an isolated Python environment, with CUDA 12.6:
```
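The Anaconda recommendation above can be sketched as a short setup script. The environment name, Python version, and the PyTorch CUDA 12.6 wheel index are our assumptions, not taken from this README:

```shell
# Create and activate an isolated environment (name and Python version are our choices)
conda create -n capybara python=3.10 -y
conda activate capybara

# PyTorch built against CUDA 12.6 (wheel index per pytorch.org; adjust for your driver)
pip install torch --index-url https://download.pytorch.org/whl/cu126

# Libraries used in the inference snippet earlier in this page
pip install -U diffusers transformers accelerate
```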