Instructions to use Glanty/Capybara with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

- Libraries
  - Diffusers

How to use Glanty/Capybara with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# Switch device_map to "mps" for Apple devices.
pipe = DiffusionPipeline.from_pretrained(
    "Glanty/Capybara",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

- Notebooks
  - Google Colab
  - Kaggle
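The comment in the snippet above says to switch to `"mps"` on Apple devices. A minimal sketch of that device choice as a pure function (hypothetical helper, not part of the Capybara codebase; on a real machine the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the device string for DiffusionPipeline.from_pretrained.

    Prefer CUDA, fall back to Apple's Metal backend ("mps"), then CPU.
    The flags mirror torch.cuda.is_available() and
    torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

The returned string can be passed directly as `device_map` in the snippet above.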
update README.md
README.md (changed):

```diff
@@ -39,7 +39,7 @@ The framework leverages advanced diffusion models and transformer architectures
 
 ## 🔥 News
 
-* **[2026.02.20]** 🎨 Added [ComfyUI support](#-comfyui-support) with custom nodes for all task types (T2I, T2V, TI2I, TV2V).
+* **[2026.02.20]** 🎨 Added [ComfyUI support](#-comfyui-support) with custom nodes for all task types (T2I, T2V, TI2I, TV2V), together with [FP8 quantization](#-fp8-quantization) support for the inference script and ComfyUI custom node.
 * **[2026.02.17]** 🚀 Initial release v0.1 of the Capybara inference framework supporting generation and instruction-based editing tasks (T2I, T2V, TI2I, TV2V).
 
 ## 📋 TODO List
@@ -296,6 +296,7 @@ A sample workflow is provided in [`comfyui/examples/`](https://github.com/xgen-u
 | `--rewrite_instruction` | `False` | Auto-enhance prompts using Qwen3-VL-8B-Instruct |
 | `--rewrite_model_path` | `Qwen/Qwen3-VL-8B-Instruct` | Path to the rewrite model |
 | `--max_samples` | `None` | Limit the number of samples to process from CSV |
+| `--quantize` | `None` | Quantize transformer weights (`fp8`). See [FP8 Quantization](#-fp8-quantization). |
 
 ### Recommended Settings
@@ -310,6 +311,18 @@ For optimal quality and performance, we recommend the following settings:
 - **Resolution**: You can experiment with higher resolutions (`1024` or `1080p`).
 - **Inference Steps**: 50 steps provide a good balance between quality and speed. You can use 30-40 steps for faster generation.
 
+## ⚡ FP8 Quantization
+
+Capybara supports FP8 (E4M3) weight-only quantization for the transformer via [torchao](https://github.com/pytorch/ao). This roughly halves the transformer's weight memory, allowing larger resolutions or longer videos to fit in GPU VRAM.
+
+**Requirements:**
+- NVIDIA GPU with compute capability >= 8.9 (Ada Lovelace or Hopper, e.g. RTX 4090, L40, H100)
+- `torchao` installed (`pip install torchao`)
+
+### ComfyUI
+
+In the **Capybara Load Pipeline** node, set the `quantize` dropdown to **fp8**. The node handles everything automatically -- the transformer will be loaded in FP8 on GPU while other components (VAE, text encoders, etc.) still offload to CPU as usual.
 ## 📄 License
 
 This project is released under the MIT License.
```
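The FP8 section added in the diff above makes two quantitative claims: the feature needs compute capability >= 8.9, and FP8 roughly halves the transformer's weight memory (bf16 stores 2 bytes per parameter, E4M3 stores 1). A self-contained sketch of both checks in plain Python (no torchao needed; the 14B parameter count is an arbitrary illustration, not Capybara's actual size, and on a real machine the capability tuple would come from `torch.cuda.get_device_capability()`):

```python
def supports_fp8(compute_capability: tuple) -> bool:
    """FP8 kernels require compute capability >= 8.9 (Ada Lovelace or Hopper).

    The tuple mirrors torch.cuda.get_device_capability(), e.g. (8, 9).
    """
    return tuple(compute_capability) >= (8, 9)


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Weight memory in GiB: bf16 uses 2 bytes/param, fp8 (E4M3) uses 1."""
    return num_params * bytes_per_param / 2**30


# RTX 4090 (8.9) and H100 (9.0) qualify; an A100 (8.0) does not.
print(supports_fp8((8, 9)), supports_fp8((9, 0)), supports_fp8((8, 0)))

# Hypothetical 14B-parameter transformer: fp8 halves the weight footprint.
n = 14_000_000_000
print(f"bf16: {weight_memory_gib(n, 2):.1f} GiB -> fp8: {weight_memory_gib(n, 1):.1f} GiB")
```

With `torchao` installed, the corresponding call is roughly `quantize_(pipe.transformer, float8_weight_only())` from `torchao.quantization`; check the torchao documentation for the exact API in your installed version.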