RainCCH committed
Commit f2fd70e · 1 Parent(s): 8884191

update README.md

Files changed (1): README.md (+14 -1)
README.md CHANGED
README.md CHANGED

@@ -39,7 +39,7 @@ The framework leverages advanced diffusion models and transformer architectures
 
 ## 🔥 News
 
-* **[2026.02.20]** 🎨 Added [ComfyUI support](#-comfyui-support) with custom nodes for all task types (T2I, T2V, TI2I, TV2V).
+* **[2026.02.20]** 🎨 Added [ComfyUI support](#-comfyui-support) with custom nodes for all task types (T2I, T2V, TI2I, TV2V), together with [FP8 quantization](#-fp8-quantization) support for the inference script and ComfyUI custom node.
 * **[2026.02.17]** 🚀 Initial release v0.1 of the Capybara inference framework supporting generation and instruction-based editing tasks (T2I, T2V, TI2I, TV2V).
 
 ## 📝 TODO List

@@ -296,6 +296,7 @@ A sample workflow is provided in [`comfyui/examples/`](https://github.com/xgen-u
 | `--rewrite_instruction` | `False` | Auto-enhance prompts using Qwen3-VL-8B-Instruct |
 | `--rewrite_model_path` | `Qwen/Qwen3-VL-8B-Instruct` | Path to the rewrite model |
 | `--max_samples` | `None` | Limit the number of samples to process from CSV |
+| `--quantize` | `None` | Quantize transformer weights (`fp8`). See [FP8 Quantization](#-fp8-quantization). |
 
 ### Recommended Settings

@@ -310,6 +311,18 @@ For optimal quality and performance, we recommend the following settings:
 - **Resolution**: You can experiment with higher resolutions (`1024` or `1080p`).
 - **Inference Steps**: 50 steps provide a good balance between quality and speed. You can use 30-40 steps for faster generation.
 
+## ⚡ FP8 Quantization
+
+Capybara supports FP8 (E4M3) weight-only quantization for the transformer via [torchao](https://github.com/pytorch/ao). This roughly halves the transformer's weight memory, allowing larger resolutions or longer videos to fit in GPU VRAM.
+
+**Requirements:**
+- NVIDIA GPU with compute capability >= 8.9 (Ada Lovelace or Hopper, e.g. RTX 4090, L40, H100)
+- `torchao` installed (`pip install torchao`)
+
+### ComfyUI
+
+In the **Capybara Load Pipeline** node, set the `quantize` dropdown to **fp8**. The node handles everything automatically: the transformer is loaded in FP8 on the GPU, while other components (VAE, text encoders, etc.) still offload to CPU as usual.
+
 ## 📄 License
 
 This project is released under the MIT License.
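For intuition about the E4M3 format the new FP8 section refers to (4 exponent bits, 3 mantissa bits, bias 7, largest finite value 448), here is a minimal pure-Python rounding sketch. This is only a toy illustration of how values snap to the E4M3 grid; the actual conversion in the framework is performed by torchao/PyTorch, not by a helper like this:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3 (the "fn"
    variant: 4 exponent bits, 3 mantissa bits, exponent bias 7, no
    infinities, largest finite magnitude 448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    v = abs(x)
    if v >= 448.0:
        return sign * 448.0          # saturate at the largest finite value
    e = math.floor(math.log2(v))     # unbiased exponent of v
    e = max(e, -6)                   # below 2^-6 values become subnormal
    step = 2.0 ** (e - 3)            # grid spacing: 2^e split into 2^3 steps
    return sign * round(v / step) * step

# Each stored weight then needs 1 byte instead of 2 (bf16/fp16),
# which is where the "roughly halves weight memory" claim comes from.
print(quantize_e4m3(0.3))    # snaps to the nearest E4M3 value, 0.3125
print(quantize_e4m3(500.0))  # saturates to 448.0
```

Note the trade-off this makes visible: with only 3 mantissa bits, neighboring representable values differ by about 6%, which is acceptable for weight storage but is why activations typically stay in bf16.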