RainCCH committed
Commit f2fd70e · 1 Parent(s): 8884191

update README.md

Files changed (1): README.md (+14 -1)
README.md CHANGED
README.md CHANGED

@@ -39,7 +39,7 @@ The framework leverages advanced diffusion models and transformer architectures
 
 ## 🔥 News
 
-* **[2026.02.20]** 🎨 Added [ComfyUI support](#-comfyui-support) with custom nodes for all task types (T2I, T2V, TI2I, TV2V).
+* **[2026.02.20]** 🎨 Added [ComfyUI support](#-comfyui-support) with custom nodes for all task types (T2I, T2V, TI2I, TV2V), together with [FP8 quantization](#-fp8-quantization) support for the inference script and ComfyUI custom node.
 * **[2026.02.17]** 🚀 Initial release v0.1 of the Capybara inference framework supporting generation and instruction-based editing tasks (T2I, T2V, TI2I, TV2V).
 
 ## 📝 TODO List

@@ -296,6 +296,7 @@ A sample workflow is provided in [`comfyui/examples/`](https://github.com/xgen-u
 | `--rewrite_instruction` | `False` | Auto-enhance prompts using Qwen3-VL-8B-Instruct |
 | `--rewrite_model_path` | `Qwen/Qwen3-VL-8B-Instruct` | Path to the rewrite model |
 | `--max_samples` | `None` | Limit the number of samples to process from CSV |
+| `--quantize` | `None` | Quantize transformer weights (`fp8`). See [FP8 Quantization](#-fp8-quantization). |
 
 ### Recommended Settings

@@ -310,6 +311,18 @@ For optimal quality and performance, we recommend the following settings:
 - **Resolution**: You can experiment with higher resolutions (`1024` or `1080p`).
 - **Inference Steps**: 50 steps provide a good balance between quality and speed. You can use 30-40 steps for faster generation.
 
+## ⚡ FP8 Quantization
+
+Capybara supports FP8 (E4M3) weight-only quantization for the transformer via [torchao](https://github.com/pytorch/ao). This roughly halves the transformer's weight memory, allowing larger resolutions or longer videos to fit in GPU VRAM.
+
+**Requirements:**
+- NVIDIA GPU with compute capability >= 8.9 (Ada Lovelace or Hopper, e.g. RTX 4090, L40, H100)
+- `torchao` installed (`pip install torchao`)
+
+### ComfyUI
+
+In the **Capybara Load Pipeline** node, set the `quantize` dropdown to **fp8**. The node handles everything automatically: the transformer is loaded in FP8 on the GPU, while other components (VAE, text encoders, etc.) still offload to CPU as usual.
+
 ## 📄 License
 
 This project is released under the MIT License.
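For intuition about the E4M3 format the new FP8 section refers to (4 exponent bits, 3 mantissa bits, bias 7, largest finite value 448), here is a minimal pure-Python rounding sketch. This is only a toy illustration of how values snap to the E4M3 grid; the actual conversion in the framework is performed by torchao/PyTorch, not by a helper like this:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3 (the "fn"
    variant: 4 exponent bits, 3 mantissa bits, exponent bias 7, no
    infinities, largest finite magnitude 448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    v = abs(x)
    if v >= 448.0:
        return sign * 448.0          # saturate at the largest finite value
    e = math.floor(math.log2(v))     # unbiased exponent of v
    e = max(e, -6)                   # below 2^-6 values become subnormal
    step = 2.0 ** (e - 3)            # grid spacing: 2^e split into 2^3 steps
    return sign * round(v / step) * step

# Each stored weight then needs 1 byte instead of 2 (bf16/fp16),
# which is where the "roughly halves weight memory" claim comes from.
print(quantize_e4m3(0.3))    # snaps to the nearest E4M3 value, 0.3125
print(quantize_e4m3(500.0))  # saturates to 448.0
```

Note the trade-off this makes visible: with only 3 mantissa bits, neighboring representable values differ by about 6%, which is acceptable for weight storage but is why activations typically stay in bf16.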