Drop auto-rescale + silence-patch internals from model card
README.md
CHANGED
@@ -86,24 +86,6 @@ server.generate_to_file(prompt=..., gen_duration=30.0)
 | `rescale_scale` (`--rescale-scale`) | `"auto"` | Latent-side CFG std-rescale. The default is a cfg-aware schedule (0 below cfg=2, ramping to 1.0 by cfg=10) that keeps the output peak below 0 dBFS at every cfg. Pass any float in [0, 1] to override or 0 to disable. |
 | `watermark` (`--no-watermark` to disable) | `True` | Apply [Resemble Perth](https://github.com/resemble-ai/Perth) imperceptible neural watermark to the output. Survives MP3/AAC, common edits; ≈ 100 % detection accuracy. |
 
-### Auto rescale (CFG safety)
-
-CFG amplifies the latent (`pred = cond + (cfg-1)·(cond - uncond)`). With no compensation, outputs hard-clip at `cfg ≥ 3`. Dramabox automatically applies a CFG-aware std-rescale schedule:
-
-| cfg | auto rescale | output peak |
-|---|---|---|
-| ≤ 2 | 0.0 (disabled) | safely below 0 dBFS |
-| 3 | 0.6 | ~−1.8 dBFS |
-| 4–8 | 0.8 | ~−1 to −3 dBFS |
-| 9 | 0.9 | ~−2.7 dBFS |
-| 10 | 1.0 | ~−4.4 dBFS |
-
-No clipping at any CFG, no manual tuning needed. Pass `rescale_scale=<float>` to override.
-
-### End-of-clip silence patch (long-form safety)
-
-The base LTX-2.3 DiT was trained on audio ≤ ~20 s and learned a strong end-of-clip silence prior at the next patchifier-aligned latent boundary (frame 513 ≈ 20.4 s). Dramabox automatically interpolates frames 512–513 from their neighbours before VAE decode whenever the output crosses 20.5 s — eliminating the ~30 ms silence dip that would otherwise show up in long generations. No flag, no override needed.
-
 ## Prompt format
 
 ```
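Reviewer note: the removed "Auto rescale" section describes a schedule plus a std-rescale step, which can be sketched roughly as below. This is not Dramabox's actual code. The helper names `auto_rescale_scale` and `cfg_with_rescale` are hypothetical, the linear interpolation between table rows is an assumption (the README only lists the rows), and the rescale step assumes the common formulation that matches the guided prediction's standard deviation to the conditional one.

```python
import numpy as np

# Knots copied from the removed README table. Values between knots are
# linearly interpolated, which is an assumption of this sketch.
_CFG_KNOTS = np.array([2.0, 3.0, 4.0, 8.0, 9.0, 10.0])
_RESCALE_KNOTS = np.array([0.0, 0.6, 0.8, 0.8, 0.9, 1.0])


def auto_rescale_scale(cfg: float) -> float:
    """Hypothetical helper: map a cfg value to a rescale factor in [0, 1]."""
    # np.interp clamps to the first/last knot value outside the knot range.
    return float(np.interp(cfg, _CFG_KNOTS, _RESCALE_KNOTS))


def cfg_with_rescale(cond, uncond, cfg, rescale_scale="auto"):
    """Apply CFG, then blend toward a std-matched copy of the result."""
    if isinstance(rescale_scale, str):
        phi = auto_rescale_scale(cfg)  # "auto" schedule
    else:
        phi = float(rescale_scale)     # explicit override in [0, 1]
    # CFG formula quoted in the removed README text.
    pred = cond + (cfg - 1.0) * (cond - uncond)
    if phi == 0.0:
        return pred
    # Assumed std-rescale: shrink pred so its std matches cond's std.
    matched = pred * (cond.std() / (pred.std() + 1e-8))
    return phi * matched + (1.0 - phi) * pred
```

With `rescale_scale=0` the function returns the plain CFG prediction, and at `cfg=10` the auto schedule reaches 1.0, so the output is fully std-matched to `cond`.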
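Reviewer note: the removed "End-of-clip silence patch" section says frames 512–513 are interpolated from their neighbours before VAE decode once the output crosses 20.5 s. A minimal sketch of that idea follows. It assumes a latent with time on axis 0, 0-based frame indices, and plain linear interpolation between frames 511 and 514. None of these details are confirmed by the README, and `patch_silence_frames` is a hypothetical name.

```python
import numpy as np


def patch_silence_frames(latent, duration_s, first=512, last=513, threshold_s=20.5):
    """Hypothetical helper: linearly re-fill latent frames first..last.

    latent is assumed to have shape (frames, ...) with time on axis 0.
    """
    lo, hi = first - 1, last + 1  # neighbouring frames used as anchors
    # Skip short clips and latents too short to have a right-hand neighbour.
    if duration_s <= threshold_s or latent.shape[0] <= hi:
        return latent
    patched = latent.copy()
    span = hi - lo
    for t in range(first, last + 1):
        w = (t - lo) / span  # 0 at the left anchor, 1 at the right anchor
        patched[t] = (1.0 - w) * latent[lo] + w * latent[hi]
    return patched
```

The input is returned untouched for clips at or below the threshold, which mirrors the "whenever the output crosses 20.5 s" condition in the removed text.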