Manmay committed on
Commit 5c1cbab · verified · 1 Parent(s): 7101eac

Drop auto-rescale + silence-patch internals from model card

Files changed (1)
  1. README.md +0 -18
README.md CHANGED
@@ -86,24 +86,6 @@ server.generate_to_file(prompt=..., gen_duration=30.0)
  | `rescale_scale` (`--rescale-scale`) | `"auto"` | Latent-side CFG std-rescale. The default is a cfg-aware schedule (0 below cfg=2, ramping to 1.0 by cfg=10) that keeps the output peak below 0 dBFS at every cfg. Pass any float in [0, 1] to override or 0 to disable. |
  | `watermark` (`--no-watermark` to disable) | `True` | Apply [Resemble Perth](https://github.com/resemble-ai/Perth) imperceptible neural watermark to the output. Survives MP3/AAC, common edits; ≈ 100 % detection accuracy. |
 
- ### Auto rescale (CFG safety)
-
- CFG amplifies the latent (`pred = cond + (cfg-1)·(cond - uncond)`). With no compensation, outputs hard-clip at `cfg ≥ 3`. Dramabox automatically applies a CFG-aware std-rescale schedule:
-
- | cfg | auto rescale | output peak |
- |---|---|---|
- | ≤ 2 | 0.0 (disabled) | safely below 0 dBFS |
- | 3 | 0.6 | ~−1.8 dBFS |
- | 4–8 | 0.8 | ~−1 to −3 dBFS |
- | 9 | 0.9 | ~−2.7 dBFS |
- | 10 | 1.0 | ~−4.4 dBFS |
-
- No clipping at any CFG, no manual tuning needed. Pass `rescale_scale=<float>` to override.
-
- ### End-of-clip silence patch (long-form safety)
-
- The base LTX-2.3 DiT was trained on audio ≤ ~20 s and learned a strong end-of-clip silence prior at the next patchifier-aligned latent boundary (frame 513 ≈ 20.4 s). Dramabox automatically interpolates frames 512–513 from their neighbours before VAE decode whenever the output crosses 20.5 s — eliminating the ~30 ms silence dip that would otherwise show up in long generations. No flag, no override needed.
-
  ## Prompt format
 
  ```
 