github-actions[bot] committed
Commit 12ab2ca · 1 Parent(s): ffee483

deploy: switch to chatterbox requirements @ a95fda4

graphify-out/GRAPH_REPORT.md CHANGED
@@ -1,12 +1,12 @@
- # Graph Report - VideoVoice-be (2026-05-16)

  ## Corpus Check
- - 59 files · ~253,292 words
  - Verdict: corpus is large enough that graph structure adds value.

  ## Summary
- - 1050 nodes · 1833 edges · 62 communities detected
- - Extraction: 79% EXTRACTED · 21% INFERRED · 0% AMBIGUOUS · INFERRED: 389 edges (avg confidence: 0.62)
  - Token cost: 0 input · 0 output

  ## Community Hubs (Navigation)
@@ -32,9 +32,9 @@
  - [[_COMMUNITY_Community 19|Community 19]]
  - [[_COMMUNITY_Community 20|Community 20]]
  - [[_COMMUNITY_Community 21|Community 21]]
  - [[_COMMUNITY_Community 23|Community 23]]
- - [[_COMMUNITY_Community 31|Community 31]]
- - [[_COMMUNITY_Community 32|Community 32]]
  - [[_COMMUNITY_Community 33|Community 33]]
  - [[_COMMUNITY_Community 34|Community 34]]
  - [[_COMMUNITY_Community 35|Community 35]]
@@ -72,6 +72,8 @@
  - [[_COMMUNITY_Community 67|Community 67]]
  - [[_COMMUNITY_Community 68|Community 68]]
  - [[_COMMUNITY_Community 69|Community 69]]

  ## God Nodes (most connected - your core abstractions)
  1. `Qwen3TTSSpeakerEncoderConfig` - 49 edges
@@ -90,8 +92,8 @@
  requirements.txt → requirements-omni.txt
  - `gradio==6.8.0` --semantically_similar_to--> `gradio==6.12.0 (omni)` [INFERRED] [semantically similar]
  requirements.txt → requirements-omni.txt
- - `content_length_middleware()` --calls--> `enforce_content_length_limit()` [INFERRED]
- app.py → server.py
  - `run_pipeline()` --calls--> `separate_audio()` [INFERRED]
  pipeline.py → steps/s1b_separate.py
  - `run_pipeline()` --calls--> `transcribe()` [INFERRED]
@@ -106,47 +108,47 @@

  ### Community 0 - "Community 0"
  Cohesion: 0.04
- Nodes (69): Qwen3TTSConfig, Qwen3TTSSpeakerEncoderConfig, Qwen3TTSTalkerCodePredictorConfig, Qwen3TTSTalkerConfig, r""" This is the configuration class to store the configuration of a [`Qwen3, r""" This is the configuration class to store the configuration of a [`Qwen3, This is the configuration class to store the configuration of a [`Qwen3TTSForCon, r""" This is the configuration class to store the configuration of a [`Qwen3 (+61 more)

  ### Community 1 - "Community 1"
  Cohesion: 0.02
  Nodes (118): api_run_pipeline(), content_length_middleware(), ZeroGPU-compatible entrypoint using gradio.Server. Server extends FastAPI, so al, Exposed through Gradio's API engine. ZeroGPU will allocate a GPU when this e, run_pipeline(), BaseHTTPMiddleware, BaseModel, _artifact_reaper_loop() (+110 more)

  ### Community 2 - "Community 2"
- Cohesion: 0.05
- Nodes (57): ABC, BasePoster, Abstract base class for platform posters., Save a debug screenshot on failure., BasePoster, _build_system_prompt(), _build_user_prompt(), format_caption() (+49 more)

  ### Community 3 - "Community 3"
  Cohesion: 0.05
- Nodes (59): _collect_output(), _log_step_done(), main(), pipeline.py — Core pipeline: CLI entrypoint + importable run_pipeline() for Grad, Print duration + separator line for a completed step., Collect all yields and the return value from the generator., Run the full translation pipeline, yielding progress messages. Args:, run_pipeline() (+51 more)

  ### Community 4 - "Community 4"
  Cohesion: 0.06
- Nodes (55): forward(), generate(), generate_speaker_prompt(), main(), _prefetch_chatterbox(), _prefetch_demucs(), _prefetch_faster_whisper(), Prefetch model weights into HF_HOME for faster cold starts on Spaces. (+47 more)

  ### Community 5 - "Community 5"
  Cohesion: 0.06
  Nodes (59): post(), _assign_words_to_segments(), _extract_words(), _get_faster_whisper_model(), _get_local_whisper_backend(), _get_openai_whisper_model(), _normalise_segments(), Step 3: Transcribe audio with timestamps. Primary local backend (device-depende (+51 more)

  ### Community 6 - "Community 6"
- Cohesion: 0.06
- Nodes (31): _audio_to_tuple(), _build_choices_and_map(), build_demo(), build_parser(), _collect_gen_kwargs(), _detect_model_kind(), _dtype_from_str(), main() (+23 more)
-
- ### Community 7 - "Community 7"
  Cohesion: 0.07
- Nodes (25): DistributedGroupResidualVectorQuantization, Efficient distributed group residual vector quantization implementation. Fol, dynamic_range_compression_torch(), MelSpectrogramFeatures, x: torch.Tensor, shape = (T, D) q: torch.Tensor, shape = (T, D), x : torch.Tensor, shape = (n_mels, n_ctx) the mel spectrogram of the, Calculate the BigVGAN style mel spectrogram of an input signal. Args:, spectral_normalize_torch() (+17 more)

- ### Community 8 - "Community 8"
  Cohesion: 0.05
  Nodes (49): FFmpeg concat list (synced TTS), Try-Now app panel, app.js script ref, Comparison table (HeyGen, Rask, ElevenLabs, Synthesia), Hero section + 23+ languages, Frontend index.html, Source/target language selectors, Pricing tiers (Free/Starter/Creator) (+41 more)

  ### Community 9 - "Community 9"
  Cohesion: 0.09
  Nodes (27): $(), clearFile(), createDemoCard(), detectPlatform(), formatBytes(), formatDemoDate(), formatDemoTitle(), getUsedVideos() (+19 more)

  ### Community 10 - "Community 10"
- Cohesion: 0.1
- Nodes (14): default(), DistributedResidualVectorQuantization, ema_inplace(), EuclideanCodebook, kmeans(), laplace_smoothing(), postprocess_emb(), preprocess() (+6 more)

  ### Community 11 - "Community 11"
  Cohesion: 0.08
@@ -154,20 +156,20 @@ Nodes (32): _apply_demucs(), _get_model(), _load_and_normalise(), Step 1b: Separ

  ### Community 12 - "Community 12"
  Cohesion: 0.1
- Nodes (31): Step 4: Translate segment texts using Pollinations chat completions API (OpenAI-, Translate a batch of segments into target_language., Translate the text of each segment into target_language in batches. Args:, translate(), _translate_batch(), bedrock_converse(), bedrock_fallback(), build_client() (+23 more)

  ### Community 13 - "Community 13"
  Cohesion: 0.12
  Nodes (27): build_for_job(), ensure_transcription(), extract_audio_hq(), extract_reference_audio(), get_audio_duration(), get_device(), load_chatterbox(), main() (+19 more)

  ### Community 14 - "Community 14"
- Cohesion: 0.11
- Nodes (25): tools_api — Standalone endpoints for creator quick tools. Lives alongside the m, audio_cleanup_endpoint(), _ext_to_media_type(), APIRouter for /api/tools/* endpoints. Each endpoint is sync request-response (n, Serve a generated artifact. Run dirs auto-expire after RUN_TTL_SECONDS., Manual reap trigger (mostly for testing). Auto-reap runs on a timer., Stream upload to disk, enforcing the tools size cap., _reap() (+17 more)
-
- ### Community 15 - "Community 15"
  Cohesion: 0.12
  Nodes (23): build_t3_cond(), main(), prepare_sample(), prepare_sample.py — Turn one dataset.jsonl row into the exact tensors T3.loss(), Build the speaker conditioning (frozen during training)., MTLTokenizer + SOT/EOT padding (mirrors what generate() does internally)., S3Tokenizer on the target dubbed audio → speech tokens (the LABEL). Critica, Turn one dataset row into ready-to-train tensors. (+15 more)

  ### Community 16 - "Community 16"
  Cohesion: 0.19
  Nodes (18): _burn_in(), _clamp(), _extract_audio(), _force_style_for(), _format_timestamp_srt(), _format_timestamp_vtt(), generate_subtitles(), _is_video() (+10 more)
@@ -189,260 +191,268 @@ Cohesion: 0.27
  Nodes (9): get_fallback_mode(), _get_handler(), get_translation_prompt(), post_translate(), Language-specific handlers for the translation pipeline. Each language that nee, Return a language-specific translation prompt, or the default., Return 'bedrock' or 'google' depending on the language., Run any language-specific post-processing after translation. (+1 more)

  ### Community 21 - "Community 21"
  Cohesion: 0.33
  Nodes (6): app.py validation, pipeline.py simplified, steps/s4_preview.py, steps/s4_tts.py conditional imports, server.py /api/config, TTS_ENGINE env var

- ### Community 23 - "Community 23"
  Cohesion: 1.0
  Nodes (2): gradio==6.8.0, gradio==6.12.0 (omni)

- ### Community 31 - "Community 31"
  Cohesion: 1.0
  Nodes (1): Load a Qwen3 TTS model and its processor in HuggingFace `from_pretrained` style.

- ### Community 32 - "Community 32"
  Cohesion: 1.0
  Nodes (1): Build voice-clone prompt items from reference audio (and optionally reference te

- ### Community 33 - "Community 33"
  Cohesion: 1.0
  Nodes (1): Voice clone speech using the Base model. You can provide either:

- ### Community 34 - "Community 34"
  Cohesion: 1.0
  Nodes (1): Generate speech with the VoiceDesign model using natural-language style instruct

- ### Community 35 - "Community 35"
  Cohesion: 1.0
  Nodes (1): Generate speech with the CustomVoice model using a predefined speaker id, option

- ### Community 36 - "Community 36"
  Cohesion: 1.0
  Nodes (1): Delete stale per-job artifact directories from ARTIFACTS_ROOT.

- ### Community 37 - "Community 37"
  Cohesion: 1.0
  Nodes (1): Reject oversized uploads before body parsing.

- ### Community 38 - "Community 38"
  Cohesion: 1.0
  Nodes (1): Run the translation pipeline in a background thread, pushing progress to the job

- ### Community 39 - "Community 39"
  Cohesion: 1.0
  Nodes (1): List whitelisted MP4 demo videos from outputs/ and data/.

- ### Community 40 - "Community 40"
  Cohesion: 1.0
  Nodes (1): Return curated showcase entries with resolved streaming URLs.

- ### Community 41 - "Community 41"
  Cohesion: 1.0
  Nodes (1): Submit a video for translation.

- ### Community 42 - "Community 42"
  Cohesion: 1.0
  Nodes (1): Poll endpoint returning new messages since index `after`, plus live wait status.

- ### Community 43 - "Community 43"
  Cohesion: 1.0
  Nodes (1): User selects a TTS model after previewing.

- ### Community 44 - "Community 44"
  Cohesion: 1.0
  Nodes (1): Serve a preview audio WAV file.

- ### Community 45 - "Community 45"
  Cohesion: 1.0
  Nodes (1): Download the translated video.

- ### Community 46 - "Community 46"
  Cohesion: 1.0
  Nodes (1): Create artifact directories and start background cleanup.

- ### Community 47 - "Community 47"
  Cohesion: 1.0
  Nodes (1): Sync TTS audio using pause-aware strategy: compress silences first, then atempo.

- ### Community 48 - "Community 48"
  Cohesion: 1.0
  Nodes (1): Rewrite WAV with silence regions compressed to keep_ratio of their original dura

- ### Community 49 - "Community 49"
  Cohesion: 1.0
  Nodes (1): Insert extra silence distributed across detected pause points.

- ### Community 50 - "Community 50"
  Cohesion: 1.0
  Nodes (1): Generate a silent WAV file of given duration.

- ### Community 51 - "Community 51"
  Cohesion: 1.0
  Nodes (1): Sync each TTS segment to its original timestamp window and stitch into a single

- ### Community 52 - "Community 52"
  Cohesion: 1.0
  Nodes (1): Translate the text of each segment into target_language in batches. Args:

- ### Community 53 - "Community 53"
  Cohesion: 1.0
  Nodes (1): Load + run Chatterbox inside a single GPU-decorated scope. ZeroGPU only int

- ### Community 54 - "Community 54"
  Cohesion: 1.0
  Nodes (1): Remove trailing noise/artifacts after speech ends.

- ### Community 55 - "Community 55"
  Cohesion: 1.0
  Nodes (1): Hard-trim TTS output to orig_dur * headroom, with a short fade-out.

- ### Community 56 - "Community 56"
  Cohesion: 1.0
  Nodes (1): Clip audio to max_sec to prevent excessively slow voice cloning.

- ### Community 57 - "Community 57"
  Cohesion: 1.0
  Nodes (1): Numpy variant of _trim_trailing_noise for engines returning np.ndarray.

- ### Community 58 - "Community 58"
  Cohesion: 1.0
  Nodes (1): Perform full OmniVoice processing (load + generate batch) inside a GPU-decorated

- ### Community 59 - "Community 59"
  Cohesion: 1.0
  Nodes (1): Generate speech for all segments using OmniVoice voice cloning.

- ### Community 60 - "Community 60"
  Cohesion: 1.0
  Nodes (1): Synthesise translated text for each segment using voice cloned from reference au

- ### Community 61 - "Community 61"
  Cohesion: 1.0
  Nodes (1): torch==2.6.0

- ### Community 62 - "Community 62"
  Cohesion: 1.0
  Nodes (1): fastapi

- ### Community 63 - "Community 63"
  Cohesion: 1.0
  Nodes (1): yt-dlp

- ### Community 64 - "Community 64"
  Cohesion: 1.0
  Nodes (1): diffusers==0.29.0

- ### Community 65 - "Community 65"
  Cohesion: 1.0
  Nodes (1): ARTIFACTS_ROOT env

- ### Community 66 - "Community 66"
  Cohesion: 1.0
  Nodes (1): AWS g4dn.xlarge alternative

- ### Community 67 - "Community 67"
  Cohesion: 1.0
  Nodes (1): nodejs (system pkg)

- ### Community 68 - "Community 68"
  Cohesion: 1.0
  Nodes (1): fonts-noto-core / cjk

- ### Community 69 - "Community 69"
  Cohesion: 1.0
  Nodes (1): graphify project rules

  ## Knowledge Gaps
- - **321 isolated node(s):** `server.py — FastAPI backend for VideoVoice. Endpoints: POST /api/jobs`, `Download video from Instagram/YouTube using yt-dlp.`, `Allow only trusted social platforms for yt-dlp.`, `Read media duration from ffprobe.`, `Report CUDA/MPS availability.` (+316 more)
  These have ≤1 connection - possible missing edges or undocumented components.
- - **Thin community `Community 23`** (2 nodes): `gradio==6.8.0`, `gradio==6.12.0 (omni)`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 31`** (1 nodes): `Load a Qwen3 TTS model and its processor in HuggingFace `from_pretrained` style.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 32`** (1 nodes): `Build voice-clone prompt items from reference audio (and optionally reference te`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 33`** (1 nodes): `Voice clone speech using the Base model. You can provide either:`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 34`** (1 nodes): `Generate speech with the VoiceDesign model using natural-language style instruct`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 35`** (1 nodes): `Generate speech with the CustomVoice model using a predefined speaker id, option`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 36`** (1 nodes): `Delete stale per-job artifact directories from ARTIFACTS_ROOT.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 37`** (1 nodes): `Reject oversized uploads before body parsing.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 38`** (1 nodes): `Run the translation pipeline in a background thread, pushing progress to the job`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 39`** (1 nodes): `List whitelisted MP4 demo videos from outputs/ and data/.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 40`** (1 nodes): `Return curated showcase entries with resolved streaming URLs.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 41`** (1 nodes): `Submit a video for translation.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 42`** (1 nodes): `Poll endpoint returning new messages since index `after`, plus live wait status.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 43`** (1 nodes): `User selects a TTS model after previewing.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 44`** (1 nodes): `Serve a preview audio WAV file.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 45`** (1 nodes): `Download the translated video.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 46`** (1 nodes): `Create artifact directories and start background cleanup.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 47`** (1 nodes): `Sync TTS audio using pause-aware strategy: compress silences first, then atempo.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 48`** (1 nodes): `Rewrite WAV with silence regions compressed to keep_ratio of their original dura`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 49`** (1 nodes): `Insert extra silence distributed across detected pause points.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 50`** (1 nodes): `Generate a silent WAV file of given duration.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 51`** (1 nodes): `Sync each TTS segment to its original timestamp window and stitch into a single`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 52`** (1 nodes): `Translate the text of each segment into target_language in batches. Args:`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 53`** (1 nodes): `Load + run Chatterbox inside a single GPU-decorated scope. ZeroGPU only int`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 54`** (1 nodes): `Remove trailing noise/artifacts after speech ends.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 55`** (1 nodes): `Hard-trim TTS output to orig_dur * headroom, with a short fade-out.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 56`** (1 nodes): `Clip audio to max_sec to prevent excessively slow voice cloning.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 57`** (1 nodes): `Numpy variant of _trim_trailing_noise for engines returning np.ndarray.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 58`** (1 nodes): `Perform full OmniVoice processing (load + generate batch) inside a GPU-decorated`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 59`** (1 nodes): `Generate speech for all segments using OmniVoice voice cloning.`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 60`** (1 nodes): `Synthesise translated text for each segment using voice cloned from reference au`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 61`** (1 nodes): `torch==2.6.0`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 62`** (1 nodes): `fastapi`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 63`** (1 nodes): `yt-dlp`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 64`** (1 nodes): `diffusers==0.29.0`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 65`** (1 nodes): `ARTIFACTS_ROOT env`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 66`** (1 nodes): `AWS g4dn.xlarge alternative`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 67`** (1 nodes): `nodejs (system pkg)`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 68`** (1 nodes): `fonts-noto-core / cjk`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- - **Thin community `Community 69`** (1 nodes): `graphify project rules`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
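The two Knowledge Gaps checks above reduce to simple graph statistics: a node is flagged when it has at most one connection, and a community is flagged as thin when it has only one or two members. The sketch below reproduces both checks over an undirected edge list. It is an illustration under assumptions (edges treated as undirected, a thin-community threshold of 2 inferred from the entries listed), not graphify's implementation; the function names and the toy graph are hypothetical.

```python
from collections import Counter


def degree_map(nodes, edges):
    """Count connections per node, treating each edge as undirected."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def isolated_nodes(nodes, edges, max_degree=1):
    """Nodes with at most `max_degree` connections (the "<=1 connection" rule)."""
    degree = degree_map(nodes, edges)
    return sorted(node for node, d in degree.items() if d <= max_degree)


def thin_communities(membership, max_size=2):
    """Community ids with at most `max_size` members (threshold is an assumption)."""
    sizes = Counter(membership.values())
    return sorted(c for c, size in sizes.items() if size <= max_size)


# Hypothetical toy graph reusing node names from the report.
nodes = ["run_pipeline()", "separate_audio()", "transcribe()", "torch==2.6.0"]
edges = [("run_pipeline()", "separate_audio()"), ("run_pipeline()", "transcribe()")]
membership = {"run_pipeline()": 8, "separate_audio()": 8, "transcribe()": 8, "torch==2.6.0": 61}

print(isolated_nodes(nodes, edges))  # ['separate_audio()', 'torch==2.6.0', 'transcribe()']
print(thin_communities(membership))  # [61]
```

Leaf nodes such as `separate_audio()` are flagged alongside the fully unconnected `torch==2.6.0`, because a "≤1 connection" rule counts both; that is one reason the isolated-node count can be large even in a well-connected codebase.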

  ## Suggested Questions
  _Questions this graph is uniquely positioned to answer:_

- - **Why does `synthesise_segments()` connect `Community 4` to `Community 11`, `Community 3`?**
  _High betweenness centrality (0.324) - this node is a cross-community bridge._
- - **Why does `generate()` connect `Community 4` to `Community 0`, `Community 6`?**
- _High betweenness centrality (0.209) - this node is a cross-community bridge._
  - **Are the 44 inferred relationships involving `Qwen3TTSSpeakerEncoderConfig` (e.g. with `Res2NetBlock` and `SqueezeExcitationBlock`) actually correct?**
  _`Qwen3TTSSpeakerEncoderConfig` has 44 INFERRED edges - model-reasoned connections that need verification._
  - **Are the 44 inferred relationships involving `Qwen3TTSTalkerCodePredictorConfig` (e.g. with `Res2NetBlock` and `SqueezeExcitationBlock`) actually correct?**
@@ -452,4 +462,4 @@ _Questions this graph is uniquely positioned to answer:_
  - **Are the 44 inferred relationships involving `Qwen3TTSConfig` (e.g. with `Res2NetBlock` and `SqueezeExcitationBlock`) actually correct?**
  _`Qwen3TTSConfig` has 44 INFERRED edges - model-reasoned connections that need verification._
  - **What connects `server.py — FastAPI backend for VideoVoice. Endpoints: POST /api/jobs`, `Download video from Instagram/YouTube using yt-dlp.`, `Allow only trusted social platforms for yt-dlp.` to the rest of the system?**
- _321 weakly-connected nodes found - possible documentation gaps or missing edges._
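The Suggested Questions rank `synthesise_segments()` (0.324) and `generate()` (0.209) as bridges by betweenness centrality: how often a node lies on the shortest paths between other pairs of nodes. The report does not say how graphify computes or normalises this score. The sketch below assumes an undirected, unweighted graph and uses Brandes' algorithm with the common normalisation over the (n-1)(n-2)/2 pairs that exclude the node; the function name and the star graph are illustrative only.

```python
from collections import deque


def betweenness(adj):
    """Normalised betweenness centrality via Brandes' algorithm.

    `adj` maps every node to the set of its neighbours (undirected,
    unweighted). Scores are divided by the number of node pairs that
    exclude the node itself, so they fall in [0, 1].
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Each unordered pair is reached from both endpoints, so dividing by
    # (n-1)(n-2) equals halving and then dividing by (n-1)(n-2)/2.
    denom = (n - 1) * (n - 2)
    return {v: (c / denom if denom > 0 else 0.0) for v, c in score.items()}


# Hypothetical star graph: "hub" is the only route between the leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(betweenness(star))  # {'hub': 1.0, 'a': 0.0, 'b': 0.0, 'c': 0.0}
```

Under this normalisation the hub scores 1.0 because every leaf-to-leaf path crosses it; a score such as 0.324 would mean that, averaged over node pairs, about a third of the shortest paths between other nodes pass through the node.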
 
+ # Graph Report - VideoVoice-be (2026-05-17)

  ## Corpus Check
+ - 60 files · ~254,726 words
  - Verdict: corpus is large enough that graph structure adds value.

  ## Summary
+ - 1065 nodes · 1859 edges · 64 communities detected
+ - Extraction: 79% EXTRACTED · 21% INFERRED · 0% AMBIGUOUS · INFERRED: 397 edges (avg confidence: 0.62)
  - Token cost: 0 input · 0 output
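The Summary percentages follow directly from the edge counts. Between the two reports the graph grows by 15 nodes (1050 → 1065) and 26 edges (1833 → 1859), 8 of them inferred (389 → 397), while the rounded split stays at 79% / 21% / 0%. A quick check, assuming the three categories partition all edges and percentages are rounded to the nearest whole number (the helper name is hypothetical):

```python
def extraction_split(total_edges, inferred_edges, ambiguous_edges=0):
    """Rounded EXTRACTED / INFERRED / AMBIGUOUS percentages.

    Assumes every edge falls into exactly one of the three categories.
    """
    extracted_edges = total_edges - inferred_edges - ambiguous_edges
    counts = (extracted_edges, inferred_edges, ambiguous_edges)
    return tuple(round(100 * c / total_edges) for c in counts)


print(extraction_split(1833, 389))  # old report: (79, 21, 0)
print(extraction_split(1859, 397))  # new report: (79, 21, 0)
```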

  ## Community Hubs (Navigation)

  - [[_COMMUNITY_Community 19|Community 19]]
  - [[_COMMUNITY_Community 20|Community 20]]
  - [[_COMMUNITY_Community 21|Community 21]]
+ - [[_COMMUNITY_Community 22|Community 22]]
  - [[_COMMUNITY_Community 23|Community 23]]
+ - [[_COMMUNITY_Community 25|Community 25]]
  - [[_COMMUNITY_Community 33|Community 33]]
  - [[_COMMUNITY_Community 34|Community 34]]
  - [[_COMMUNITY_Community 35|Community 35]]

  - [[_COMMUNITY_Community 67|Community 67]]
  - [[_COMMUNITY_Community 68|Community 68]]
  - [[_COMMUNITY_Community 69|Community 69]]
+ - [[_COMMUNITY_Community 70|Community 70]]
+ - [[_COMMUNITY_Community 71|Community 71]]

  ## God Nodes (most connected - your core abstractions)
  1. `Qwen3TTSSpeakerEncoderConfig` - 49 edges

  requirements.txt → requirements-omni.txt
  - `gradio==6.8.0` --semantically_similar_to--> `gradio==6.12.0 (omni)` [INFERRED] [semantically similar]
  requirements.txt → requirements-omni.txt
+ - `enforce_content_length_limit()` --calls--> `content_length_middleware()` [INFERRED]
+ server.py → app.py
  - `run_pipeline()` --calls--> `separate_audio()` [INFERRED]
  pipeline.py → steps/s1b_separate.py
  - `run_pipeline()` --calls--> `transcribe()` [INFERRED]
 

  ### Community 0 - "Community 0"
  Cohesion: 0.04
+ Nodes (70): Qwen3TTSConfig, Qwen3TTSSpeakerEncoderConfig, Qwen3TTSTalkerCodePredictorConfig, Qwen3TTSTalkerConfig, r""" This is the configuration class to store the configuration of a [`Qwen3, r""" This is the configuration class to store the configuration of a [`Qwen3, This is the configuration class to store the configuration of a [`Qwen3TTSForCon, r""" This is the configuration class to store the configuration of a [`Qwen3 (+62 more)

  ### Community 1 - "Community 1"
  Cohesion: 0.02
  Nodes (118): api_run_pipeline(), content_length_middleware(), ZeroGPU-compatible entrypoint using gradio.Server. Server extends FastAPI, so al, Exposed through Gradio's API engine. ZeroGPU will allocate a GPU when this e, run_pipeline(), BaseHTTPMiddleware, BaseModel, _artifact_reaper_loop() (+110 more)

  ### Community 2 - "Community 2"
+ Cohesion: 0.04
+ Nodes (38): default(), DistributedGroupResidualVectorQuantization, DistributedResidualVectorQuantization, ema_inplace(), EuclideanCodebook, kmeans(), laplace_smoothing(), postprocess_emb() (+30 more)

  ### Community 3 - "Community 3"
  Cohesion: 0.05
+ Nodes (57): ABC, BasePoster, Abstract base class for platform posters., Save a debug screenshot on failure., BasePoster, _build_system_prompt(), _build_user_prompt(), format_caption() (+49 more)

  ### Community 4 - "Community 4"
  Cohesion: 0.06
+ Nodes (31): _audio_to_tuple(), _build_choices_and_map(), build_demo(), build_parser(), _collect_gen_kwargs(), _detect_model_kind(), _dtype_from_str(), main() (+23 more)

  ### Community 5 - "Community 5"
  Cohesion: 0.06
  Nodes (59): post(), _assign_words_to_segments(), _extract_words(), _get_faster_whisper_model(), _get_local_whisper_backend(), _get_openai_whisper_model(), _normalise_segments(), Step 3: Transcribe audio with timestamps. Primary local backend (device-depende (+51 more)

  ### Community 6 - "Community 6"
  Cohesion: 0.07
+ Nodes (50): forward(), generate(), generate_speaker_prompt(), from_pretrained(), _clip_audio(), _ensure_browser_wav(), _filter_preview_segments(), _free_memory() (+42 more)

+ ### Community 7 - "Community 7"
  Cohesion: 0.05
  Nodes (49): FFmpeg concat list (synced TTS), Try-Now app panel, app.js script ref, Comparison table (HeyGen, Rask, ElevenLabs, Synthesia), Hero section + 23+ languages, Frontend index.html, Source/target language selectors, Pricing tiers (Free/Starter/Creator) (+41 more)

+ ### Community 8 - "Community 8"
+ Cohesion: 0.07
+ Nodes (35): _collect_output(), _log_step_done(), main(), pipeline.py — Core pipeline: CLI entrypoint + importable run_pipeline() for Grad, Print duration + separator line for a completed step., Collect all yields and the return value from the generator., Run the full translation pipeline, yielding progress messages. Args:, run_pipeline() (+27 more)
+
  ### Community 9 - "Community 9"
  Cohesion: 0.09
  Nodes (27): $(), clearFile(), createDemoCard(), detectPlatform(), formatBytes(), formatDemoDate(), formatDemoTitle(), getUsedVideos() (+19 more)

  ### Community 10 - "Community 10"
+ Cohesion: 0.09
+ Nodes (34): Step 4: Translate segment texts using Pollinations chat completions API (OpenAI-, Translate a batch of segments into target_language., _translate_batch(), bedrock_converse(), bedrock_fallback(), build_client(), log_llm_call(), parse_json_array() (+26 more)

  ### Community 11 - "Community 11"
  Cohesion: 0.08
 

  ### Community 12 - "Community 12"
  Cohesion: 0.1
+ Nodes (28): tools_api — Standalone endpoints for creator quick tools. Lives alongside the m, audio_cleanup_endpoint(), dramabox_endpoint(), _ext_to_media_type(), APIRouter for /api/tools/* endpoints. Each endpoint is sync request-response (n, Serve a generated artifact. Run dirs auto-expire after RUN_TTL_SECONDS., Manual reap trigger (mostly for testing). Auto-reap runs on a timer., Serve a generated artifact. Run dirs auto-expire after RUN_TTL_SECONDS. (+20 more)

  ### Community 13 - "Community 13"
  Cohesion: 0.12
  Nodes (27): build_for_job(), ensure_transcription(), extract_audio_hq(), extract_reference_audio(), get_audio_duration(), get_device(), load_chatterbox(), main() (+19 more)

  ### Community 14 - "Community 14"
  Cohesion: 0.12
  Nodes (23): build_t3_cond(), main(), prepare_sample(), prepare_sample.py — Turn one dataset.jsonl row into the exact tensors T3.loss(), Build the speaker conditioning (frozen during training)., MTLTokenizer + SOT/EOT padding (mirrors what generate() does internally)., S3Tokenizer on the target dubbed audio → speech tokens (the LABEL). Critica, Turn one dataset row into ready-to-train tensors. (+15 more)

+ ### Community 15 - "Community 15"
+ Cohesion: 0.13
+ Nodes (26): _compress_silences(), _detect_pauses(), _distribute_padding(), _find_tts_silences(), _generate_silence(), _get_wav_duration(), _pad_silence(), _pause_aware_sync() (+18 more)
+
  ### Community 16 - "Community 16"
  Cohesion: 0.19
  Nodes (18): _burn_in(), _clamp(), _extract_audio(), _force_style_for(), _format_timestamp_srt(), _format_timestamp_vtt(), generate_subtitles(), _is_video() (+10 more)
 
  Nodes (9): get_fallback_mode(), _get_handler(), get_translation_prompt(), post_translate(), Language-specific handlers for the translation pipeline. Each language that nee, Return a language-specific translation prompt, or the default., Return 'bedrock' or 'google' depending on the language., Run any language-specific post-processing after translation. (+1 more)

  ### Community 21 - "Community 21"
+ Cohesion: 0.38
+ Nodes (6): _ensure_server(), _generate_impl(), generate_scene(), Dramabox — Resemble AI directable speech engine. Single-Space tool: generates a, Lazy-import the Dramabox model + load checkpoints once. Raises a clean Runti, Run Dramabox on `prompt` and write the resulting WAV under `out_dir`. Retur
+
+ ### Community 22 - "Community 22"
+ Cohesion: 0.53
+ Nodes (5): main(), _prefetch_chatterbox(), _prefetch_demucs(), _prefetch_faster_whisper(), Prefetch model weights into HF_HOME for faster cold starts on Spaces.
+
+ ### Community 23 - "Community 23"
  Cohesion: 0.33
  Nodes (6): app.py validation, pipeline.py simplified, steps/s4_preview.py, steps/s4_tts.py conditional imports, server.py /api/config, TTS_ENGINE env var

+ ### Community 25 - "Community 25"
  Cohesion: 1.0
  Nodes (2): gradio==6.8.0, gradio==6.12.0 (omni)

+ ### Community 33 - "Community 33"
  Cohesion: 1.0
  Nodes (1): Load a Qwen3 TTS model and its processor in HuggingFace `from_pretrained` style.

+ ### Community 34 - "Community 34"
  Cohesion: 1.0
  Nodes (1): Build voice-clone prompt items from reference audio (and optionally reference te

+ ### Community 35 - "Community 35"
  Cohesion: 1.0
  Nodes (1): Voice clone speech using the Base model. You can provide either:

+ ### Community 36 - "Community 36"
  Cohesion: 1.0
  Nodes (1): Generate speech with the VoiceDesign model using natural-language style instruct

+ ### Community 37 - "Community 37"
  Cohesion: 1.0
  Nodes (1): Generate speech with the CustomVoice model using a predefined speaker id, option

+ ### Community 38 - "Community 38"
  Cohesion: 1.0
  Nodes (1): Delete stale per-job artifact directories from ARTIFACTS_ROOT.

+ ### Community 39 - "Community 39"
  Cohesion: 1.0
  Nodes (1): Reject oversized uploads before body parsing.

+ ### Community 40 - "Community 40"
  Cohesion: 1.0
  Nodes (1): Run the translation pipeline in a background thread, pushing progress to the job

+ ### Community 41 - "Community 41"
  Cohesion: 1.0
  Nodes (1): List whitelisted MP4 demo videos from outputs/ and data/.

+ ### Community 42 - "Community 42"
  Cohesion: 1.0
  Nodes (1): Return curated showcase entries with resolved streaming URLs.

+ ### Community 43 - "Community 43"
  Cohesion: 1.0
  Nodes (1): Submit a video for translation.

+ ### Community 44 - "Community 44"
254
  Cohesion: 1.0
255
  Nodes (1): Poll endpoint returning new messages since index `after`, plus live wait status.
256
 
257
+ ### Community 45 - "Community 45"
258
  Cohesion: 1.0
259
  Nodes (1): User selects a TTS model after previewing.
260
 
261
+ ### Community 46 - "Community 46"
262
  Cohesion: 1.0
263
  Nodes (1): Serve a preview audio WAV file.
264
 
265
+ ### Community 47 - "Community 47"
266
  Cohesion: 1.0
267
  Nodes (1): Download the translated video.
268
 
269
+ ### Community 48 - "Community 48"
270
  Cohesion: 1.0
271
  Nodes (1): Create artifact directories and start background cleanup.
272
 
273
+ ### Community 49 - "Community 49"
274
  Cohesion: 1.0
275
  Nodes (1): Sync TTS audio using pause-aware strategy: compress silences first, then atempo.
276
 
277
+ ### Community 50 - "Community 50"
278
  Cohesion: 1.0
279
  Nodes (1): Rewrite WAV with silence regions compressed to keep_ratio of their original dura
280
 
281
+ ### Community 51 - "Community 51"
282
  Cohesion: 1.0
283
  Nodes (1): Insert extra silence distributed across detected pause points.
284
 
285
+ ### Community 52 - "Community 52"
286
  Cohesion: 1.0
287
  Nodes (1): Generate a silent WAV file of given duration.
288
 
289
+ ### Community 53 - "Community 53"
290
  Cohesion: 1.0
291
  Nodes (1): Sync each TTS segment to its original timestamp window and stitch into a single
292
 
293
+ ### Community 54 - "Community 54"
294
  Cohesion: 1.0
295
  Nodes (1): Translate the text of each segment into target_language in batches. Args:
296
 
297
+ ### Community 55 - "Community 55"
298
  Cohesion: 1.0
299
  Nodes (1): Load + run Chatterbox inside a single GPU-decorated scope. ZeroGPU only int
300
 
301
+ ### Community 56 - "Community 56"
302
  Cohesion: 1.0
303
  Nodes (1): Remove trailing noise/artifacts after speech ends.
304
 
305
+ ### Community 57 - "Community 57"
306
  Cohesion: 1.0
307
  Nodes (1): Hard-trim TTS output to orig_dur * headroom, with a short fade-out.
308
 
309
+ ### Community 58 - "Community 58"
310
  Cohesion: 1.0
311
  Nodes (1): Clip audio to max_sec to prevent excessively slow voice cloning.
312
 
313
+ ### Community 59 - "Community 59"
314
  Cohesion: 1.0
315
  Nodes (1): Numpy variant of _trim_trailing_noise for engines returning np.ndarray.
316
 
317
+ ### Community 60 - "Community 60"
318
  Cohesion: 1.0
319
  Nodes (1): Perform full OmniVoice processing (load + generate batch) inside a GPU-decorated
320
 
321
+ ### Community 61 - "Community 61"
322
  Cohesion: 1.0
323
  Nodes (1): Generate speech for all segments using OmniVoice voice cloning.
324
 
325
+ ### Community 62 - "Community 62"
326
  Cohesion: 1.0
327
  Nodes (1): Synthesise translated text for each segment using voice cloned from reference au
328
 
329
+ ### Community 63 - "Community 63"
330
  Cohesion: 1.0
331
  Nodes (1): torch==2.6.0
332
 
333
+ ### Community 64 - "Community 64"
334
  Cohesion: 1.0
335
  Nodes (1): fastapi
336
 
337
+ ### Community 65 - "Community 65"
338
  Cohesion: 1.0
339
  Nodes (1): yt-dlp
340
 
341
+ ### Community 66 - "Community 66"
342
  Cohesion: 1.0
343
  Nodes (1): diffusers==0.29.0
344
 
345
+ ### Community 67 - "Community 67"
346
  Cohesion: 1.0
347
  Nodes (1): ARTIFACTS_ROOT env
348
 
349
+ ### Community 68 - "Community 68"
350
  Cohesion: 1.0
351
  Nodes (1): AWS g4dn.xlarge alternative
352
 
353
+ ### Community 69 - "Community 69"
354
  Cohesion: 1.0
355
  Nodes (1): nodejs (system pkg)
356
 
357
+ ### Community 70 - "Community 70"
358
  Cohesion: 1.0
359
  Nodes (1): fonts-noto-core / cjk
360
 
361
+ ### Community 71 - "Community 71"
362
  Cohesion: 1.0
363
  Nodes (1): graphify project rules
364
 
365
  ## Knowledge Gaps
366
+ - **329 isolated node(s):** `server.py — FastAPI backend for VideoVoice. Endpoints: POST /api/jobs`, `Download video from Instagram/YouTube using yt-dlp.`, `Allow only trusted social platforms for yt-dlp.`, `Read media duration from ffprobe.`, `Report CUDA/MPS availability.` (+324 more)
367
  These have ≤1 connection - possible missing edges or undocumented components.
368
+ - **Thin community `Community 25`** (2 nodes): `gradio==6.8.0`, `gradio==6.12.0 (omni)`
369
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
370
+ - **Thin community `Community 33`** (1 nodes): `Load a Qwen3 TTS model and its processor in HuggingFace `from_pretrained` style.`
371
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
372
+ - **Thin community `Community 34`** (1 nodes): `Build voice-clone prompt items from reference audio (and optionally reference te`
373
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
374
+ - **Thin community `Community 35`** (1 nodes): `Voice clone speech using the Base model. You can provide either:`
375
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
376
+ - **Thin community `Community 36`** (1 nodes): `Generate speech with the VoiceDesign model using natural-language style instruct`
377
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
378
+ - **Thin community `Community 37`** (1 nodes): `Generate speech with the CustomVoice model using a predefined speaker id, option`
379
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
380
+ - **Thin community `Community 38`** (1 nodes): `Delete stale per-job artifact directories from ARTIFACTS_ROOT.`
381
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
382
+ - **Thin community `Community 39`** (1 nodes): `Reject oversized uploads before body parsing.`
383
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
384
+ - **Thin community `Community 40`** (1 nodes): `Run the translation pipeline in a background thread, pushing progress to the job`
385
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
386
+ - **Thin community `Community 41`** (1 nodes): `List whitelisted MP4 demo videos from outputs/ and data/.`
387
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
388
+ - **Thin community `Community 42`** (1 nodes): `Return curated showcase entries with resolved streaming URLs.`
389
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
390
+ - **Thin community `Community 43`** (1 nodes): `Submit a video for translation.`
391
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
392
+ - **Thin community `Community 44`** (1 nodes): `Poll endpoint returning new messages since index `after`, plus live wait status.`
393
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
394
+ - **Thin community `Community 45`** (1 nodes): `User selects a TTS model after previewing.`
395
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
396
+ - **Thin community `Community 46`** (1 nodes): `Serve a preview audio WAV file.`
397
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
398
+ - **Thin community `Community 47`** (1 nodes): `Download the translated video.`
399
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
400
+ - **Thin community `Community 48`** (1 nodes): `Create artifact directories and start background cleanup.`
401
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
402
+ - **Thin community `Community 49`** (1 nodes): `Sync TTS audio using pause-aware strategy: compress silences first, then atempo.`
403
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
404
+ - **Thin community `Community 50`** (1 nodes): `Rewrite WAV with silence regions compressed to keep_ratio of their original dura`
405
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
406
+ - **Thin community `Community 51`** (1 nodes): `Insert extra silence distributed across detected pause points.`
407
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
408
+ - **Thin community `Community 52`** (1 nodes): `Generate a silent WAV file of given duration.`
409
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
410
+ - **Thin community `Community 53`** (1 nodes): `Sync each TTS segment to its original timestamp window and stitch into a single`
411
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
412
+ - **Thin community `Community 54`** (1 nodes): `Translate the text of each segment into target_language in batches. Args:`
413
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
414
+ - **Thin community `Community 55`** (1 nodes): `Load + run Chatterbox inside a single GPU-decorated scope. ZeroGPU only int`
415
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
416
+ - **Thin community `Community 56`** (1 nodes): `Remove trailing noise/artifacts after speech ends.`
417
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
418
+ - **Thin community `Community 57`** (1 nodes): `Hard-trim TTS output to orig_dur * headroom, with a short fade-out.`
419
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
420
+ - **Thin community `Community 58`** (1 nodes): `Clip audio to max_sec to prevent excessively slow voice cloning.`
421
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
422
+ - **Thin community `Community 59`** (1 nodes): `Numpy variant of _trim_trailing_noise for engines returning np.ndarray.`
423
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
424
+ - **Thin community `Community 60`** (1 nodes): `Perform full OmniVoice processing (load + generate batch) inside a GPU-decorated`
425
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
426
+ - **Thin community `Community 61`** (1 nodes): `Generate speech for all segments using OmniVoice voice cloning.`
427
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
428
+ - **Thin community `Community 62`** (1 nodes): `Synthesise translated text for each segment using voice cloned from reference au`
429
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
430
+ - **Thin community `Community 63`** (1 nodes): `torch==2.6.0`
431
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
432
+ - **Thin community `Community 64`** (1 nodes): `fastapi`
433
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
434
+ - **Thin community `Community 65`** (1 nodes): `yt-dlp`
435
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
436
+ - **Thin community `Community 66`** (1 nodes): `diffusers==0.29.0`
437
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
438
+ - **Thin community `Community 67`** (1 nodes): `ARTIFACTS_ROOT env`
439
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
440
+ - **Thin community `Community 68`** (1 nodes): `AWS g4dn.xlarge alternative`
441
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
442
+ - **Thin community `Community 69`** (1 nodes): `nodejs (system pkg)`
443
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
444
+ - **Thin community `Community 70`** (1 nodes): `fonts-noto-core / cjk`
445
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
446
+ - **Thin community `Community 71`** (1 nodes): `graphify project rules`
447
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
448
 
449
  ## Suggested Questions
450
  _Questions this graph is uniquely positioned to answer:_
451
 
452
+ - **Why does `synthesise_segments()` connect `Community 6` to `Community 8`, `Community 11`?**
453
  _High betweenness centrality (0.324) - this node is a cross-community bridge._
454
+ - **Why does `generate()` connect `Community 6` to `Community 0`, `Community 4`?**
455
+ _High betweenness centrality (0.200) - this node is a cross-community bridge._
456
  - **Are the 44 inferred relationships involving `Qwen3TTSSpeakerEncoderConfig` (e.g. with `Res2NetBlock` and `SqueezeExcitationBlock`) actually correct?**
457
  _`Qwen3TTSSpeakerEncoderConfig` has 44 INFERRED edges - model-reasoned connections that need verification._
458
  - **Are the 44 inferred relationships involving `Qwen3TTSTalkerCodePredictorConfig` (e.g. with `Res2NetBlock` and `SqueezeExcitationBlock`) actually correct?**
 
462
  - **Are the 44 inferred relationships involving `Qwen3TTSConfig` (e.g. with `Res2NetBlock` and `SqueezeExcitationBlock`) actually correct?**
463
  _`Qwen3TTSConfig` has 44 INFERRED edges - model-reasoned connections that need verification._
464
  - **What connects `server.py — FastAPI backend for VideoVoice. Endpoints: POST /api/jobs`, `Download video from Instagram/YouTube using yt-dlp.`, `Allow only trusted social platforms for yt-dlp.` to the rest of the system?**
465
+ _329 weakly-connected nodes found - possible documentation gaps or missing edges._
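The Knowledge Gaps section above flags two things: nodes with ≤1 connection, and communities too small to be meaningful. The sketch below shows one way to compute both flags from an edge list. It assumes the graph is available as plain Python lists and dicts. `knowledge_gaps` is a hypothetical helper, not graphify's API. The `thin_max=2` cutoff is also an assumption, inferred from this report: 1- and 2-node communities are flagged, while 5- and 6-node communities are not.

```python
from __future__ import annotations

from collections import Counter


def knowledge_gaps(
    nodes: list[str],
    edges: list[tuple[str, str]],
    communities: dict[int, list[str]],
    thin_max: int = 2,  # assumed cutoff for a "thin" community
) -> tuple[list[str], list[int]]:
    """Return (nodes with <= 1 connection, ids of communities with <= thin_max nodes)."""
    # Undirected degree: each edge adds one connection to both endpoints.
    degree: Counter[str] = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Nodes never seen in any edge have degree 0 and are included.
    isolated = [n for n in nodes if degree[n] <= 1]
    thin = sorted(cid for cid, members in communities.items() if len(members) <= thin_max)
    return isolated, thin
```

With edges `a-b` and `a-c`, node `a` has two connections. Nodes `b`, `c` and the unconnected `d` would all be reported as weakly connected.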
graphify-out/graph.html CHANGED
The diff for this file is too large to render.
 
server.py CHANGED
@@ -42,8 +42,8 @@ load_dotenv()
42
 
43
  # TTS_ENGINE controls which TTS backend this Space serves
44
  TTS_ENGINE = os.getenv("TTS_ENGINE", "chatterbox").lower()
45
- if TTS_ENGINE not in ("chatterbox", "omnivoice", "qwen3"):
46
-     raise ValueError(f"Invalid TTS_ENGINE: {TTS_ENGINE}. Use 'chatterbox', 'omnivoice', or 'qwen3'.")
47
 
48
  # ── Config ────────────────────────────────────────────────
49
  PORT = int(os.getenv("PORT", "7860"))
 
42
 
43
  # TTS_ENGINE controls which TTS backend this Space serves
44
  TTS_ENGINE = os.getenv("TTS_ENGINE", "chatterbox").lower()
45
+ if TTS_ENGINE not in ("chatterbox", "omnivoice", "qwen3", "dramabox"):
46
+     raise ValueError(f"Invalid TTS_ENGINE: {TTS_ENGINE}. Use 'chatterbox', 'omnivoice', 'qwen3', or 'dramabox'.")
47
 
48
  # ── Config ────────────────────────────────────────────────
49
  PORT = int(os.getenv("PORT", "7860"))
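To add `dramabox`, this change had to edit both the allowed-engine tuple and the hand-written error message. The sketch below derives the message from a single tuple, so the two cannot drift apart. It assumes the same default (`"chatterbox"`) and lowercasing as server.py. `VALID_TTS_ENGINES` and `resolve_tts_engine` are hypothetical names. The optional mapping argument, used in place of `os.environ`, exists only to make the function testable.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Single source of truth for the engines this Space can serve (hypothetical name).
VALID_TTS_ENGINES = ("chatterbox", "omnivoice", "qwen3", "dramabox")


def resolve_tts_engine(env: Mapping[str, str] | None = None) -> str:
    """Read TTS_ENGINE (default "chatterbox"), lowercase it, and validate it."""
    source = os.environ if env is None else env
    engine = source.get("TTS_ENGINE", "chatterbox").lower()
    if engine not in VALID_TTS_ENGINES:
        # The message is built from the same tuple the check uses.
        allowed = ", ".join(repr(name) for name in VALID_TTS_ENGINES)
        raise ValueError(f"Invalid TTS_ENGINE: {engine}. Use one of {allowed}.")
    return engine
```

Calling `TTS_ENGINE = resolve_tts_engine()` at import time keeps the fail-fast behavior of the original module-level check.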
tools_api/__init__.py CHANGED
@@ -10,6 +10,7 @@ Endpoints (mounted by router.router):
10
  POST /api/tools/subtitles — captions (sidecar or burn-in MP4)
11
  POST /api/tools/voice-clone — single-segment TTS with voice clone
12
  POST /api/tools/audio-cleanup — Demucs source separation
 
13
  GET /api/tools/file/{run}/{f} — download generated artifact
14
  """
15
  from .router import router
 
10
  POST /api/tools/subtitles — captions (sidecar or burn-in MP4)
11
  POST /api/tools/voice-clone — single-segment TTS with voice clone
12
  POST /api/tools/audio-cleanup — Demucs source separation
13
+ POST /api/tools/dramabox — Resemble Dramabox directable speech (dramabox Space only)
14
  GET /api/tools/file/{run}/{f} — download generated artifact
15
  """
16
  from .router import router
tools_api/dramabox.py ADDED
@@ -0,0 +1,181 @@
1
+ """
2
+ Dramabox — Resemble AI directable speech engine.
3
+
4
+ Single-Space tool: generates a 48 kHz WAV "performance" from a scene prompt
5
+ (quoted dialogue + stage directions) and an optional voice reference. Mirrors
6
+ the official ResembleAI/Dramabox Space's on_generate(): same parameter order,
7
+ same defaults, same model invocation.
8
+
9
+ This module only runs on the videovoice-dramabox Space, which must vendor the
10
+ Dramabox `src/` directory (inference_server.py + model_downloader.py) and the
11
+ requirements-dramabox.txt deps. On any other Space the lazy import below
12
+ raises a clean RuntimeError rather than crashing app startup.
13
+
14
+ The module loads the TTSServer once on first request (warm-load pattern from
15
+ the upstream Space) and reuses it across calls.
16
+ """
17
+ from __future__ import annotations
18
+
19
+ import logging
20
+ import os
21
+ import threading
22
+ import time
23
+ from pathlib import Path
24
+
25
+ # Backend env knobs β€” kept compatible with the upstream Space.
26
+ _LTX_DTYPE = os.environ.get("LTX_DTYPE", "bf16")
27
+
28
+ # Module-level warm load, guarded by a lock so a flurry of concurrent first
29
+ # requests only triggers one load. Subsequent calls are ~2.5s on warm GPU.
30
+ _tts_lock = threading.Lock()
31
+ _tts_server = None # populated lazily on first generate() call
32
+
33
+ logger = logging.getLogger("tools_api.dramabox")
34
+
35
+
36
+ def _ensure_server():
37
+     """Lazy-import the Dramabox model + load checkpoints once. Raises a clean
38
+     RuntimeError on Spaces that don't ship the Dramabox `src/` vendoring.
39
+     """
40
+     global _tts_server
41
+     if _tts_server is not None:
42
+         return _tts_server
43
+
44
+     with _tts_lock:
45
+         if _tts_server is not None:
46
+             return _tts_server
47
+
48
+         try:
49
+             # Vendored from ResembleAI/Dramabox; the Space's `src/` must be on
50
+             # sys.path. We add it here so this module doesn't require app.py
51
+             # to do the insert itself.
52
+             import sys
53
+             vendored_src = Path(__file__).parent.parent / "dramabox_src"
54
+             if vendored_src.exists() and str(vendored_src) not in sys.path:
55
+                 sys.path.insert(0, str(vendored_src))
56
+             from inference_server import TTSServer # type: ignore[import-not-found]
57
+             from model_downloader import get_all_paths # type: ignore[import-not-found]
58
+         except ImportError as e:
59
+             raise RuntimeError(
60
+                 "Dramabox is not installed on this Space. Vendor "
61
+                 "ResembleAI/Dramabox's src/ directory at "
62
+                 "VideoVoice-be/dramabox_src/ and install requirements-dramabox.txt."
63
+             ) from e
64
+
65
+         logger.info("Fetching Dramabox checkpoints (cached after first run)...")
66
+         paths = get_all_paths()
67
+
68
+         logger.info("Loading Dramabox warm server (Gemma + DiT + VAE + Decoder)...")
69
+         _tts_server = TTSServer(
70
+             checkpoint=paths["transformer"],
71
+             full_checkpoint=paths["audio_components"],
72
+             gemma_root=paths["gemma_root"],
73
+             device="cuda",
74
+             dtype=_LTX_DTYPE,
75
+             compile_model=False, # torch.compile breaks under ZeroGPU's brief GPU windows
76
+             bnb_4bit=True, # unsloth Gemma is pre-quantized
77
+         )
78
+         logger.info("Dramabox TTSServer ready.")
79
+         return _tts_server
80
+
81
+
82
+ def generate_scene(
83
+     *,
84
+     prompt: str,
85
+     out_dir: Path,
86
+     audio_ref: Path | None = None,
87
+     cfg: float = 2.5,
88
+     stg: float = 1.5,
89
+     dur_mult: float = 1.1,
90
+     gen_dur: float = 0.0,
91
+     ref_dur: float = 10.0,
92
+     seed: int = 42,
93
+ ) -> dict:
94
+     """
95
+     Run Dramabox on `prompt` and write the resulting WAV under `out_dir`.
96
+
97
+     Returns:
98
+         {
99
+             "filename": "dramabox_<run_id_short>.wav",
100
+             "elapsed": <seconds>,
101
+             "settings": {...echo of inputs used...},
102
+         }
103
+     """
104
+     prompt = (prompt or "").strip()
105
+     if not prompt:
106
+         raise ValueError("Prompt is empty.")
107
+
108
+     # Try to GPU-decorate at call time if `spaces` is available. On the
109
+     # ZeroGPU Space this maps weights onto the GPU for the duration of the
110
+     # call; on local dev (no `spaces`) it's a no-op pass-through.
111
+     try:
112
+         import spaces # type: ignore[import-not-found]
113
+
114
+         @spaces.GPU(duration=60)
115
+         def _run():
116
+             return _generate_impl(
117
+                 prompt=prompt,
118
+                 out_dir=out_dir,
119
+                 audio_ref=audio_ref,
120
+                 cfg=cfg, stg=stg, dur_mult=dur_mult,
121
+                 gen_dur=gen_dur, ref_dur=ref_dur, seed=seed,
122
+             )
123
+         return _run()
124
+     except ImportError:
125
+         return _generate_impl(
126
+             prompt=prompt,
127
+             out_dir=out_dir,
128
+             audio_ref=audio_ref,
129
+             cfg=cfg, stg=stg, dur_mult=dur_mult,
130
+             gen_dur=gen_dur, ref_dur=ref_dur, seed=seed,
131
+         )
132
+
133
+
134
+ def _generate_impl(
135
+     *,
136
+     prompt: str,
137
+     out_dir: Path,
138
+     audio_ref: Path | None,
139
+     cfg: float,
140
+     stg: float,
141
+     dur_mult: float,
142
+     gen_dur: float,
143
+     ref_dur: float,
144
+     seed: int,
145
+ ) -> dict:
146
+     tts = _ensure_server()
147
+     out_dir.mkdir(parents=True, exist_ok=True)
148
+     output = out_dir / f"dramabox_{int(time.time() * 1000)}.wav"
149
+
150
+     ref_path: str | None = None
151
+     if audio_ref is not None and Path(audio_ref).exists():
152
+         ref_path = str(audio_ref)
153
+
154
+     t0 = time.time()
155
+     tts.generate_to_file(
156
+         prompt=prompt,
157
+         output=str(output),
158
+         voice_ref=ref_path,
159
+         cfg_scale=float(cfg),
160
+         stg_scale=float(stg),
161
+         duration_multiplier=float(dur_mult),
162
+         seed=int(seed),
163
+         gen_duration=float(gen_dur),
164
+         ref_duration=float(ref_dur),
165
+     )
166
+     elapsed = time.time() - t0
167
+     logger.info(f"Dramabox generated in {elapsed:.2f}s -> {output}")
168
+
169
+     return {
170
+         "filename": output.name,
171
+         "elapsed": elapsed,
172
+         "settings": {
173
+             "cfg": cfg,
174
+             "stg": stg,
175
+             "dur_mult": dur_mult,
176
+             "gen_dur": gen_dur,
177
+             "ref_dur": ref_dur,
178
+             "seed": seed,
179
+             "had_voice_ref": ref_path is not None,
180
+         },
181
+     }
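dramabox.py combines two runtime patterns:
- `_ensure_server()` uses a double-checked lock, so concurrent first requests build the server only once.
- `generate_scene()` applies `spaces.GPU(duration=60)` only when `spaces` can be imported.

The sketch below isolates both patterns. `LazySingleton` and `maybe_gpu` are hypothetical names, and the factory stands in for the `TTSServer(...)` construction. One design choice differs from the module. Only the `import spaces` statement sits inside the `try`, so an `ImportError` raised while generation itself runs propagates. It does not trigger a second, undecorated call.

```python
from __future__ import annotations

import threading
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class LazySingleton:
    """Build an expensive object once, even if many threads ask for it at once."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # e.g. a function that constructs the TTS server
        self._lock = threading.Lock()
        self._value: Any = None

    def get(self) -> Any:
        # Fast path: once loaded, return without touching the lock.
        if self._value is not None:
            return self._value
        with self._lock:
            # Re-check under the lock: another thread may have finished the
            # load while this one was waiting. If the factory raises, the
            # value stays None and the next call retries.
            if self._value is None:
                self._value = self._factory()
            return self._value


def maybe_gpu(duration: int) -> Callable[[F], F]:
    """Return spaces.GPU(duration=...) when `spaces` imports, else a no-op decorator.

    Only the import is guarded, so an ImportError raised later, while the
    decorated function runs, is not mistaken for "spaces is not installed".
    """
    try:
        import spaces  # type: ignore[import-not-found]
    except ImportError:
        return lambda fn: fn
    return spaces.GPU(duration=duration)
```

In the module's terms, `_ensure_server()` plays the role of `LazySingleton(...).get()`. `generate_scene()` would wrap `_generate_impl` with `maybe_gpu(60)` before calling it.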
tools_api/router.py CHANGED
@@ -16,7 +16,7 @@ from fastapi.responses import FileResponse, JSONResponse, PlainTextResponse
16
 
17
  from server import limiter, _download_url, _is_allowed_video_host
18
 
19
- from . import audio_cleanup, subtitles, voice_clone
20
  from .storage import (
21
  file_url,
22
  new_run_dir,
@@ -178,6 +178,76 @@ async def voice_clone_endpoint(
178
  })
179
 
180
 
181
  # ── Audio cleanup ────────────────────────────────────────────────────
182
 
183
  @router.post("/audio-cleanup")
 
16
 
17
  from server import limiter, _download_url, _is_allowed_video_host
18
 
19
+ from . import audio_cleanup, dramabox, subtitles, voice_clone
20
  from .storage import (
21
  file_url,
22
  new_run_dir,
 
178
  })
179
 
180
 
181
+ # ── Dramabox ─────────────────────────────────────────────────────────
182
+
183
+ @router.post("/dramabox")
184
+ @limiter.limit("10/hour")
185
+ async def dramabox_endpoint(
186
+     request: Request,
187
+     prompt: str = Form(...),
188
+     audio_ref: Optional[UploadFile] = File(None),
189
+     cfg: float = Form(2.5),
190
+     stg: float = Form(1.5),
191
+     dur_mult: float = Form(1.1),
192
+     gen_dur: float = Form(0.0),
193
+     ref_dur: float = Form(10.0),
194
+     seed: int = Form(42),
195
+ ):
196
+     prompt = (prompt or "").strip()
197
+     if not prompt:
198
+         raise HTTPException(400, "prompt is required")
199
+     if len(prompt) > 2000:
200
+         raise HTTPException(400, "prompt exceeds 2000 char limit")
201
+
202
+     # Range guards mirror the upstream Dramabox sliders.
203
+     if not (1.0 <= cfg <= 10.0):
204
+         raise HTTPException(400, "cfg must be between 1 and 10")
205
+     if not (0.0 <= stg <= 5.0):
206
+         raise HTTPException(400, "stg must be between 0 and 5")
207
+     if not (0.8 <= dur_mult <= 2.0):
208
+         raise HTTPException(400, "dur_mult must be between 0.8 and 2.0")
209
+     if not (0.0 <= gen_dur <= 60.0):
210
+         raise HTTPException(400, "gen_dur must be between 0 and 60")
211
+     if not (3.0 <= ref_dur <= 30.0):
212
+         raise HTTPException(400, "ref_dur must be between 3 and 30")
213
+
214
+     run_id, dest_dir = new_run_dir()
215
+
216
+     ref_path: Optional[Path] = None
217
+     if audio_ref is not None and audio_ref.filename:
218
+         ref_path = await _save_upload(audio_ref, dest_dir, "voice_ref.wav")
219
+
220
+     try:
221
+         info = await asyncio.to_thread(
222
+             dramabox.generate_scene,
223
+             prompt=prompt,
224
+             out_dir=dest_dir,
225
+             audio_ref=ref_path,
226
+             cfg=cfg,
227
+             stg=stg,
228
+             dur_mult=dur_mult,
229
+             gen_dur=gen_dur,
230
+             ref_dur=ref_dur,
231
+             seed=seed,
232
+         )
233
+     except ValueError as e:
234
+         raise HTTPException(400, str(e))
235
+     except RuntimeError as e:
236
+         # Raised by dramabox._ensure_server() on Spaces that don't ship the
237
+         # vendored model. Surface clearly so the frontend can fall back.
238
+         raise HTTPException(503, str(e))
239
+     except Exception as e: # noqa: BLE001
240
+         raise HTTPException(500, f"Dramabox generation failed: {e}")
241
+
242
+     return JSONResponse({
243
+         "run_id": run_id,
244
+         "filename": info["filename"],
245
+         "url": file_url(run_id, info["filename"]),
246
+         "elapsed": info["elapsed"],
247
+         "settings": info["settings"],
248
+     })
249
+
250
+
251
  # ── Audio cleanup ────────────────────────────────────────────────────
252
 
253
  @router.post("/audio-cleanup")
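`dramabox_endpoint` hard-codes five inclusive range checks and raises HTTP 400 on the first failure. The sketch below expresses the same bounds as a table and collects every violation. A client could use it to flag all out-of-range sliders before posting. It assumes the bounds and the 2000-character prompt limit stay in sync with the endpoint. `DRAMABOX_BOUNDS` and `validate_dramabox_form` are hypothetical names, not part of the router.

```python
from __future__ import annotations

# Inclusive (low, high) bounds copied from dramabox_endpoint's range guards.
DRAMABOX_BOUNDS: dict[str, tuple[float, float]] = {
    "cfg": (1.0, 10.0),
    "stg": (0.0, 5.0),
    "dur_mult": (0.8, 2.0),
    "gen_dur": (0.0, 60.0),
    "ref_dur": (3.0, 30.0),
}
MAX_PROMPT_CHARS = 2000


def validate_dramabox_form(prompt: str, **params: float) -> list[str]:
    """Return human-readable errors; an empty list means the form looks valid."""
    errors: list[str] = []
    prompt = (prompt or "").strip()
    if not prompt:
        errors.append("prompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} char limit")
    # Only check parameters the caller supplied; omitted ones use server defaults.
    for name, (low, high) in DRAMABOX_BOUNDS.items():
        if name in params and not (low <= params[name] <= high):
            errors.append(f"{name} must be between {low:g} and {high:g}")
    return errors
```

Reporting every error at once suits a form UI. The endpoint's fail-fast checks remain the authoritative guard on the server.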