Workflow: V2V - Just Talk - Prompt lip-synced voice and sounds to any silent video
Add voice and sounds to your silent videos with lip-sync.
It has a few settings to play around with, such as face mask vs. no face mask (how strictly to adhere to the input video), as well as how strong an influence the end of the video should have. These settings determine how much freedom the model has to change things; too strict can look a bit unnatural.
Plus an extra feature: you can also extend your silent video, since most such videos (from Wan etc.) are probably short clips.
A little bit experimental, so updates to the workflow might come... but something to play around with for now ;-)
With extended video (optional part of the workflow)
Is it possible to make it so that there are no changes except to the masked part?
Should be possible with a bit of masking. The mask in the above workflow is made a bit weak to ensure lip-sync, but with proper inpaint-like masking it should be doable ;-)
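The inpaint-like masking idea can be sketched as a simple per-frame composite: keep the source pixels wherever the mask is 0 and take the generated pixels wherever it is 1. This is a minimal NumPy illustration of the principle, not the workflow's actual node graph (ComfyUI does this blending internally; the function name here is made up).

```python
import numpy as np

def composite_frame(original: np.ndarray, generated: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Blend a generated frame onto the original, keeping everything
    outside the mask untouched.

    original, generated: float arrays of shape (H, W, 3) in [0, 1]
    mask: float array of shape (H, W, 1); 1 = region the model may
          change, 0 = keep the source pixel
    """
    return mask * generated + (1.0 - mask) * original

# toy example: a 2x2 frame where only the top-left pixel is masked
orig = np.zeros((2, 2, 3))   # black source frame
gen = np.ones((2, 2, 3))     # white generated frame
mask = np.zeros((2, 2, 1))
mask[0, 0, 0] = 1.0
out = composite_frame(orig, gen, mask)
# only out[0, 0] takes the generated value; the rest stays original
```

A soft (feathered) mask between 0 and 1 gives the smoother edges you'd want for lip-sync, at the cost of letting the model bleed slightly outside the masked area.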
Nice. For the foley / sound generation (v2v), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?
Is it possible to make it so that there are no changes except to the masked part?
A little inpainting test... seems to work. Will try to find a sweet spot for details etc.
Prompt: "blue eyes and glasses" ;-) with a mask around the eye area. Not 100% limited to the masked area, but close (the timing is a little different in the example above, but that's my fault: one video was 24 fps, the other 25 fps).
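That 24 fps vs. 25 fps mismatch is enough to explain the timing drift: the same frame index lands at a slightly different timestamp in each clip, and the offset grows over the length of the video. A quick back-of-the-envelope check:

```python
def drift_seconds(frame_index: int, fps_a: float, fps_b: float) -> float:
    """Timestamp difference of the same frame index at two frame rates."""
    return frame_index / fps_a - frame_index / fps_b

# after 120 frames (5 s at 24 fps) the two clips are 0.2 s apart,
# which is easily visible as a lip-sync offset
offset = drift_seconds(120, 24.0, 25.0)
```

Resampling one clip to match the other's frame rate before comparing (e.g. with ffmpeg's `-r` option) removes the drift.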
Nice. For the foley / sound generation (v2v), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?
That's what it already does (the foley workflow). It does generate a video (since it's a video model), but the video part is disregarded at the end; only the audio is used
(except if you also extend the video, in which case the newly added video parts are also from LTX).
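If you'd rather do the pairing outside ComfyUI, the same result can be had by muxing the generated file's audio stream onto the original silent clip with ffmpeg, copying the video stream so it is never re-encoded. A sketch that just builds the command (file names are placeholders):

```python
def build_mux_cmd(silent_video: str, generated_audio_video: str,
                  output: str) -> list[str]:
    """Build an ffmpeg command that pairs the original video stream
    with the audio stream of the generated file."""
    return [
        "ffmpeg",
        "-i", silent_video,           # input 0: original silent video
        "-i", generated_audio_video,  # input 1: LTX output (audio source)
        "-map", "0:v:0",              # take video from input 0
        "-map", "1:a:0",              # take audio from input 1
        "-c:v", "copy",               # don't re-encode the original video
        "-shortest",                  # stop at the shorter stream
        output,
    ]

cmd = build_mux_cmd("input.mp4", "ltx_output.mp4", "muxed.mp4")
```

`subprocess.run(cmd)` would then execute it, assuming ffmpeg is on your PATH.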
This workflow is almost exactly what I needed. I am testing it for my video; however, I noticed a lora loader with ANIMTEDDIFF\v3_sd15_adapter.ckpt. Is it required? Is it the model below?
https://huggingface.co/guoyww/animatediff/blob/main/v3_sd15_adapter.ckpt
Where should I put it in ComfyUI? Shall I put it in the loras folder?
I am sorry but where can I input the audio?
I can see the Load Video node, which I can replace with my video, but for audio it seems to use an empty audio latent rather than loading the voice audio. I am sure I am missing something; any help is greatly appreciated.
Thanks in advance.
I noticed a lora loader of ANIMTEDDIFF\v3_sd15_adapter.ckpt. Is it required? Is it the model below?
No, that's just a "placeholder" left in by accident. I made a secondary lora loader in this workflow where the audio part is muted (for user-made loras not trained on audio data).
Nothing should be loaded in it unless you want to load some lora. I'll take a look at that workflow and see if I can set it to "off" by default.
For now, just select it and press Ctrl+B to bypass it.
I am sorry but where can I input the audio?
In this particular workflow you just prompt for what audio you want to have.
Since I am already updating it for the lora loader (see post above), I might add an optional custom audio input as well ;-)
This is going to be cool. I am looking forward to it.
PROMPT-GENERATED VOICE:
CUSTOM AUDIO - where you can use your own audio file as input (in the demo, a voice audio mp3 generated in Pocket TTS):
UPDATED WORKFLOWS
- A new version where you can use a custom audio file as the audio input to lip-sync to
- Updated the prompt-to-speak version, removing a confusing lora loader
And the workflow seems to work even better with the v1.1 distilled model ;-) though I only did a few runs to create example videos
(I'll update the Sam3 version too - where you just prompt what to mask - but will wait for Sam3 to be natively supported in ComfyUI; it should be real soon)