Workflow: V2V - Just Talk - Prompt lip-synced voice and sounds to any silent video
Add voice and sounds to your silent videos with lip-sync.
It has a few settings to play around with, such as face mask vs. no face mask (how strictly to adhere to the input video), as well as how strong an influence the end of the video should have. These settings determine how much freedom the model has to change things; too strict can look a bit unnatural.
Plus an extra feature: you can also extend your silent video, since most such videos (from Wan etc.) are probably short clips.
A little bit experimental, so updates to the workflow might come... but something to play around with for now ;-)
With extended video (optional part of the workflow)
Is it possible to make it so that there are no changes except to the masked part?
Should be possible with a bit of masking. The mask in the above workflow is made a bit weak to ensure lip-sync, but with proper inpaint-like masking it should be doable ;-)
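The inpaint-like masking idea can be sketched as a simple per-frame composite: keep the source pixels wherever the mask is 0 and take the generated pixels wherever it is 1. This is a minimal NumPy illustration of the principle, not the workflow's actual node graph (ComfyUI does this blending internally; the function name here is made up).

```python
import numpy as np

def composite_frame(original: np.ndarray, generated: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Blend a generated frame onto the original, keeping everything
    outside the mask untouched.

    original, generated: float arrays of shape (H, W, 3) in [0, 1]
    mask: float array of shape (H, W, 1); 1 = region the model may
          change, 0 = keep the source pixel
    """
    return mask * generated + (1.0 - mask) * original

# toy example: a 2x2 frame where only the top-left pixel is masked
orig = np.zeros((2, 2, 3))   # black source frame
gen = np.ones((2, 2, 3))     # white generated frame
mask = np.zeros((2, 2, 1))
mask[0, 0, 0] = 1.0
out = composite_frame(orig, gen, mask)
# only out[0, 0] takes the generated value; the rest stays original
```

A soft (feathered) mask between 0 and 1 gives the smoother edges you'd want for lip-sync, at the cost of letting the model bleed slightly outside the masked area.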
Nice. For the foley / sound generation (v2v), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?
Is it possible to make it so that there are no changes except to the masked part?
A little inpainting test... seems to work. Will try to find a sweet spot for details etc.
Prompt: "blue eyes and glasses" ;-) with a mask around the eye area. Not 100% limited to the masked area, but close (the timing is a little different in the example above, but that's my fault: one video was 24 fps, the other 25 fps).
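That 24 fps vs. 25 fps mismatch is enough to explain the timing drift: the same frame index lands at a slightly different timestamp in each clip, and the offset grows over the length of the video. A quick back-of-the-envelope check:

```python
def drift_seconds(frame_index: int, fps_a: float, fps_b: float) -> float:
    """Timestamp difference of the same frame index at two frame rates."""
    return frame_index / fps_a - frame_index / fps_b

# after 120 frames (5 s at 24 fps) the two clips are 0.2 s apart,
# which is easily visible as a lip-sync offset
offset = drift_seconds(120, 24.0, 25.0)
```

Resampling one clip to match the other's frame rate before comparing (e.g. with ffmpeg's `-r` option) removes the drift.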
Nice. For the foley / sound generation (v2v), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?
That's what it already does (the foley workflow). It does generate a video (since it's a video model), but the video part is disregarded at the end; only the audio is used
(except if you also extend the video, in which case the newly added video parts are also from LTX).
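If you'd rather do the pairing outside ComfyUI, the same result can be had by muxing the generated file's audio stream onto the original silent clip with ffmpeg, copying the video stream so it is never re-encoded. A sketch that just builds the command (file names are placeholders):

```python
def build_mux_cmd(silent_video: str, generated_audio_video: str,
                  output: str) -> list[str]:
    """Build an ffmpeg command that pairs the original video stream
    with the audio stream of the generated file."""
    return [
        "ffmpeg",
        "-i", silent_video,           # input 0: original silent video
        "-i", generated_audio_video,  # input 1: LTX output (audio source)
        "-map", "0:v:0",              # take video from input 0
        "-map", "1:a:0",              # take audio from input 1
        "-c:v", "copy",               # don't re-encode the original video
        "-shortest",                  # stop at the shorter stream
        output,
    ]

cmd = build_mux_cmd("input.mp4", "ltx_output.mp4", "muxed.mp4")
```

`subprocess.run(cmd)` would then execute it, assuming ffmpeg is on your PATH.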
This workflow is almost exactly what I needed. I am testing it for my video; however, I noticed a lora loader with ANIMTEDDIFF\v3_sd15_adapter.ckpt. Is it required? Is it the model below?
https://huggingface.co/guoyww/animatediff/blob/main/v3_sd15_adapter.ckpt
Where should I put it in ComfyUI? Shall I put it in the loras folder?
I am sorry but where can I input the audio?
I can see the Load Video node, which I can replace with my video, but for audio it seems to use an empty audio latent rather than loading the voice audio. I am sure I am missing something; any help is greatly appreciated.
Thanks in advance.
I noticed a lora loader of ANIMTEDDIFF\v3_sd15_adapter.ckpt. Is it required? Is it the model below?
No, that's just a "placeholder" left in by accident. I made a secondary lora loader in this workflow where the audio part is muted (for user-made loras not trained on audio data).
Nothing should be loaded in it unless you want to load some lora. I'll take a look at that workflow and see if I can set it to "off" by default.
For now, just select it and press Ctrl+B to bypass it.
I am sorry but where can I input the audio?
In this particular workflow you just prompt for what audio you want to have.
Since I am already updating it for the lora loader (see post above), I might add an optional custom audio input as well ;-)
This is going to be cool. I am looking forward to it.
PROMPT-GENERATED VOICE:
CUSTOM AUDIO - where you can use your own audio file as input (in the demo, a voice audio mp3 generated in Pocket TTS):
UPDATED WORKFLOWS
- A new version where you can use a custom audio file as the audio input to lip-sync to
- Updated the prompt-to-speak version, removing a confusing lora loader
And the workflow seems to work even better with the v1.1 distilled model ;-) though I only did a few runs to create example videos
(I'll update the Sam3 version too - where you just prompt what to mask - but will wait for Sam3 to be natively supported in ComfyUI; it should be real soon)