Allow video input (not just image) for initializing long video + audio workflows

#95
by culinarytracker - opened

Love the LTX 2.3 workflows. One pain point with long-form generation:

Within a single run, segment-to-segment continuity is great: the overlap motion frames keep everything smooth across 5–8 loops (at least with my RAM and VRAM). But when the target length exceeds what one run can handle, I have to start a new run from a single image. That hard reset from motion frames to a static image creates an obvious seam every time. A bouncing, dancing character can instantly switch to standing still, for example.

It would be a huge improvement if the workflow could accept a video clip as the starting input and extract the last N motion frames from it, using the same overlap logic that already works between segments, but applied at the start of a run.
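To make the request concrete, here is a minimal sketch of the "take the tail of the previous clip" step in plain Python. It is not how the workflow does it internally; `tail_frames` is a hypothetical helper, and the clip length (121) and overlap (9) are made-up example values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tail_frames(frames: Sequence[T], overlap: int) -> List[T]:
    """Return the last `overlap` frames of a clip, in playback order.

    `frames` can be any sequence of decoded frames (hypothetical input);
    the result would seed the first segment of the next run.
    """
    if overlap <= 0:
        raise ValueError("overlap must be a positive frame count")
    if overlap > len(frames):
        raise ValueError(f"clip has only {len(frames)} frames, need {overlap}")
    return list(frames[-overlap:])


# Stand-in for a decoded clip: frame indices instead of real images.
previous_run = list(range(121))
seed = tail_frames(previous_run, 9)
print(seed)  # the last 9 frame indices, 112 through 120
```

The next run would then start from `seed` instead of a single still image, so the motion carries over the same way it does between segments inside one run.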

This would make it possible to chain multiple runs together seamlessly for longer videos without those jarring transition points. Is that something that could be added?

That might be a good idea actually.
Already have extend any video workflows, so that should be possible.
Extend first, and then long-form all in one.

Will try a workflow for that ;-)

(I've been meaning to tweak the long generation workflow a bit, but it's a bit challenging with the 2-pass version: the two samplers end up changing colors, fidelity and more, so the seams are a bit more obvious. 1-pass works great, though it's a bit heavier to run, at least on my PC.)

Noticed the LTXVPreprocess img_compression is set to 33% in those extend groups. I thought that was very aggressive; it blurred the extend to the point that it was noticeable in the transitions from cut 1 to the subsequent cuts. But DUUUUDE... this workflow coupled with Ace 1.5 has me making crazy songs with fake bands for my 5yo daughter (yeah, but mostly for me lol). Did a 60-second video in APP MODE with 3x20 seconds; I am tweaking and testing some settings to keep the 5 extend groups. So much fun. Tweaking the CFGs so it adheres more to the audio is also nice. Keep up the legendary work. I learned so much working on understanding your structure, LOVE LOVE IT!
https://huggingface.co/WanApp/LtxApp360/resolve/main/Ltx360_FinalCut_0000-audio.mp4 I think the lip sync and adherence is crazy! You are a wizard!

33% is for LTX-2.0 ;-)
So that might have been a leftover from those days ;-) Will update to 18% (18% is what LTX recommended, but it works fine with lower)

And nice rock video ;-)

Might add a lyrics one that shows lyrics on screen. But the nodes that I tried so far were all a bit meh.. complicated to get working with correct timestamps

33% is for LTX-2.0 ;-)
So that might have been a leftover from those days ;-) Will update to 18% (18% is what LTX recommended, but it works fine with lower)

And nice rock video ;-)

You work too hard :)

Already have extend any video workflows, so that should be possible.
Extend first, and then long-form all in one.

Will try a workflow for that ;-)

That would be awesome, the current Extend Any Video workflow doesn't have custom audio if I remember correctly...

Honestly, most generations hit memory caps for me eventually, so I've been grabbing the files from the Temp directory and stitching them together myself... But I'm usually just 15-30 seconds short of the whole song or whatever. So that's when I have to take the last frame (or a nearby clear frame) and hope the outcome looks ok. But normally I end up with 5-7 segments that flow perfectly, and then 1-2 at the end that just seem off. I have 48 GB of RAM, but ColorMatch or something usually breaks it....

I think the approach with the new music video app, generating different scenes from different angles of the same original scene, might be a nice angle of attack (pun intended). It also involves using a separate workflow, BUT I LOVE the idea. I am making an Angles app with the old Qwen one-click template from ComfyUI to help people with that (and myself), and I will probably make a ComfyUI Models friendly version of the Music Video workflow too. (I must admit, as a huge music lover, spare-time drummer, and former music video editor for a metal band, this is EXACTLY why I love generating video, so HUGE THANKS AGAIN for these awesome workflows.)

Honestly, most generations hit memory caps for me eventually, so I've been grabbing the files from the Temp directory and stitching them together myself...

Each of the groups (or windows, as others might call them) has an image and an audio output. So you can connect a video combine node to each group and save each part more easily. That will output the video for that specific group only, which can be used to manually stitch the parts together instead of the workflow doing it (that can use quite a lot of RAM at the very end).

(If you do that, you can remove or bypass the "accumulated frames so far" video combine; it's the one that might be heavy on RAM as the video gets longer and longer.)
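A rough back-of-the-envelope for why the accumulated batch gets heavy, as a sketch only. It assumes frames are held uncompressed as float32 RGB values (4 bytes per channel value), which is how ComfyUI image batches are usually stored; the resolution and frame rate are made-up example values, not the workflow's defaults, and `accumulated_frames_bytes` is a hypothetical helper.

```python
def accumulated_frames_bytes(seconds: float, fps: float, width: int, height: int,
                             channels: int = 3, bytes_per_value: int = 4) -> int:
    """Estimate memory needed to hold every frame of a video uncompressed."""
    frames = int(round(seconds * fps))
    return frames * width * height * channels * bytes_per_value


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


# Example values only: 1280x720 at 24 fps.
for seconds in (20, 60):
    size = accumulated_frames_bytes(seconds, 24, 1280, 720)
    print(f"{seconds:>3} s -> {gib(size):.1f} GiB")
# 20 s comes to about 4.9 GiB and 60 s to about 14.8 GiB under these assumptions.
```

The estimate grows linearly with length, which is why saving each group separately and dropping the accumulated combine keeps peak RAM closer to the size of one segment.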


So that's when I have to take the last frame (or a nearby clear frame) and hope the outcome looks ok. But normally I end up with 5-7 segments that flow perfectly, and then 1-2 at the end that just seem off.

Will try a "continue" workflow (makes sense to have, since most songs are longer than what you create at first run with the workflow).
Should be doable ;-) and useful

I think the approach with the new music video app, generating different scenes from different angles of the same original scene, might be a nice angle of attack

Yes exactly. That's the whole concept really. It's a multi-scene workflow where you have a "storyboard" of sorts by generating multiple images to use in the video.
Qwen works great for that. It has a Next Scene LoRA as well as a Multiple Angles LoRA, which make creating different views of the same scene much easier

Not that YOU need it, but I made this App Mode with your workflow in mind. I am also working on making my dumbed-down version in app mode; I'm calling it LtxMTV ;-)
https://huggingface.co/WanApp/AnglesApp

Just thought about the fact that batching all those images at the end... I'll make a separate, very simple app mode to stitch (rebatch/concat) them together AFTER, in post-production (not everyone has a hundred video tools, and why not do it in Comfy?)

Low RAM version - Workflow

Uploaded a low RAM/VRAM version that does not batch images and accumulate a long video on the fly.
Instead, the workflow saves each video part in a temporary folder, and each group continues fresh without using previous frames (only the frame count, a number value)

At the very end, the saved video parts are automatically combined into a final video.
This should be far lighter on the computer, about as light as generating a single short video in any other workflow.

And should you be really, really low on RAM, and the last step (load all video parts + save as one video) be too heavy, all the parts are in the temporary folder, so you can use any other tool to stitch them together.
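For the manual route, one option is ffmpeg's concat demuxer. The sketch below only builds the list file contents and the command; it does not run ffmpeg. The file names (`temp/part_1.mp4` and so on) are hypothetical, so check what the workflow actually writes to the temporary folder. Note that `-c copy` only works when all parts share the same codec, resolution and frame rate, and it does not trim any frames that overlap between parts.

```python
import re
from pathlib import Path
from typing import Iterable, List


def natural_key(path: Path) -> list:
    """Sort key so that part_2 comes before part_10."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path.name)]


def concat_list(parts: Iterable[Path]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for part in sorted(parts, key=natural_key):
        # Single quotes inside a path must be escaped for the list file.
        escaped = part.as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_command(list_file: str, output: str) -> List[str]:
    """Command that joins the listed parts without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


# Hypothetical part names; replace with the files in your temp folder.
parts = [Path("temp/part_10.mp4"), Path("temp/part_2.mp4"), Path("temp/part_1.mp4")]
print(concat_list(parts), end="")
print(" ".join(ffmpeg_command("parts.txt", "full_song.mp4")))
```

Write the list text to `parts.txt`, then run the printed command (or pass the list to `subprocess.run`). Since nothing is decoded into memory at once, this stays light on RAM no matter how long the final video is.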
