When video and audio concat don't align
I used LTX-2.3_-_I2V_T2V_Music-Video-Creator_multi-scene_custom_audio.json to generate a video, and SamplerCustomAdvanced threw this error:
RuntimeError: Expected all tensors to be on the same device, but got tensors is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA_cat)
How can I fix that problem?
That's a strange one.
Try updating ComfyUI and KJ Nodes.
It still doesn't work.
I suspect MelBandRoFormer is outputting an np.array type.
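If that guess is right, the fix is just to coerce the separator's output into a torch tensor on the expected device before it reaches the sampler. A minimal sketch (the helper name `to_audio_tensor` is hypothetical, not part of any node):

```python
import numpy as np
import torch

def to_audio_tensor(waveform, device="cpu"):
    # Hypothetical helper: accept either a NumPy array (as an audio-separator
    # node might return) or a torch tensor, and return a tensor on `device`.
    if isinstance(waveform, np.ndarray):
        waveform = torch.from_numpy(waveform)
    return waveform.to(device)

# e.g. a mono waveform the separator produced as np.float32
vocals = np.zeros((1, 44100), dtype=np.float32)
print(to_audio_tensor(vocals).device)  # cpu
```

In a CUDA workflow you would pass `device="cuda:0"` so the audio lands on the same device as the video tensors.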
Did you run into this problem?
Not sure what this error is.. it says one model is on the CPU while the other is on the GPU.
But not sure where that comes from.
Since you mention MelbandRoFormer, does it work if you disable that?
Yes, I disabled all the MelbandRoFormer nodes and related ones. It works now, but the generated video looks foggy.
Note: This error usually occurs when the video tensor is on the GPU and the voice tensor is on the CPU, and they are being concatenated.
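That mismatch is easy to reproduce: `torch.cat` requires all inputs on one device. A defensive sketch (the `safe_cat` wrapper is my own illustration, not a ComfyUI function) that moves everything to the first tensor's device first:

```python
import torch

def safe_cat(tensors, dim=0):
    # torch.cat raises the "Expected all tensors to be on the same device"
    # RuntimeError if inputs are split across cpu and cuda. Moving every
    # tensor to the first tensor's device avoids that.
    device = tensors[0].device
    return torch.cat([t.to(device) for t in tensors], dim=dim)

video = torch.zeros(2, 3)  # imagine this lives on cuda:0 in the workflow
audio = torch.ones(2, 3)   # ...while this came back on the cpu
out = safe_cat([video, audio])
print(out.shape)  # torch.Size([4, 3])
```

`t.to(device)` is a no-op when the tensor is already on the right device, so the wrapper is cheap in the normal case.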
hmmm MelbandRoformer is an audio separator (extracts vocals). It should have zero impact on the video result (other than the lip-sync)
Try double-checking that your VAE etc. are loaded and that all the models are correct.
But just to be sure, I'll double-check here also if there is anything.
Might be something to ask Kijai about. He made that node, but I've never had any issues myself.
Question: Any reason for using MelbandRoformer for this task? Is it better than audio-separation-node and Deepxtractv2, or a matter of VRAM usage? I'm curious because I make my drumless tracks with both of those, but I've never tried vocal extraction, since I play drums :)
There are many audio separator tools out there by now. You can swap it out for another if you prefer.
The MelbandRoformer is really good at vocal extraction, though.
(The only reason for using it is to improve lip-sync if the audio input is "muffled". But in many cases you want the music too, for example playing guitar or dancing.)
And it's only what LTX "hears"; the final output has full audio, even if you feed vocals only to LTX.
Yeah, I got the "why" part earlier and thought that was pretty smart :) This is why I love your workflows, smart.
https://huggingface.co/WanApp/LtxMTV/resolve/main/LtxMTV.json
Work in progress (I just upped this fast because I was happy with my take on your MTV concept. The workflow is a mess, I didn't take time to rename anything or even take decent angles, just wanted to test before bed :)