Image support?

#1
by debug356 - opened

Great abliteration! It's the only one that actually works well with NSFW prompting in the "TextGenerateLTX2Prompt" node — all others I tested either refuse or produce garbage.
However, the image input in the TextGenerateLTX2Prompt node doesn't work.
It would be awesome if that could be supported.

Same here. I have tried all the models I could find, and while many work to generate a video, they are all completely worthless in the prompt enhancer/generation node (including sikaworld-high-fidelity-edition-Ltx-2). But somehow this one works really well, so awesome job!
However, when you insert an image you get a long "clip missing" error, so the vision_model and/or multi_modal_projector keys are apparently not included. You then have to edit the prompt yourself afterwards to fix everything it made up because it could not see the image. So yes, that would be a nice addition!

Can you send me the workflow or a screenshot of the node(s) causing the problem?

The first error appears in the log the first time you load the Text Encoder. Here is the log (I cut out the middle part, as it is long and just repeats for all layers):

"clip missing: ['multi_modal_projector.mm_input_projection_weight', 'multi_modal_projector.mm_soft_emb_norm.weight', 'vision_model.embeddings.patch_embedding.weight', 'vision_model.embeddings.patch_embedding.bias', 'vision_model.encoder.layers.0.layer_norm1.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.0.self_attn.q_proj.weight', 'vision_model.encoder.layers.0.self_attn.q_proj.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.encoder.layers.0.self_attn.k_proj.bias', 'vision_model.encoder.layers.0.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.self_attn.v_proj.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.weight', 'vision_model.encoder.layers.0.self_attn.out_proj.bias', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.encoder.layers.0.layer_norm2.bias', 'vision_model.encoder.layers.0.mlp.fc1.weight', 'vision_model.encoder.layers.0.mlp.fc1.bias', 'vision_model.encoder.layers.0.mlp.fc2.weight', 'vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_model.encoder.layers.1.layer_norm1.weight',
...
...
...
'vision_model.encoder.layers.26.layer_norm1.weight', 'vision_model.encoder.layers.26.layer_norm1.bias', 'vision_model.encoder.layers.26.self_attn.q_proj.weight', 'vision_model.encoder.layers.26.self_attn.q_proj.bias', 'vision_model.encoder.layers.26.self_attn.k_proj.weight', 'vision_model.encoder.layers.26.self_attn.k_proj.bias', 'vision_model.encoder.layers.26.self_attn.v_proj.weight', 'vision_model.encoder.layers.26.self_attn.v_proj.bias', 'vision_model.encoder.layers.26.self_attn.out_proj.weight', 'vision_model.encoder.layers.26.self_attn.out_proj.bias', 'vision_model.encoder.layers.26.layer_norm2.weight', 'vision_model.encoder.layers.26.layer_norm2.bias', 'vision_model.encoder.layers.26.mlp.fc1.weight', 'vision_model.encoder.layers.26.mlp.fc1.bias', 'vision_model.encoder.layers.26.mlp.fc2.weight', 'vision_model.encoder.layers.26.mlp.fc2.bias', 'vision_model.post_layernorm.weight', 'vision_model.post_layernorm.bias']
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16"
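All the missing keys above belong to the CLIP vision tower and its projector, which suggests the text-encoder checkpoint simply does not contain those weights. A quick sketch to check a checkpoint's key list for yourself (the key lists here are illustrative; in practice you would get them from something like `safetensors.safe_open(path, framework="pt").keys()`):

```python
def has_vision_tower(checkpoint_keys):
    """Return True if the key list includes both the vision encoder
    and the multi-modal projector that image input requires."""
    needed_prefixes = ("vision_model.", "multi_modal_projector.")
    return all(
        any(key.startswith(prefix) for key in checkpoint_keys)
        for prefix in needed_prefixes
    )

# A text-only checkpoint carries just the language-model weights:
text_only = ["model.layers.0.self_attn.q_proj.weight", "model.embed_tokens.weight"]
print(has_vision_tower(text_only))  # False -> "clip missing" warning, no image input
```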

You can just ignore that clip-missing warning, as generation works anyway. What does not work is the TextGenerateLTX2Prompt node below, which is a ComfyUI core node. The default LTX-2.3 Image to Video workflow included in the Templates inside ComfyUI (or here: https://comfy.org/workflows/video_ltx2_3_i2v-7cc1d3bd2802/) has that node (it is at the top of the only subgraph). When you input an image, you get an error saying "NotImplementedError: Cannot copy out of meta tensor; no data!"
And the full log after using an image input:

“D:\ComfyUI\custom_nodes\ComfyUI-Dev-Utils\nodes\execution_time.py:49: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
torch.cuda.reset_max_memory_allocated(device)
Requested to load Gemma3_12BModel_
Model Gemma3_12BModel_ prepared for dynamic VRAM loading. 23235MB Staged. 0 patches attached. Force pre-loaded 290 weights: 1497 KB.
!!! Exception during processing !!! Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
File "D:\ComfyUI\execution.py", line 525, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\execution.py", line 334, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\custom_nodes\comfyui-lora-manager\py\metadata_collector\metadata_hook.py", line 168, in async_map_node_over_list_with_metadata
results = await original_map_node_over_list(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\execution.py", line 308, in async_map_node_over_list
await process_inputs(input_dict, i)
File "D:\ComfyUI\execution.py", line 296, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "D:\ComfyUI\comfy_api\internal\__init__.py", line 149, in wrapped_func
return method(locked_class, **inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy_api\latest\_io.py", line 1764, in EXECUTE_NORMALIZED
to_return = cls.execute(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy_extras\nodes_textgen.py", line 164, in execute
return super().execute(clip, formatted_prompt, max_length, sampling_mode, image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy_extras\nodes_textgen.py", line 56, in execute
generated_ids = clip.generate(
^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\sd.py", line 434, in generate
return self.cond_stage_model.generate(tokens, do_sample=do_sample, max_length=max_length, temperature=temperature, top_k=top_k, top_p=top_p, min_p=min_p, repetition_penalty=repetition_penalty, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\text_encoders\lt.py", line 96, in generate
embeds, _, _, embeds_info = self.process_tokens(tokens_only, self.execution_device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\sd1_clip.py", line 228, in process_tokens
emb, extra = self.transformer.preprocess_embed(emb, device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\text_encoders\llama.py", line 1100, in preprocess_embed
return self.multi_modal_projector(self.vision_model(image.to(device, dtype=torch.float32))[0]), None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\clip_model.py", line 290, in forward
x = self.embeddings(pixel_values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\clip_model.py", line 263, in forward
return embeds + comfy.ops.cast_to_input(self.position_embedding.weight, embeds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\ops.py", line 79, in cast_to_input
return comfy.model_management.cast_to(weight, input.dtype, input.device, non_blocking=non_blocking, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\comfy\model_management.py", line 1314, in cast_to
r.copy_(weight, non_blocking=non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!”
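The root cause ties back to the earlier warning: since the vision_model weights were never loaded from the checkpoint, those layers are presumably left as "meta" tensors (shape and dtype metadata with no backing storage), and the `cast_to` call at the bottom of the traceback then tries to copy data that does not exist. A minimal reproduction outside ComfyUI, assuming only plain PyTorch:

```python
import torch

# A "meta" tensor has a shape and dtype but no actual data behind it.
weight = torch.empty(4, 4, device="meta")
target = torch.empty(4, 4)

try:
    # Same operation as r.copy_(weight, ...) in model_management.cast_to
    target.copy_(weight)
except NotImplementedError as err:
    print(type(err).__name__)  # NotImplementedError
```

So image support would mean shipping a checkpoint that actually includes the vision_model and multi_modal_projector weights, not a node-side fix.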