LTX-2.3: v1.1 of the distilled model
@Kijai
LTX released a new v1.1 of their distilled model and the distilled lora.
https://huggingface.co/Lightricks/LTX-2.3/tree/main
Not sure what was fixed, but perhaps worth adding to the repo here ;-)
"The distilled v1.1 version of the full model - A different aesthetic experience and improved audio compared to v1.0"
Ah .. I see you are already converting ;-)
(all the IC-LoRAs from LTX have also been updated to be compatible with the new distilled version... so I guess v1.1 is a bit of a change)
No update for fp8_input_scaled? 🤔
Just a quick test run of the distilled v1.1 fp8 version from Kijai ;-)
(low-res video since it was just a quick test run)
Didn't do any 1-to-1 comparison versus version 1.0, but from the first test run the sound seems much improved... very clear and natural, not so "digital" (only done one run though)
And thanks a ton Kijai ;-)
Black Adam ⚡ 😅
ltx-2.3-22b-distilled-1.1_transformer_only_bf16.safetensors when?
No update for fp8_input_scaled? 🤔
Input scaling needs calibration, so it's more time-consuming to do. The fp8_scaled 1.1 is a bit different to the previous ones and has many layers using the fp8 matmuls, so it should still be fast on supported hardware. Calibrated input scaling could potentially increase quality though.
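To make the calibration point above concrete, here is a rough, simplified sketch of per-tensor fp8 scaling in plain Python (not Kijai's actual code; the e4m3 max of 448.0 is the real format constant, but the rounding of mantissa bits is omitted and the function names are made up). The key difference Kijai alludes to: weight scales can be computed directly from the stored weights like this, while *input* (activation) scales have to be estimated by running sample data through the model, which is the calibration step.

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def compute_scale(tensor):
    """Per-tensor scale: map the largest |value| onto the fp8 range."""
    amax = max(abs(v) for v in tensor)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quantize(tensor, scale):
    """Divide by the scale and clamp to the fp8 representable range.
    (Real fp8 also rounds the mantissa; that part is omitted here.)"""
    return [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in tensor]

def dequantize(qtensor, scale):
    """Undo the scaling after the low-precision matmul."""
    return [v * scale for v in qtensor]

weights = [0.02, -1.5, 3.7, -896.0]
s = compute_scale(weights)                      # 896.0 / 448.0 = 2.0
restored = dequantize(quantize(weights, s), s)  # values survive the round trip
```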
Will the VAE model be updated?
ltx-2.3-22b-distilled-1.1_transformer_only_bf16.safetensors when?
Uploaded.
Will the VAE model be updated?
I don't think there's been a VAE update? The 1.1 was only about the distillation.
In a multi-image reference workflow, the face changes and no longer matches the face in the original images when using ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.
In a multi-image reference workflow, the face changes and no longer matches
Multi-frame? Or some LoRA? I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
A quick little first last frame image input:
(low res, but seemed to work fine)
I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
They also re-uploaded the official IC-LoRAs to support distilled v1.1 (based on the commit history):
LTX-2.3-22b-IC-LoRA-Union-Control
Commit History
Model: Upload updated version compatible with the v1.1 distilled model
Maybe other (third-party) LoRAs also need to be updated 🤔 Not sure what the difference is with the re-uploaded IC-LoRAs, since they use the same file names.
Just wanted to say that I've been testing "ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32" compared to "ltx-2.3-22b-distilled_transformer_only_fp8_input_scaled_v3" and I'm getting way better results with similar render times. Thank you Kijai and RuneXX for all of your hard work!
In a multi-image reference workflow, the face changes and no longer matches
Multi-frame? Or some LoRA? I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
A quick little first last frame image input:
(low res, but seemed to work fine)
I’m using your workflow as well, without adding any extra LoRA. It might be because the face is relatively small in the image, but ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32 works fine. In terms of audio, it does perform a bit better.
I’m using your workflow as well, without adding any extra LoRA. It might be because the face is relatively small in the image, but ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32 works fine.
And not the fp8 scaled? Only works with mxfp8 ? with exact same input images? Something for Kijai to maybe look at if so ..
Haven't had a chance to do a lot of runs so far, but didn't notice anything
In a multi-image reference workflow, the face changes and no longer matches
Multi-frame? Or some LoRA? I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
A quick little first last frame image input:
(low res, but seemed to work fine)
I'm using your workflow as well, without adding any extra LoRA. It might be because the face is relatively small in the image, but ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32 works fine. In terms of audio, it does perform a bit better.
Which workflow did you use?
Which workflow did you use?
First-Last frame workflow.
Can find here https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main/First-Last-Frame
Thank you very much!
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
Haven't done a ton of generations, but I've had good results on my end so far at least.
Did you use the v1.1 of the upscaler model as well? It's a must.
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
Maybe you're using a different sampler/scheduler? 🤔 Since those can cause weird output.
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
When I was making lip-sync videos and started using larger starting images, the audio would run but the person wouldn't move; the horse would just look at the screen as if to say the pictures were all messed up size-wise. So I switched to a 32-divisible size and went down to 640x640, and it works every time. So I'm curious what sizes you were using; I'm going to try it real quick though. The great thing too is that at 640x640 the image is still crisp, but you can make 30 seconds at a time, which is nuts to me, then use the RTX upscaler.
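The 32-divisible sizing described above is easy to compute automatically. A small helper (the function names are made up, and rounding *down* is my assumption as the safe direction, since it never exceeds the source image):

```python
def snap_to_multiple(value, multiple=32):
    """Round a dimension down to the nearest multiple (of 32 by default),
    never going below one full multiple."""
    return max(multiple, (value // multiple) * multiple)

def snap_resolution(width, height, multiple=32):
    """Snap both dimensions of a resolution to the required multiple."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)

print(snap_resolution(1440, 1080))  # (1440, 1056)
print(snap_resolution(650, 650))    # (640, 640)
```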
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
When I was making lip-sync videos and started using larger starting images, the audio would run but the person wouldn't move; the horse would just look at the screen as if to say the pictures were all messed up size-wise. So I switched to a 32-divisible size and went down to 640x640, and it works every time. So I'm curious what sizes you were using; I'm going to try it real quick though. The great thing too is that at 640x640 the image is still crisp, but you can make 30 seconds at a time, which is nuts to me, then use the RTX upscaler.
Good point. I've always used 1440p images (scaled down for the first pass and used at full resolution in the upscaler).
I would say the image quality has not suffered at all, but the movement is struggling, especially if the strength of the reference images (I generally use guided image workflows) is set to 1 in a First Frame / Last Frame workflow. For the record, I have not tried T2V. I've even seen it produce a video (which I can't share, as it is NSFW... for research purposes, obviously) that is basically just the first frame extended throughout the clip. In that test, the same image was used as both the first and last frame, while the prompt enforced looping mechanics for the simple action being requested.