LTX-2.3: v1.1 of the distilled model
@Kijai
LTX released a new v1.1 of their distilled model and the distilled lora.
https://huggingface.co/Lightricks/LTX-2.3/tree/main
Not sure what was fixed, but perhaps worth adding to the repo here ;-)
"The distilled v1.1 version of the full model - A different aesthetic experience and improved audio compared to v1.0"
Ah .. I see you are already converting ;-)
(all the IC-LoRAs from LTX have also been updated to be compatible with the new distilled version... so I guess v1.1 is a bit of a change)
No update for fp8_input_scaled? 🤔
Just a quick test run of the distilled v1.1 fp8 version from Kijai ;-)
(low-res video since it was just a quick test run)
Didn't do any 1-to-1 comparison versus version 1.0, but from the first test run the sound seems much improved... very clear and natural, not so "digital" (only done one run though)
And thanks a ton Kijai ;-)
Black Adam ⚡ 😅
ltx-2.3-22b-distilled-1.1_transformer_only_bf16.safetensors when?
No update for fp8_input_scaled? 🤔
Input scaling needs calibration, so it's more time-consuming to do. The fp8_scaled 1.1 is a bit different to the previous ones and has many layers using the fp8 matmuls, so it should still be fast on supported hardware. Calibrated input scaling could potentially increase quality though.
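To make the calibration point above concrete, here is a rough, simplified sketch of per-tensor fp8 scaling in plain Python (not Kijai's actual code; the e4m3 max of 448.0 is the real format constant, but the rounding of mantissa bits is omitted and the function names are made up). The key difference Kijai alludes to: weight scales can be computed directly from the stored weights like this, while *input* (activation) scales have to be estimated by running sample data through the model, which is the calibration step.

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def compute_scale(tensor):
    """Per-tensor scale: map the largest |value| onto the fp8 range."""
    amax = max(abs(v) for v in tensor)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quantize(tensor, scale):
    """Divide by the scale and clamp to the fp8 representable range.
    (Real fp8 also rounds the mantissa; that part is omitted here.)"""
    return [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in tensor]

def dequantize(qtensor, scale):
    """Undo the scaling after the low-precision matmul."""
    return [v * scale for v in qtensor]

weights = [0.02, -1.5, 3.7, -896.0]
s = compute_scale(weights)                      # 896.0 / 448.0 = 2.0
restored = dequantize(quantize(weights, s), s)  # values survive the round trip
```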
Will the VAE model be updated?
ltx-2.3-22b-distilled-1.1_transformer_only_bf16.safetensors when?
Uploaded.
Will the VAE model be updated?
I don't think there's been a VAE update? The 1.1 was only about the distillation.
In a multi-image reference workflow, the face changes and no longer matches the face in the original images when using ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.
In a multi-image reference workflow, the face changes and no longer matches
Multi-frame? Or some LoRA? I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
A quick little first last frame image input:
(low res, but seemed to work fine)
I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
They also re-uploaded the official IC-LoRAs to support distilled v1.1 (based on the commit history):
LTX-2.3-22b-IC-LoRA-Union-Control
Commit History
Model: Upload updated version compatible with the v1.1 distilled model
Maybe other (third-party) LoRAs also need to be updated 🤔 Not sure what the difference is with the re-uploaded IC-LoRAs, since they use the same file names.
Just wanted to say that I've been testing "ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32" compared to "ltx-2.3-22b-distilled_transformer_only_fp8_input_scaled_v3" and I'm getting way better results with similar render times. Thank you Kijai and RuneXX for all of your hard work!
In a multi-image reference workflow, the face changes and no longer matches
Multi-frame? Or some LoRA? I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
A quick little first last frame image input:
(low res, but seemed to work fine)
I’m using your workflow as well, without adding any extra LoRA. It might be because the face is relatively small in the image, but ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32 works fine. In terms of audio, it does perform a bit better.
I’m using your workflow as well, without adding any extra LoRA. It might be because the face is relatively small in the image, but ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32 works fine.
And not the fp8 scaled? Only works with mxfp8 ? with exact same input images? Something for Kijai to maybe look at if so ..
Haven't had a chance to do a lot of runs so far, but didn't notice anything
In a multi-image reference workflow, the face changes and no longer matches
Multi-frame? Or some LoRA? I did notice some LoRAs might need an update (like IC-LoRA-Cameraman etc. didn't seem to like v1.1)
A quick little first last frame image input:
(low res, but seemed to work fine)
I'm using your workflow as well, without adding any extra LoRA. It might be because the face is relatively small in the image, but ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32 works fine. In terms of audio, it does perform a bit better.
Which workflow did you use?
Which workflow did you use?
First-Last frame workflow.
Can find here https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main/First-Last-Frame
Thank you very much!
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
Haven't done a ton of generations, but I've had good results on my end so far at least.
Did you use the v1.1 of the upscaler model as well? It's a must.
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
Maybe you're using a different sampler/scheduler? 🤔 Since those can cause weird output.
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
When I was making lip-sync videos and started using larger starting images, the audio would run but the person wouldn't move; the horse would just look at the screen as if to say the pictures were all messed up size-wise. So I switched to a 32-divisible size and went down to 640x640, and it works every time. So I'm curious what sizes you were using; I'm going to try it real quick though. The great thing too is that at 640x640 the image is still crisp, but you can make 30 seconds at a time, which is nuts to me, then use the RTX upscaler.
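The 32-divisible sizing described above is easy to compute automatically. A small helper (the function names are made up, and rounding *down* is my assumption as the safe direction, since it never exceeds the source image):

```python
def snap_to_multiple(value, multiple=32):
    """Round a dimension down to the nearest multiple (of 32 by default),
    never going below one full multiple."""
    return max(multiple, (value // multiple) * multiple)

def snap_resolution(width, height, multiple=32):
    """Snap both dimensions of a resolution to the required multiple."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)

print(snap_resolution(1440, 1080))  # (1440, 1056)
print(snap_resolution(650, 650))    # (640, 640)
```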
This new distilled hasn't worked very well for me, with and without LoRAs. Even dev with the distilled LoRA is weird: very stiff (no movement), or weird lighting even though the prompt specifically enforces no lighting changes, etc.
When I was making lip-sync videos and started using larger starting images, the audio would run but the person wouldn't move; the horse would just look at the screen as if to say the pictures were all messed up size-wise. So I switched to a 32-divisible size and went down to 640x640, and it works every time. So I'm curious what sizes you were using; I'm going to try it real quick though. The great thing too is that at 640x640 the image is still crisp, but you can make 30 seconds at a time, which is nuts to me, then use the RTX upscaler.
Good point. I've always used 1440p images (scaled down for the first pass and used at full resolution in the upscaler).
I would say the image quality has not suffered at all, but the movement is struggling, especially if the strength of the reference images (I generally use guided image workflows) is set to 1 in a First Frame / Last Frame workflow. For the record, I have not tried T2V. I've even seen it produce a video (which I can't share, as it is NSFW... for research purposes, obviously) that is basically just the first frame extended throughout the clip. In that test, the same image was used as both the first and last frame, while the prompt enforced looping mechanics for the simple action being requested.