Curiosity

#1
by Alissonerdx - opened

Hi, I have a question: is this editing model a full fine-tune, or was it created by merging some IC-LoRA? And if it was trained from scratch, could you say which dataset was used?

joyfox org

We crawl all kinds of videos from the Internet, 100-200 pairs per class. We fine-tuned the model, then used the fine-tuned model to train a LoRA, but the LoRA is compatible with the entire LTX 2.3 model series.

Hmm, I understand. There are existing datasets, Ditto and RECO, that contain the same types of actions your model covers, so I suspect you may have used one of them.

Another thing: what is the point of merging with a base model? From what I saw, it looks like some LoRA may have been merged into the base model. That is not the same as a full fine-tune.

I have a LoRA that does exactly the same thing (https://huggingface.co/Alissonerdx/LTX-LoRAs/blob/main/ltx23_edit_anything_global_rank128_v1_9000steps_adamw.safetensors), and it was published publicly 22 days ago. It is also quite popular, which is why I was asking whether this was done through a merge or through a full fine-tune.

Because it makes little sense to me to release a full model with a LoRA merged into it. It would have been easier to provide the LoRA itself rather than the merged model. And in cases like this, a merge can give the impression that something is being hidden.
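
To make the technical point concrete: merging a LoRA is just an additive weight update, so a merged checkpoint behaves the same as loading the base model with the LoRA applied at runtime. Below is a minimal sketch of such a merge, assuming the common W' = W + (alpha / rank) * B @ A convention; the file paths, tensor key names, and hyperparameters are illustrative assumptions, not the actual layout of either checkpoint.

```python
import torch
from safetensors.torch import load_file, save_file

# Hypothetical paths; in practice these would be the LTX base model and an edit LoRA.
base = load_file("base_model.safetensors")
lora = load_file("edit_lora.safetensors")

alpha, rank = 128.0, 128  # assumed hyperparameters, matching a rank-128 LoRA
scale = alpha / rank

for key in list(base.keys()):
    # Hypothetical key pattern: each adapted weight has paired lora_A / lora_B tensors.
    a_key, b_key = f"{key}.lora_A", f"{key}.lora_B"
    if a_key in lora and b_key in lora:
        # The merged weight is exactly the base weight plus the scaled low-rank
        # update, so the merged checkpoint is behaviorally base + LoRA.
        delta = scale * (lora[b_key].float() @ lora[a_key].float())
        orig_dtype = base[key].dtype
        base[key] = (base[key].float() + delta).to(orig_dtype)

save_file(base, "merged_model.safetensors")
```

Since the merge is nothing more than this additive update, shipping the LoRA alone preserves the full capability while being far smaller and remaining composable with other checkpoints.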

To be clear, that may not be the case here. But if my LoRA was used or merged in any way to achieve this, I do not have a problem with it, as long as proper credit is given. From my perspective, it is not very clear what the advantage is of merging a general edit LoRA into a base model instead of providing it separately, so I would appreciate some clarification on that.

joyfox org

There's no conflict with your model. However, your model isn't suitable for our business scenario; it's simply not capable enough. Therefore, we created our own dataset and trained it entirely ourselves. You can try it out; your model is completely inadequate for our needs. That said, if you'd like to merge our model with yours, we'd be happy for you to do so, because open source is meant for everyone to use.

Meanwhile, our own general-purpose editing model is also being trained. I can tell you definitively that we are using commercial APIs to generate data, not data generated by your less capable model. We believe that a truly powerful, universal video editing model will be available soon.

None of our data uses Ditto or RECO; we crawl it from the web. Although we know of your contribution to the open-source community, we never merged your LoRA model. The task focus is obviously different.

Thanks for the clarification. I want to make my position clear: I never stated as a fact that you used my LoRA. I asked because the release format and the explanation made it unclear whether this was actually a full fine-tune release, or a LoRA trained on top of the base model and then merged into the checkpoint. That is the main point of my question.

If the final editing capability comes from a LoRA trained on the base model, then releasing it as a full merged checkpoint does not make much sense to me. At that point, it is not the same thing as releasing a true full fine-tuned model. It is a base checkpoint with a trained LoRA merged into it.

From the usage examples, I also assumed that you may be using your merged base model as the foundation for training or running other LoRAs. In other words, the later LoRAs may depend on this already-merged editing base. If that assumption is wrong, feel free to correct me.

Even then, I still think it would be useful to also provide the original LoRA separately. A LoRA is much more accessible for people to use, and much more versatile for people to adjust, than a full checkpoint that effectively does the same thing as the base model plus the LoRA.

This distinction matters because the release format can give people the impression that the whole model was fully fine-tuned end-to-end, when in reality the effective editing behavior may come mainly from the LoRA component. That is why I was asking for technical clarity. I appreciate the clarification that you trained on your own dataset and did not use Ditto, RECO, or my LoRA.

Regarding commercial quality, that is fine. My model was not built for your business scenario, and I never claimed it was a commercial-grade product. It was trained on around 8,000 samples and released as an experimental open-source model. In theory, that kind of broader dataset can make it more general and flexible, but my focus was never to compete with a specific commercial use case.

Also, just to be clear, I have no intention of using your model in mine. I was the first to train and publicly release this kind of LTX editing LoRA, and I am already training several other models in this same direction, including approaches that allow external image references to be used as conditioning.

My only request remains simple: if anything from my work is ever used, merged, or incorporated into a training pipeline, proper credit should be given. Thanks again for clarifying.

The aggressive tone from starsfriday is funny to me, as if I were attacking her personally. Relax, this is not personal. I do this as a hobby. I never claimed my model was made for commercial use or for a business scenario.

I also know your model is not the same as mine. From what I can see, yours is more focused on restoration and post-processing tasks, while my LoRA tries to handle multiple editing concepts in a single model. That is harder to train, and it is also why some parts may not work perfectly. My focus has always been experimental: I am more interested in testing difficult ideas and opening new directions for the community than in training simple LoRAs.

The main point is that you saw the idea, understood its potential, and moved in the same direction. For me, that is already a success. Before this, almost nobody was training IC-LoRAs for this kind of LTX editing workflow. This direction did not come from nowhere. It was initially pushed and explored by people from the Banodoco community, including Oumoumad, Cseti, LDWorks David, Kijai, and others who were among the first to test these ideas, share experiments, and make this approach more visible.

I also saw a video of the watermark removal feature, and honestly it had a lot of flickering and artifacts. I still need to test it myself before judging the "commercial-level" quality you mentioned.

My main technical question remains: if this was trained as a LoRA and then merged into a 22B base model, why release the full merged checkpoint instead of the LoRA itself? That release format makes things look unclear.

I am not angry that people are trying similar things. If anything, it motivates me to keep improving and build something better. My only point is simple: if a model or workflow is directly taken from someone else's public work, proper credit should be given. For me, the subject is clarified and closed. I do not want to keep extending this discussion.

starsfriday changed discussion status to closed
