LLaDA-o
We introduce LLaDA-o, an effective and length-adaptive omni diffusion model for unified multimodal understanding and generation.
LLaDA-o extends diffusion language modeling to a broader multimodal setting, supporting both visual understanding and visual generation within a single framework. The released codebase provides a practical inference pipeline for interleaved text-image processing and a notebook-based workflow for reproducible experiments.
It was presented in the paper LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model.
Code: https://github.com/ML-GSAI/LLaDA-o
Highlights
Unified multimodal modeling for both understanding and generation
Support for text-to-image generation
Support for image understanding
Support for instruction-based image editing
Reproducible inference workflow through multimodal_demo.ipynb
Supported Tasks
The current release is designed for the following multimodal inference settings:
Text-to-image: generate images from natural language prompts
Image understanding: produce textual responses conditioned on an input image
Image editing: edit an image according to a textual instruction
Interleaved multimodal inference: process text and image context within a shared diffusion-based framework
Hey valtec, I see a lot of text, but where's the model, lol?
oh nooo... their model config is done wrong =(
GSAI-ML/LLaDA-o: no architectures entry (malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0
I assume it's because the JSON file is named differently. My auto queue can't handle it, so it either has to be dealt with manually by someone (I can ask nico), or the original author could fix it properly (I assume just a rename), which would be great for everyone else trying to play with the model.
It would be interesting to get this model quantized, but I have no idea how LLaDA works. I know it uses corruption and reconstruction, just like image and video models. I've been studying this a lot and I believe it's the future: with diffusion, as in image and video models, token generation is much faster because many tokens are predicted in parallel. If you can do it, great, I'd appreciate it! It would be awesome for the whole HF community. But if not, no worries, it's a new technology, so it's completely different from what we're used to. Thanks for replying, I really appreciate it!
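For intuition, the corruption-and-reconstruction idea can be sketched as a toy masked-diffusion sampler: start from a fully masked sequence and, at each step, let a denoiser commit the positions it is most confident about. Everything below (the dummy scorer, the tiny vocabulary, the step count) is a hypothetical illustration of the general technique, not LLaDA-o's actual code.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def dummy_denoiser(tokens):
    """Stand-in for a trained model: for each masked position, return a
    (token, confidence) guess. A real model would score all positions in
    parallel from the full, partially masked context."""
    rng = random.Random(0)  # deterministic toy confidences
    guesses = {}
    for i, t in enumerate(tokens):
        if t == MASK:
            guesses[i] = (VOCAB[i % len(VOCAB)], rng.random())
    return guesses

def sample(length=6, steps=3):
    """Iteratively unmask the highest-confidence positions: several tokens
    are filled in per step, which is why decoding can be fast."""
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        guesses = dummy_denoiser(tokens)
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens

print(" ".join(sample()))
```

The key contrast with autoregressive decoding is that each iteration can commit multiple positions anywhere in the sequence, rather than exactly one token at the end.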
I mean, someone just needs to ask them to change the config.json, and then we pray that it works. Could you do that, please?
I'll try asking them and see :) https://huggingface.co/GSAI-ML/LLaDA-o/discussions/1 Let's see if I get an answer.
I received the following instruction and guidance:
renaming llm_config.json -> config.json should fix it, and then it's up to llama.cpp to support it or not
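For anyone who wants to try the rename on a local copy before re-uploading, here is a minimal sketch. The file names come from the advice above; the check for an `architectures` entry is my assumption about what the error at the top of this thread was complaining about.

```python
import json
import os

def fix_config(model_dir):
    """Rename llm_config.json to config.json (as suggested above) and
    warn if the result still lacks an 'architectures' entry."""
    src = os.path.join(model_dir, "llm_config.json")
    dst = os.path.join(model_dir, "config.json")
    if os.path.exists(src) and not os.path.exists(dst):
        os.rename(src, dst)
    with open(dst) as f:
        cfg = json.load(f)
    if "architectures" not in cfg:
        print("warning: config.json still has no 'architectures' entry")
    return cfg
```

Whether downstream tools (e.g. llama.cpp conversion) then accept the model is a separate question, as the reply notes.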
you received it from me
All right, thank you, we tried!!