LLaDA-o
We introduce LLaDA-o, an effective and length-adaptive omni diffusion model for unified multimodal understanding and generation.
LLaDA-o extends diffusion language modeling to a broader multimodal setting, supporting both visual understanding and visual generation within a single framework. The released codebase provides a practical inference pipeline for interleaved text-image processing and a notebook-based workflow for reproducible experiments.
It was presented in the paper LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model.
Code: https://github.com/ML-GSAI/LLaDA-o
Highlights
Unified multimodal modeling for both understanding and generation
Support for text-to-image generation
Support for image understanding
Support for instruction-based image editing
Reproducible inference workflow through multimodal_demo.ipynb
Supported Tasks
The current release is designed for the following multimodal inference settings:
Text-to-image: generate images from natural language prompts
Image understanding: produce textual responses conditioned on an input image
Image editing: edit an image according to a textual instruction
Interleaved multimodal inference: process text and image context within a shared diffusion-based framework
Hey valtec, I see a lot of text, but where's the model, lol?
oh nooo... their model config is done wrong =(
GSAI-ML/LLaDA-o: no architectures entry (malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0
I assume it's because the JSON file is named differently. My auto queue can't handle it, so it either has to be dealt with manually by someone (I can ask nico), or the original author could fix it properly (I assume just a rename), which would be great for everyone else trying to play with the model.
It would be interesting to get this model quantized, but I have no idea how LLaDA works. I know it uses corruption and reconstruction, just like image and video models. I've been studying this a lot and I believe it's the future: with diffusion, as in image and video models, token generation is much faster because many tokens are predicted in parallel. If you can do it, great, I'd appreciate it! It would be awesome for the whole HF community. But if not, no worries, it's a new technology, so it's completely different from what we're used to. Thanks for replying, I really appreciate it!
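For intuition, the corruption-and-reconstruction idea can be sketched as a toy masked-diffusion sampler: start from a fully masked sequence and, at each step, let a denoiser commit the positions it is most confident about. Everything below (the dummy scorer, the tiny vocabulary, the step count) is a hypothetical illustration of the general technique, not LLaDA-o's actual code.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def dummy_denoiser(tokens):
    """Stand-in for a trained model: for each masked position, return a
    (token, confidence) guess. A real model would score all positions in
    parallel from the full, partially masked context."""
    rng = random.Random(0)  # deterministic toy confidences
    guesses = {}
    for i, t in enumerate(tokens):
        if t == MASK:
            guesses[i] = (VOCAB[i % len(VOCAB)], rng.random())
    return guesses

def sample(length=6, steps=3):
    """Iteratively unmask the highest-confidence positions: several tokens
    are filled in per step, which is why decoding can be fast."""
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        guesses = dummy_denoiser(tokens)
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens

print(" ".join(sample()))
```

The key contrast with autoregressive decoding is that each iteration can commit multiple positions anywhere in the sequence, rather than exactly one token at the end.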
I mean, someone just needs to ask them to change the config.json, and then we pray that it works. Could you do that, please?
I'll try asking them and see :) https://huggingface.co/GSAI-ML/LLaDA-o/discussions/1 Let's see if I get an answer.
I received the following instruction and guidance:
renaming llm_config.json -> config.json should fix it, and then it's up to llama.cpp to support it or not
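For anyone who wants to try the rename on a local copy before re-uploading, here is a minimal sketch. The file names come from the advice above; the check for an `architectures` entry is my assumption about what the error at the top of this thread was complaining about.

```python
import json
import os

def fix_config(model_dir):
    """Rename llm_config.json to config.json (as suggested above) and
    warn if the result still lacks an 'architectures' entry."""
    src = os.path.join(model_dir, "llm_config.json")
    dst = os.path.join(model_dir, "config.json")
    if os.path.exists(src) and not os.path.exists(dst):
        os.rename(src, dst)
    with open(dst) as f:
        cfg = json.load(f)
    if "architectures" not in cfg:
        print("warning: config.json still has no 'architectures' entry")
    return cfg
```

Whether downstream tools (e.g. llama.cpp conversion) then accept the model is a separate question, as the reply notes.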
you received it from me
All right, thank you, we tried!!