LaViDa: A Large Diffusion Language Model for Multimodal Understanding
Paper: arXiv:2505.16839
[GitHub] [Paper] [ArXiv] [Checkpoints] [Data] [Website]
This is a transformers-compatible version of the LaViDa-LLaDa checkpoint. It can be loaded directly through the Hugging Face `transformers` API for easier inference and integration.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model from a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained('./lavida-llada-v1.0-instruct/')
model = AutoModelForCausalLM.from_pretrained(
    './lavida-llada-v1.0-instruct/',
    torch_dtype=torch.bfloat16,
)

# Get the image processor from the model's vision tower
image_processor = model.get_vision_tower().image_processor

# Match the embedding table to the tokenizer's vocabulary and tie weights
model.resize_token_embeddings(len(tokenizer))
model.tie_weights()
```
License: Apache 2.0