Juanxi 
posted an update 1 day ago
📢 Awesome Multimodal Modeling

We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence—from foundational fusion to native omni-models.

🔹 Taxonomy & Evolution:

- Traditional Multimodal Learning – Foundational work on representation, fusion, and alignment.
- Multimodal LLMs (MLLMs) – Architectures connecting vision encoders to LLMs for understanding.
- Unified Multimodal Models (UMMs) – Models unifying understanding and generation via diffusion, autoregressive, or hybrid paradigms.
- Native Multimodal Models (NMMs) – Models trained from scratch on all modalities, contrasting early vs. late fusion under scaling laws.
💡 Key Distinction:
UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.

🔗 Explore & Contribute: https://github.com/OpenEnvision-Lab/Awesome-Multimodal-Modeling

Stars and pull requests are welcome!

