Sahu

Manishsahu53

https://github.com/ManishSahu53

AI & ML interests

None yet

Recent Activity

reacted to Juanxi's post with 🔥 11 minutes ago

📢 Awesome Multimodal Modeling

We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence, from foundational fusion to native omni-models.

🔹 Taxonomy & Evolution:
Traditional Multimodal Learning – Foundational work on representation, fusion, and alignment.
Multimodal LLMs (MLLMs) – Architectures connecting vision encoders to LLMs for understanding.
Unified Multimodal Models (UMMs) – Models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
Native Multimodal Models (NMMs) – Models trained from scratch on all modalities; contrasts early vs. late fusion under scaling laws.

💡 Key Distinction: UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.

🔗 Explore & Contribute: https://github.com/OpenEnvision/Awesome-Multimodal-Modeling

liked a model 5 days ago

black-forest-labs/FLUX.2-small-decoder

liked a model about 1 month ago

FireRedTeam/FireRed-Image-Edit-1.1

Manishsahu53's models (1)

Manishsahu53/flux-kontext-fashion-extractor

Text-to-Image • Updated Oct 24, 2025 • 7