Taming LLMs by Scaling Learning Rates with Gradient Grouping
Paper • 2506.01049 • Published • 39
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Paper • 2402.09240 • Published • 5
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Paper • 2410.06373 • Published • 36 -
OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning
Paper • 2209.04851 • Published • 3
Juanxi Tian
Juanxi
AI & ML interests
Efficient AI & Gen AI
Recent Activity
replied to their post 1 day ago
📢 Awesome Multimodal Modeling
We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence—from foundational fusion to native omni-models.
🔹 Taxonomy & Evolution:
Traditional Multimodal Learning – Foundational work on representation, fusion, and alignment.
Multimodal LLMs (MLLMs) – Architectures connecting vision encoders to LLMs for understanding.
Unified Multimodal Models (UMMs) – Models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
Native Multimodal Models (NMMs) – Models trained from scratch on all modalities; contrasts early vs. late fusion under scaling laws.
💡 Key Distinction:
UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.
🔗 Explore & Contribute: https://github.com/OpenEnvision-Lab/Awesome-Multimodal-Modeling