VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published 7 days ago • 8
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Omni Image Editor 🖼 Space • Running on CPU Upgrade • Agents • 1.42k • Image edit, text to image, image upscale, remove watermark
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25, 2025 • 33
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published Nov 22, 2025 • 38
Qwen3 VL Demo 😻 Space • Running • Agents • Featured • 417 • Chat with an AI that understands text, images, and videos