CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing Paper • 2603.08589 • Published Mar 9 • 38
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 133
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published Dec 31, 2025 • 154
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17, 2025 • 42
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Paper • 2507.13332 • Published Jul 17, 2025 • 49
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Paper • 2507.13344 • Published Jul 17, 2025 • 59
π^3: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published Jul 17, 2025 • 67
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 79
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15, 2025 • 64
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22, 2025 • 64
Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory Paper • 2507.16713 • Published Jul 22, 2025 • 21
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published Jul 22, 2025 • 35
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22, 2025 • 42
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Paper • 2507.08422 • Published Jul 11, 2025 • 36
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance Paper • 2507.18192 • Published Jul 24, 2025 • 8