HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation Paper • 2604.28196 • Published 8 days ago • 68
Online Self-Calibration Against Hallucination in Vision-Language Models Paper • 2605.00323 • Published 7 days ago • 3
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published 25 days ago • 101
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision Paper • 2604.04934 • Published Apr 6 • 46
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 30 days ago • 71
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 30 days ago • 323
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 501
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published 29 days ago • 115
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 216
Emergent Social Intelligence Risks in Generative Multi-Agent Systems Paper • 2603.27771 • Published Mar 29 • 52
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 341