FileGram: Grounding Agent Personalization in File-System Behavioral Traces Paper • 2604.04901 • Published 12 days ago • 40
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published 16 days ago • 29
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published Mar 16 • 185
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 152
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors Paper • 2603.04338 • Published Mar 4 • 24
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published Mar 3 • 87
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published Feb 12 • 20
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published Feb 9 • 52
NEO1_0 Collection From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • 7 items • Updated Jan 27 • 9
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 67
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 89
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published Dec 16, 2025 • 42
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published Dec 15, 2025 • 76
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 189
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning Paper • 2511.18659 • Published Nov 24, 2025 • 25
MDGA Collection Make Diffusion Great Again. The resource list for Super Data Learners, Quokka, and OpenMoE 2. • 7 items • Updated Mar 2 • 8