Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 114 • 5
ELT: Elastic Looped Transformers for Visual Generation Paper • 2604.09168 • Published 5 days ago • 17 • 1
Structured Causal Video Reasoning via Multi-Objective Alignment Paper • 2604.04415 • Published 9 days ago • 8 • 2
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling Paper • 2604.04987 • Published 10 days ago • 2 • 2
EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers Paper • 2604.09130 • Published 5 days ago • 3 • 2
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 6 days ago • 225 • 4
Process Reward Agents for Steering Knowledge-Intensive Reasoning Paper • 2604.09482 • Published 5 days ago • 4 • 2
MixFlow: Mixed Source Distributions Improve Rectified Flows Paper • 2604.09181 • Published 5 days ago • 2 • 2
Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization Paper • 2604.08118 • Published 6 days ago • 1 • 2
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation Paper • 2604.08540 • Published 6 days ago • 3 • 2
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation Paper • 2604.09201 • Published 5 days ago • 2 • 1
AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents Paper • 2603.27490 • Published 17 days ago • 14 • 2
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance Paper • 2604.01848 • Published 12 days ago • 4 • 2
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 5 days ago • 42 • 2
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism Paper • 2604.09544 • Published 5 days ago • 4 • 2