StoryBlender: Inter-Shot Consistent and Editable 3D Storyboard with Spatial-temporal Dynamics Paper • 2604.03315 • Published 15 days ago • 1
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published 8 days ago • 89
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment Paper • 2604.11689 • Published 3 days ago • 7
TIPSv2 Collection TIPSv2 foundational vision-language models. Webpage: https://gdm-tipsv2.github.io/ • 9 items • Updated about 23 hours ago • 6
ERNIE-Image Collection A series of image generation models, including text2img and img2img. • 2 items • Updated 1 day ago • 17
Learning Long-term Motion Embeddings for Efficient Kinematics Generation Paper • 2604.11737 • Published 3 days ago • 4
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Paper • 2604.11778 • Published 3 days ago • 6
E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory Paper • 2601.16622 • Published Jan 23 • 1
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published Aug 18, 2025 • 26
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 7 days ago • 229
Optimization-Guided Diffusion for Interactive Scene Generation Paper • 2512.07661 • Published Dec 8, 2025 • 5