ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158
3D Mesh Generation via Compositional Latent Diffusion
Dense Grounded Understanding of Images and Videos
Generate captions for images
Detect, segment, classify objects in images and videos