ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models • Paper • 2603.13033 • Published Mar 13
Cosmos-Tokenizer • Collection • A suite of image and video tokenizers • 12 items • Updated about 19 hours ago