HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents • Paper • 2604.07430 • Published 10 days ago • 182
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning • Paper • 2603.17024 • Published about 1 month ago • 109
SpatialActor • Collection • Models and datasets of SpatialActor (https://github.com/shihao1895/SpatialActor) • 4 items • Updated Jan 9 • 1
MemoryVLA • Collection • Checkpoints, data, and logs of MemoryVLA & MemoryVLA+ (https://github.com/shihao1895/MemoryVLA) • 19 items • Updated Mar 2 • 7
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation • Paper • 2507.08441 • Published Jul 11, 2025 • 62