Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment Paper • 2604.00913 • Published 12 days ago • 4
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 15 days ago • 137
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval Paper • 2603.12824 • Published Mar 13 • 5
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction Paper • 2603.09930 • Published Mar 10