D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI • Paper • 2510.05684 • Published Oct 7, 2025
Seeing Voices: Generating A-Roll Video from Audio with Mirage • Paper • 2506.08279 • Published Jun 9, 2025
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features • Paper • 2504.00557 • Published Apr 1, 2025