AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published 24 days ago • 26
Are Vision-Language Models Truly Understanding Multi-vision Sensor? Paper • 2412.20750 • Published Dec 30, 2024 • 20