Submitted by akhaliq 95 Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks · 9 authors 73 6
Submitted by akhaliq 37 JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models · 12 authors 1
Submitted by akhaliq 33 Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model · 10 authors 4
Submitted by akhaliq 30 Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs · 7 authors 476 2
Submitted by akhaliq 21 Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization · 14 authors 1
Submitted by akhaliq 14 FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores · 4 authors 1
Submitted by akhaliq 12 ADaPT: As-Needed Decomposition and Planning with Language Models · 7 authors 90 1
Submitted by akhaliq 11 Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities · 6 authors 1
Submitted by akhaliq 9 Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems · 8 authors 1