Submitted by akhaliq 64 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · 671 authors 6
Submitted by akhaliq 49 DeepSeek-VL: Towards Real-World Vision-Language Understanding · DeepSeek 4.09k 4
Submitted by akhaliq 45 ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment · 6 authors 1.28k 2
Submitted by akhaliq 23 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks · 14 authors 1
Submitted by akhaliq 23 CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion · 9 authors 1.1k 3
Submitted by akhaliq 21 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model · 9 authors 3
Submitted by akhaliq 20 VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models · 8 authors 163 1