Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper
• 2406.06525
• Published • 71
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper
• 2406.06469
• Published • 29
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Paper
• 2406.04271
• Published • 29
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper
• 2406.02657
• Published • 41
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Paper
• 2406.01014
• Published • 33
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
Paper
• 2406.02884
• Published • 18
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Paper
• 2406.02900
• Published • 13
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Paper
• 2406.02523
• Published • 12
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper
• 2405.21060
• Published • 68
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper
• 2405.20204
• Published • 37
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
Paper
• 2405.20335
• Published • 17
Mixture-of-Agents Enhances Large Language Model Capabilities
Paper
• 2406.04692
• Published • 59
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Paper
• 2405.19327
• Published • 48
2BP: 2-Stage Backpropagation
Paper
• 2405.18047
• Published • 26
Paper
• 2405.18407
• Published • 48
Yuan 2.0-M32: Mixture of Experts with Attention Router
Paper
• 2405.17976
• Published • 21
An Introduction to Vision-Language Modeling
Paper
• 2405.17247
• Published • 90
Transformers Can Do Arithmetic with the Right Embeddings
Paper
• 2405.17399
• Published • 54
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
Paper
• 2405.17405
• Published • 16
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
Paper
• 2405.17258
• Published • 16
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Paper
• 2405.16287
• Published • 11
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper
• 2405.15574
• Published • 55
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Paper
• 2405.15738
• Published • 46
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Paper
• 2405.15319
• Published • 28
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Paper
• 2405.14906
• Published • 26
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper
• 2405.15613
• Published • 17
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
Paper
• 2405.15125
• Published • 8
Not All Language Model Features Are Linear
Paper
• 2405.14860
• Published • 40
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper
• 2405.14333
• Published • 45
Dense Connector for MLLMs
Paper
• 2405.13800
• Published • 24
Your Transformer is Secretly Linear
Paper
• 2405.12250
• Published • 157
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper
• 2405.12981
• Published • 33
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Paper
• 2405.12970
• Published • 25
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
• 2405.12399
• Published • 30
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper
• 2405.12130
• Published • 50
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper
• 2405.11143
• Published • 41
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper
• 2405.12107
• Published • 29
LoRA Learns Less and Forgets Less
Paper
• 2405.09673
• Published • 91
Many-Shot In-Context Learning in Multimodal Foundation Models
Paper
• 2405.09798
• Published • 32
What matters when building vision-language models?
Paper
• 2405.02246
• Published • 104
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Paper
• 2408.03615
• Published • 31