Papers
Beyond Language Models: Byte Models are Digital World Simulators • arXiv:2402.19155 • 53 upvotes
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • arXiv:2402.19427 • 57 upvotes
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks • arXiv:2403.00522 • 46 upvotes
Resonance RoPE: Improving Context Length Generalization of Large Language Models • arXiv:2403.00071 • 24 upvotes
Learning and Leveraging World Models in Visual Representation Learning • arXiv:2403.00504 • 33 upvotes
AtP*: An efficient and scalable method for localizing LLM behaviour to components • arXiv:2403.00745 • 14 upvotes
Learning to Decode Collaboratively with Multiple Language Models • arXiv:2403.03870 • 21 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • arXiv:2403.03853 • 66 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • 190 upvotes
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment • arXiv:2403.05135 • 45 upvotes
DeepSeek-VL: Towards Real-World Vision-Language Understanding • arXiv:2403.05525 • 49 upvotes
Stealing Part of a Production Language Model • arXiv:2403.06634 • 91 upvotes
MoAI: Mixture of All Intelligence for Large Language and Vision Models • arXiv:2403.07508 • 77 upvotes
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816 • 45 upvotes
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings • arXiv:2403.07750 • 23 upvotes
Chronos: Learning the Language of Time Series • arXiv:2403.07815 • 49 upvotes
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training • arXiv:2403.09611 • 129 upvotes
Veagle: Advancements in Multimodal Representation Learning • arXiv:2403.08773 • 10 upvotes
Qwen Technical Report • arXiv:2309.16609 • 38 upvotes
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities • arXiv:2308.12966 • 11 upvotes
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer • arXiv:2403.10301 • 54 upvotes
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models • arXiv:2403.13372 • 183 upvotes
The Unreasonable Ineffectiveness of the Deeper Layers • arXiv:2403.17887 • 82 upvotes
InternLM2 Technical Report • arXiv:2403.17297 • 34 upvotes
Jamba: A Hybrid Transformer-Mamba Language Model • arXiv:2403.19887 • 112 upvotes
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs • arXiv:2403.20041 • 34 upvotes
Localizing Paragraph Memorization in Language Models • arXiv:2403.19851 • 15 upvotes
DiJiang: Efficient Large Language Models through Compact Kernelization • arXiv:2403.19928 • 12 upvotes
Long-form factuality in large language models • arXiv:2403.18802 • 26 upvotes
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv:2404.02258 • 107 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • 111 upvotes
Pre-training Small Base LMs with Fewer Tokens • arXiv:2404.08634 • 36 upvotes
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • 66 upvotes
SnapKV: LLM Knows What You are Looking for Before Generation • arXiv:2404.14469 • 27 upvotes
FlowMind: Automatic Workflow Generation with LLMs • arXiv:2404.13050 • 34 upvotes
Qwen2.5 Technical Report • arXiv:2412.15115 • 377 upvotes
YuLan-Mini: An Open Data-efficient Language Model • arXiv:2412.17743 • 66 upvotes
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs • arXiv:2412.18925 • 107 upvotes
Token-Budget-Aware LLM Reasoning • arXiv:2412.18547 • 46 upvotes
DeepSeek-V3 Technical Report • arXiv:2412.19437 • 82 upvotes
MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv:2501.08313 • 302 upvotes
Evolving Deeper LLM Thinking • arXiv:2501.09891 • 115 upvotes
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • arXiv:2502.02737 • 258 upvotes
Demystifying Long Chain-of-Thought Reasoning in LLMs • arXiv:2502.03373 • 58 upvotes
LIMO: Less is More for Reasoning • arXiv:2502.03387 • 62 upvotes
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • arXiv:2502.06703 • 153 upvotes
The Differences Between Direct Alignment Algorithms are a Blur • arXiv:2502.01237 • 113 upvotes
s1: Simple test-time scaling • arXiv:2501.19393 • 125 upvotes
Qwen2.5-1M Technical Report • arXiv:2501.15383 • 72 upvotes
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • arXiv:2501.12948 • 447 upvotes
Kimi k1.5: Scaling Reinforcement Learning with LLMs • arXiv:2501.12599 • 128 upvotes
Transformers without Normalization • arXiv:2503.10622 • 172 upvotes
A Survey on Post-training of Large Language Models • arXiv:2503.06072 • 11 upvotes
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models • arXiv:2503.16419 • 77 upvotes
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't • arXiv:2503.16219 • 52 upvotes
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • arXiv:2504.20966 • 31 upvotes