Collections
Discover the best community collections!
Collections including paper arxiv:1910.02054
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Paper • 2402.05140 • Published • 23
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 21
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 61
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 84

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 7
Sparse Networks from Scratch: Faster Training without Losing Performance
Paper • 1907.04840 • Published • 3
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Paper • 1910.02054 • Published • 11
A Mixture of h-1 Heads is Better than h Heads
Paper • 2005.06537 • Published • 2

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 89
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 67
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 40
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 107

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125
Understanding R1-Zero-Like Training: A Critical Perspective
Paper • 2503.20783 • Published • 59
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper • 2508.08221 • Published • 50
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Paper • 1910.02054 • Published • 11

Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 65
The Impact of Reasoning Step Length on Large Language Models
Paper • 2401.04925 • Published • 18
Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 44
Attention Is All You Need
Paper • 1706.03762 • Published • 120