Collections including paper arxiv:2406.05678

- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
  Paper • 2403.14608 • Published
- Towards Better Parameter-Efficient Fine-Tuning for Large Language Models: A Position Paper
  Paper • 2311.13126 • Published • 1
- Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models
  Paper • 2409.09510 • Published
- Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning
  Paper • 2407.01320 • Published

- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 53
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 41
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 3
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 22

- TRAMS: Training-free Memory Selection for Long-range Language Modeling
  Paper • 2310.15494 • Published • 2
- A Long Way to Go: Investigating Length Correlations in RLHF
  Paper • 2310.03716 • Published • 10
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 82
- Giraffe: Adventures in Expanding Context Lengths in LLMs
  Paper • 2308.10882 • Published • 1

- S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
  Paper • 2306.06000 • Published • 1
- PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
  Paper • 2405.12532 • Published
- SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
  Paper • 2404.04793 • Published • 1
- MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
  Paper • 2405.14366 • Published • 3

- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 29
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 46
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2