Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling Paper • 2604.04987 • Published 14 days ago • 3
Flora: Low-Rank Adapters Are Secretly Gradient Compressors Paper • 2402.03293 • Published Feb 5, 2024 • 6
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28, 2024 • 17