Efficient Memory Management for Large Language Model Serving with PagedAttention Paper • 2309.06180 • Published Sep 12, 2023 • 51
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 Text Generation • 32B • Updated about 1 month ago • 1.57M • 712