NVIDIA Nemotron V2 Collection Open, Production-ready Enterprise Models. Nvidia Open Model license. • 9 items • Updated 8 days ago • 104
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 Dec 9, 2022 • 407
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency Jan 30, 2025 • 293
view article Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16, 2025 • 69
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 666
Phi-4 Collection Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated Jul 10, 2025 • 207
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 264