Article Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding • 26 days ago • 45
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 Text Generation • 32B • Updated about 1 month ago • 866k • 334
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 Text Generation • Updated Oct 15, 2025 • 6.86k • 344
Llama Nemotron Collection Open, Production-ready Enterprise Models • 12 items • Updated 8 days ago • 78
nvidia/Llama-3_3-Nemotron-Super-49B-v1 Text Generation • 50B • Updated Oct 15, 2025 • 33.5k • 321
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 20