CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27, 2026 • 98
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference Paper • 2505.22758 • Published May 28, 2025 • 1
Liger Kernel: Efficient Triton Kernels for LLM Training Paper • 2410.10989 • Published Oct 14, 2024 • 3
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 38