- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
  Paper • 2602.24286 • Published • 98
- FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
  Paper • 2505.22758 • Published • 1
- Liger Kernel: Efficient Triton Kernels for LLM Training
  Paper • 2410.10989 • Published • 3
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 38
Mattias Dürrmeier
mattduerrmeier
AI & ML interests
LLM Inference, faster and more efficient kernels, local inference
Recent Activity
- updated a collection about 4 hours ago: systems
- upvoted a paper about 4 hours ago: FlashDecoding++: Faster Large Language Model Inference on GPUs
- updated a collection about 4 hours ago: systems

Organizations
None yet