- Article: KV Caching Explained: Optimizing Transformer Inference Efficiency (Jan 30, 2025)
- Article: Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation (Sep 16, 2025)
- Article: You could have designed state of the art positional encoding (Nov 25, 2024)
- Space: The Ultra-Scale Playbook 🌌, the ultimate guide to training LLMs on large GPU clusters