feat: replace triton do_bench with torch.profiler for kernel timing 7d51e61 wyldecat Claude Opus 4.6 (1M context) committed 6 days ago