Was Stirrup used for the 27.9% APEX-Agents-AA score?
#32
by pandemo - opened
Hi Moonshot team,
Congrats on the Kimi K2.6 release, very impressive results! I have a quick question about the reported 27.9% APEX-Agents-AA score: was this result obtained using Artificial Analysis’ Stirrup agent harness, or a different agentic harness?
Thanks in advance for any clarification! 🙏