Chayan Vats

chayanvats11

AI & ML interests

None yet

Recent Activity

reacted to namanvats's post with ❤️ 1 day ago

Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (`minimax/minimax-m2.5`) with two agent harnesses: Goose and OpenHands-SDK.

Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions:

* Goose: 0.450 → 0.525
* OpenHands-SDK: 0.575 → 0.500

A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.

What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than Goose while converging to the same best score. Same model, same task slice, different harness behavior under a tighter interaction budget.

Dataset: https://huggingface.co/datasets/namanvats/harbor-goose-openhands-benchmark

Code/configs: https://github.com/namanvats/harbor-agent-ablation
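To put the reported pass rates in concrete terms, here is a minimal Python sketch that converts them into solved-task counts on the 40-task slice. It assumes each pass rate is a plain fraction of the 40 tasks from a single run, which the post does not state explicitly. The configuration labels and the `solved_tasks` helper are illustrative names, not taken from the linked repository.

```python
TASKS = 40  # size of the frozen task slice, as stated in the post

# Pass rates as reported in the post. The (harness, setup) labels are
# illustrative and do not come from the linked code or dataset.
pass_rates = {
    ("Goose", "base, 100 turns"): 0.450,
    ("Goose", "base, 60 turns"): 0.525,
    ("OpenHands-SDK", "base, 100 turns"): 0.575,
    ("OpenHands-SDK", "base, 60 turns"): 0.500,
    ("OpenHands-SDK", "tweaked, 60 turns"): 0.575,
}


def solved_tasks(rate: float, total: int = TASKS) -> int:
    """Convert a pass rate into a task count.

    Assumes the rate is solved / total for one run; rounding absorbs
    floating-point noise in the multiplication.
    """
    return round(rate * total)


solved = {key: solved_tasks(rate) for key, rate in pass_rates.items()}

# Change in solved tasks when the base turn budget drops from 100 to 60.
goose_delta = solved[("Goose", "base, 60 turns")] - solved[("Goose", "base, 100 turns")]
openhands_delta = (
    solved[("OpenHands-SDK", "base, 60 turns")]
    - solved[("OpenHands-SDK", "base, 100 turns")]
)

for (harness, setup), count in solved.items():
    print(f"{harness:14s} {setup:18s} {count}/{TASKS}")
print(f"Goose delta: {goose_delta:+d}, OpenHands-SDK delta: {openhands_delta:+d}")
```

Under that assumption, the opposite shifts each amount to a handful of tasks out of 40, so the per-task results in the linked dataset are worth checking before reading much into the direction of the change.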

Organizations

None yet

models 0

None public yet

datasets 0

None public yet