Chayan Vats
chayanvats11
Recent Activity
reacted to namanvats's post with ❤️ 1 day ago
Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (`minimax/minimax-m2.5`) with two agent harnesses: Goose and OpenHands-SDK.

Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions:

* Goose: 0.450 → 0.525
* OpenHands-SDK: 0.575 → 0.500

A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.

What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than Goose while converging to the same best score.

Same model, same task slice, different harness behavior under a tighter interaction budget.

Dataset: https://huggingface.co/datasets/namanvats/harbor-goose-openhands-benchmark
Code/configs: https://github.com/namanvats/harbor-agent-ablation
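To put the pass rates in the post into concrete terms, here is a minimal sketch that converts each reported rate into a task count on the 40-task slice. It assumes each task is scored pass/fail exactly once, so that pass rate = tasks passed / 40; the post does not state this, and the dictionary layout and the `tasks_passed` helper are hypothetical, not taken from the linked repository.

```python
# Size of the frozen task slice, as stated in the post.
TASKS = 40

# Pass rates reported in the post, keyed by (harness, setup).
pass_rates = {
    ("Goose", "base, 100 turns"): 0.450,
    ("Goose", "base, 60 turns"): 0.525,
    ("OpenHands-SDK", "base, 100 turns"): 0.575,
    ("OpenHands-SDK", "base, 60 turns"): 0.500,
    ("OpenHands-SDK", "tweaked, 60 turns"): 0.575,
}


def tasks_passed(rate, n_tasks=TASKS):
    """Convert a pass rate to a task count.

    Assumption: one pass/fail result per task, so rate * n_tasks is
    (up to floating-point error) a whole number.
    """
    return round(rate * n_tasks)


counts = {key: tasks_passed(rate) for key, rate in pass_rates.items()}

for (harness, setup), count in counts.items():
    rate = pass_rates[(harness, setup)]
    print(f"{harness:14s} {setup:18s} {rate:.3f} = {count}/{TASKS} tasks")
```

Under that assumption, each 0.025 step in pass rate corresponds to a single task, so differences of this size between runs come down to a handful of tasks.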