MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
Abstract
LLM agents organize behavior through skills: structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective: a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, missing Pareto-optimal variants in non-convex objective regions. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization, which covers the full Pareto front including non-convex regions, combined with exponential annealing that transitions from exploration to exploitation. In our experiments across six diverse agent skills, all methods share the same multi-objective mutation operator and baselines receive identical per-objective textual feedback. Under these conditions, existing optimizers fail to improve the seed skill on 4 of 6 tasks: 1000 rollouts yield zero progress. MOCHA breaks through on every task, achieving a 7.5% relative improvement in mean correctness over the strongest baseline (up to 14.9% on FEVER and 10.4% on TheoremQA) while discovering twice as many Pareto-optimal skill variants.
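The contrast between weighted-sum and Chebyshev selection can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy candidate scores, the ideal point, and the use of a Boltzmann-style temperature as the annealed quantity are all assumptions made for illustration, since the abstract does not specify what is annealed.

```python
import math
import random

def chebyshev(scores, weights, ideal):
    """Weighted Chebyshev distance to the ideal point (lower is better)."""
    return max(w * abs(z - s) for s, w, z in zip(scores, weights, ideal))

def weighted_sum(scores, weights):
    """Linear scalarization (higher is better)."""
    return sum(w * s for s, w in zip(scores, weights))

def temperature(step, total_steps, t_start=1.0, t_end=0.01):
    """Exponential decay from t_start to t_end (hypothetical schedule)."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac

def select(candidates, weights, ideal, temp, rng):
    """Sample a candidate index; high temp explores, low temp exploits."""
    costs = [chebyshev(c, weights, ideal) for c in candidates]
    best = min(costs)
    probs = [math.exp(-(c - best) / temp) for c in costs]
    return rng.choices(range(len(candidates)), weights=probs, k=1)[0]

# Toy objective vectors (both maximized). The third point is Pareto-optimal
# but lies in a non-convex region of the front.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]
ideal = (1.0, 1.0)      # assumed ideal point
weights = (0.5, 0.5)

cheb_pick = min(range(len(candidates)),
                key=lambda i: chebyshev(candidates[i], weights, ideal))
ws_pick = max(range(len(candidates)),
              key=lambda i: weighted_sum(candidates[i], weights))

# Sweep weight vectors: the weighted sum never selects the non-convex point.
ws_picks_sweep = set()
for k in range(101):
    w = (k / 100, 1 - k / 100)
    ws_picks_sweep.add(max(range(len(candidates)),
                           key=lambda i: weighted_sum(candidates[i], w)))

# Late in the budget (low temperature) selection is close to greedy.
late_pick = select(candidates, weights, ideal,
                   temperature(9, 10), random.Random(0))
```

In this toy setup, Chebyshev selection with equal weights picks the balanced point `(0.4, 0.4)`, while no weight vector in the sweep makes the weighted sum pick it. The temperature schedule only shows one plausible way an exponential decay could shift selection from exploration toward exploitation.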
Community
Skill optimization is inherently multi-objective: a skill must maximize task correctness and satisfy hard platform limits (truncated descriptions, compacted instruction bodies, finite shared context). Prior prompt optimizers either ignore these trade-offs or collapse them into a single scalar, missing Pareto-optimal variants in non-convex regions. MOCHA replaces single-objective selection with Chebyshev scalarization — provably covering the full Pareto front — combined with exponential annealing that transitions from exploration to exploitation as the rollout budget is consumed. Across six diverse skills, MOCHA beats the strongest baseline by 7.5% on average (up to +14.9%) and finds 2× more Pareto-optimal variants, while existing optimizers plateau at the seed on 4 of 6 tasks.
Get this paper in your agent:
hf papers read 2605.19330
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
