🤖 SelfEvo — Self-Improving Agent Environment

An AI agent that doesn't just answer — it learns how to answer better, step by step, under real constraints.


🧠 What Is This?

SelfEvo is an experimental AI benchmark environment where an agent is given a task (math, reasoning, or coding) and must solve it within a budget. But instead of answering directly, the agent:

  1. Picks a strategy — How deep should it reason? Should it use tools? Chain-of-thought or direct?
  2. Uses tools — Calculator, Python sandbox, or a knowledge search.
  3. Submits an answer — Gets scored on accuracy, speed, and budget efficiency.
  4. Self-improves — If the score is low, it modifies its strategy and retries.

This loop is the "self-evolving" core of SelfEvo.


πŸ” How It Works β€” The Loop

┌─────────────────────────────────────────────┐
│                EPISODE START                │
│          Task assigned + Budget set         │
└──────────────────────┬──────────────────────┘
                      ▼
         ┌────────────────────────┐
         │  1. Observe Task State │
         └────────────┬───────────┘
                      ▼
         ┌────────────────────────┐
         │  2. Modify Strategy?   │◄──── Low score on prev attempt
         │  (prompt style, depth) │
         └────────────┬───────────┘
                      ▼
         ┌────────────────────────┐
         │  3. Use a Tool?        │
         │  Calculator / Python   │
         │  / Search KB           │
         └────────────┬───────────┘
                      ▼
         ┌────────────────────────┐
         │  4. Submit Answer      │
         │  → Score + Reward      │
         └────────────┬───────────┘
                      ▼
              Score ≥ 0.9? ──Yes──► DONE 🏆
                      │
                      No
                      ▼
              Budget left? ──No───► FAIL ❌
                      │
                     Yes
                      ▼
              Back to Step 2 ↑
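The decision loop above can be condensed into a short sketch. Note that `run_episode` and the `solve` callback are illustrative stand-ins for this README, not the actual SelfEvo API:

```python
def run_episode(solve, budget=10.0, target=0.9, max_attempts=5):
    """Retry until the score reaches `target` or the budget runs out.

    `solve(attempt)` stands in for steps 1-4: observe the task,
    (re)pick a strategy, optionally use tools, and return
    (score, cost). Hypothetical sketch, not the real environment.
    """
    best = 0.0
    for attempt in range(max_attempts):
        if budget <= 0:
            break                      # out of budget -> FAIL
        score, cost = solve(attempt)   # submit an answer, get scored
        budget -= cost
        best = max(best, score)
        if best >= target:
            return "DONE", best        # crossed the 0.9 bar -> success
    return "FAIL", best                # budget or attempts exhausted

# Toy solver whose strategy tweaks raise the score on each retry
status, score = run_episode(lambda attempt: (0.5 + 0.2 * attempt, 2.0))
```

With the toy solver above, the agent fails twice, improves its strategy, and finishes with a "DONE" on the third attempt.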

🗂️ Project Structure

SelfEvo/
├── app/
│   └── app.py            # Gradio UI — run this to launch the web app
├── agent/
│   ├── baseline_agent.py # Core agent logic: strategy, tools, answer gen
│   └── strategy.py       # Strategy recommendation engine
├── env/
│   ├── environment.py    # OpenEnv protocol: reset(), step(), state()
│   ├── tasks.py          # 15 predefined tasks (Easy / Medium / Hard)
│   ├── tools.py          # Calculator, Python sandbox, Search tool
│   ├── state.py          # Typed state: Strategy, Attempt, ToolSpec
│   ├── actions.py        # Action types: MODIFY_STRATEGY, USE_TOOL, SUBMIT
│   ├── reward.py         # Reward formula computation
│   └── grader.py         # Grading: numeric, fuzzy, composite modes
├── configs/              # Config YAML files
├── scripts/              # Utility / benchmark scripts
├── requirements.txt
└── Dockerfile

βš™οΈ Reward Formula

reward = accuracy_score
       + improvement_bonus   (beat your previous best)
       - tool_cost_penalty   (budget consumed)
       + efficiency_bonus    (finished in fewer steps)

The agent is rewarded for being accurate, improving, and efficient — not just for getting the right answer.
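The formula can be sketched as a small function. The 0.5 / 0.05 / 0.1 weights here are assumptions for illustration, not the actual coefficients in env/reward.py:

```python
def compute_reward(accuracy, prev_best, tool_cost, steps_used, max_steps):
    """Illustrative combination of the four reward terms above.

    Weights are assumed for this sketch; see env/reward.py for the
    real computation.
    """
    improvement_bonus = 0.5 * max(0.0, accuracy - prev_best)  # beat your previous best
    tool_cost_penalty = 0.05 * tool_cost                      # budget consumed
    efficiency_bonus = 0.1 * (1.0 - steps_used / max_steps)   # fewer steps = better
    return accuracy + improvement_bonus - tool_cost_penalty + efficiency_bonus
```

Under these assumed weights, a perfect answer (accuracy 1.0) after a weak first try (0.6), spending 2.0 budget in 4 of 10 steps, scores about 1.16.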


🛠️ Tools Available

Tool                  Cost  What It Does
calculator            0.5   Safe math expression evaluator
python_code_executor  1.5   Sandboxed Python runner (stdout captured)
search_tool           1.0   Knowledge-base lookup for facts
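The calculator's "safe evaluator" idea can be sketched with Python's `ast` module: parse the expression and walk only arithmetic nodes, so no names or function calls can ever execute. This is illustrative only; env/tools.py may implement it differently:

```python
import ast
import operator

# Arithmetic-only operators; anything else is rejected.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expr: str) -> float:
    """Evaluate a pure-math expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

print(calculate("2 * (3 + 4)"))  # 14
```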

📋 Task Difficulties

Level   Budget  Description
Easy    10      Single-step arithmetic and simple logic
Medium  20      Multi-step word problems and algebra
Hard    30      Cross-domain reasoning + algorithms + planning
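The difficulty-to-budget mapping above could be modeled like this. Names such as `Task` and `BUDGETS` are hypothetical; the real task definitions live in env/tasks.py:

```python
from dataclasses import dataclass

# Assumed mapping taken from the difficulty table above.
BUDGETS = {"easy": 10, "medium": 20, "hard": 30}

@dataclass
class Task:
    prompt: str
    expected_answer: str
    difficulty: str  # "easy" | "medium" | "hard"

    @property
    def budget(self) -> int:
        return BUDGETS[self.difficulty]

task = Task("What is 17 * 23?", "391", "easy")
```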

🚀 Running Locally

# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch the web UI
python app/app.py

# 3. Open browser at
http://localhost:7860

🎯 Using the UI

Predefined Task

  • Select from 15 built-in tasks (Easy β†’ Hard)
  • Set a random seed for reproducibility
  • Hit Run Episode and watch the agent think step-by-step

Custom Task (Your own input)

  • Switch to Custom Task mode
  • Type any question, set the expected answer
  • Choose Task Type: REASONING / MATH / CODING
  • Choose Output Type: String (fuzzy match) or Number (exact/tolerance)
  • Set budget and run

📊 Scoring

Score      Meaning
🏆 ≥ 0.90  Excellent — agent nailed it
⚠️ ≥ 0.50  Partial — agent got close
❌ < 0.50  Failed — wrong or inefficient

👥 Team

Group: LowIQBoys

Role       Name
👑 Leader  Akhilesh Adam
👤 Member  Pavav Sandhuptla

SelfEvo — Because good agents don't just answer. They evolve.
