# 🤖 SelfEvo – Self-Improving Agent Environment

An AI agent that doesn't just answer – it learns how to answer better, step by step, under real constraints.
## 🧠 What Is This?

SelfEvo is an experimental AI benchmark environment where an agent is given a task (math, reasoning, or coding) and must solve it within a budget. But instead of answering directly, the agent:
- **Picks a strategy** – How deep should it reason? Should it use tools? Chain-of-thought or direct?
- **Uses tools** – Calculator, Python sandbox, or a knowledge search.
- **Submits an answer** – Gets scored on accuracy, speed, and budget efficiency.
- **Self-improves** – If the score is low, it modifies its strategy and retries.
This loop is the "self-evolving" core of SelfEvo.
## 🔁 How It Works – The Loop
```
┌────────────────────────────────────────────────┐
│                 EPISODE START                  │
│           Task assigned + Budget set           │
└───────────────────────┬────────────────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 1. Observe Task State   │
           └────────────┬────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 2. Modify Strategy?     │◄──── Low score on prev attempt
           │   (prompt style, depth) │
           └────────────┬────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 3. Use a Tool?          │
           │   Calculator / Python   │
           │   / Search KB           │
           └────────────┬────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 4. Submit Answer        │
           │   → Score + Reward      │
           └────────────┬────────────┘
                        ▼
               Score ≥ 0.9? ──Yes──► DONE 🎉
                        │
                        No
                        ▼
               Budget left? ──No───► FAIL ❌
                        │
                       Yes
                        ▼
               Back to Step 2 🔁
```
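The control flow above can be sketched in a few lines of Python. This is a minimal illustration, not the actual SelfEvo implementation – names like `env`, `agent.decide`, `agent.adapt`, and `StepResult` are assumptions chosen to mirror the diagram:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    score: float        # grade for the submitted answer (0.0–1.0)
    budget_left: float  # remaining episode budget

def run_episode(env, agent, pass_score=0.9):
    """Drive one episode: act, submit, and retry until pass or budget runs out."""
    env.reset()                                  # task assigned, budget set
    while True:
        result = env.step(agent.decide())        # agent submits an answer
        if result.score >= pass_score:           # Score >= 0.9 -> DONE
            return "DONE"
        if result.budget_left <= 0:              # budget exhausted -> FAIL
            return "FAIL"
        agent.adapt(result)                      # low score: modify strategy, retry
```

The key design point the diagram encodes: strategy modification only happens *after* a scored attempt, so every retry is informed by feedback rather than blind resampling.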
## 🏗️ Project Structure
```
SelfEvo/
├── app/
│   └── app.py              # Gradio UI – run this to launch the web app
├── agent/
│   ├── baseline_agent.py   # Core agent logic: strategy, tools, answer gen
│   └── strategy.py         # Strategy recommendation engine
├── env/
│   ├── environment.py      # OpenEnv protocol: reset(), step(), state()
│   ├── tasks.py            # 15 predefined tasks (Easy / Medium / Hard)
│   ├── tools.py            # Calculator, Python sandbox, Search tool
│   ├── state.py            # Typed state: Strategy, Attempt, ToolSpec
│   ├── actions.py          # Action types: MODIFY_STRATEGY, USE_TOOL, SUBMIT
│   ├── reward.py           # Reward formula computation
│   └── grader.py           # Grading: numeric, fuzzy, composite modes
├── configs/                # Config YAML files
├── scripts/                # Utility / benchmark scripts
├── requirements.txt
└── Dockerfile
```
## ⚖️ Reward Formula
```
reward = accuracy_score
       + improvement_bonus   (beat your previous best)
       - tool_cost_penalty   (budget consumed)
       + efficiency_bonus    (finished in fewer steps)
```
The agent is rewarded for being accurate, improving, and efficient – not just for getting the right answer.
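One plausible way to turn the formula above into code is shown below. The weights, the budget-normalized penalty, and the function signature are illustrative assumptions; the actual coefficients live in `env/reward.py`:

```python
def compute_reward(accuracy: float, prev_best: float, tool_cost: float,
                   steps_used: int, budget: float,
                   improvement_weight: float = 0.2,
                   efficiency_weight: float = 0.1) -> float:
    """Combine the four reward terms (weights are illustrative, not SelfEvo's)."""
    # Bonus only when this attempt beats the agent's previous best score
    improvement_bonus = improvement_weight if accuracy > prev_best else 0.0
    # Penalize tool usage as a fraction of the episode budget consumed
    tool_cost_penalty = tool_cost / budget
    # Reward finishing in fewer steps relative to the budget
    efficiency_bonus = efficiency_weight * max(0.0, 1.0 - steps_used / budget)
    return accuracy + improvement_bonus - tool_cost_penalty + efficiency_bonus
```

Normalizing the penalty by the budget keeps Easy (budget 10) and Hard (budget 30) tasks on a comparable reward scale.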
## 🛠️ Tools Available
| Tool | Cost | What It Does |
|---|---|---|
| `calculator` | 0.5 | Safe math expression evaluator |
| `python_code_executor` | 1.5 | Sandboxed Python runner (stdout captured) |
| `search_tool` | 1.0 | Knowledge-base lookup for facts |
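"Safe math expression evaluator" typically means evaluating arithmetic via the AST rather than `eval()`. Here is a minimal sketch of how such a calculator tool could work – the actual `env/tools.py` implementation may differ:

```python
import ast
import operator

# Whitelist of permitted operators; anything else is rejected
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calc(expr: str) -> float:
    """Evaluate a pure arithmetic expression without exposing eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))
```

Because only whitelisted node types are walked, expressions like `__import__('os')` fail with `ValueError` instead of executing, which is why the calculator can be priced so much cheaper (0.5) than the full Python sandbox (1.5).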
## 📊 Task Difficulties
| Level | Budget | Description |
|---|---|---|
| Easy | 10 | Single-step arithmetic and simple logic |
| Medium | 20 | Multi-step word problems and algebra |
| Hard | 30 | Cross-domain reasoning + algorithms + planning |
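A task record tied to the budget table above might look like the following. The `Task` schema and the `budget_for` helper are hypothetical; the real definitions live in `env/tasks.py`:

```python
from dataclasses import dataclass

# Budgets from the difficulty table above
DIFFICULTY_BUDGET = {"easy": 10, "medium": 20, "hard": 30}

@dataclass(frozen=True)
class Task:
    prompt: str       # the question shown to the agent
    answer: str       # expected answer used by the grader
    difficulty: str   # "easy" | "medium" | "hard"

def budget_for(task: Task) -> int:
    """Look up the episode budget implied by a task's difficulty."""
    return DIFFICULTY_BUDGET[task.difficulty]
```

Deriving the budget from difficulty (rather than storing it per task) keeps the 15 predefined tasks consistent with the table by construction.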
## 🚀 Running Locally
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch the web UI
python app/app.py

# 3. Open your browser at
#    http://localhost:7860
```
## 🎯 Using the UI
### Predefined Task
- Select from 15 built-in tasks (Easy → Hard)
- Set a random seed for reproducibility
- Hit Run Episode and watch the agent think step-by-step
### Custom Task (your own input)
- Switch to Custom Task mode
- Type any question, set the expected answer
- Choose Task Type: `REASONING` / `MATH` / `CODING`
- Choose Output Type: `String` (fuzzy match) or `Number` (exact/tolerance)
- Set budget and run
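The two output types correspond to two grading modes. A sketch of what they could look like is below – thresholds, tolerances, and function names are illustrative assumptions, not the actual `env/grader.py` API:

```python
import difflib

def grade_number(predicted, expected, tol: float = 1e-6) -> float:
    """Number mode: exact match within a small numeric tolerance."""
    try:
        return 1.0 if abs(float(predicted) - float(expected)) <= tol else 0.0
    except (TypeError, ValueError):
        return 0.0  # unparseable answers score zero

def grade_string(predicted: str, expected: str, threshold: float = 0.8) -> float:
    """String mode: fuzzy match via normalized similarity ratio."""
    ratio = difflib.SequenceMatcher(
        None, predicted.strip().lower(), expected.strip().lower()).ratio()
    return ratio if ratio >= threshold else 0.0
```

Fuzzy string grading tolerates casing and minor wording differences, while numeric grading catches answers like `"4.0"` vs `"4"` that a plain string comparison would reject.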
## 📈 Scoring
| Score | Meaning |
|---|---|
| 🟢 ≥ 0.90 | Excellent – agent nailed it |
| ⚠️ ≥ 0.50 | Partial – agent got close |
| ❌ < 0.50 | Failed – wrong or inefficient |
## 👥 Team
Group: LowIQBoys
| Role | Name |
|---|---|
| 👑 Leader | Akhilesh Adam |
| 🤝 Member | Pavav Sandhuptla |
*SelfEvo – Because good agents don't just answer. They evolve.*