# 🤖 SelfEvo – Self-Improving Agent Environment

An AI agent that doesn't just answer – it learns how to answer better, step by step, under real constraints.
## 🧠 What Is This?

SelfEvo is an experimental AI benchmark environment where an agent is given a task (math, reasoning, or coding) and must solve it within a budget. But instead of answering directly, the agent:
- **Picks a strategy** – How deep should it reason? Should it use tools? Chain-of-thought or direct?
- **Uses tools** – Calculator, Python sandbox, or a knowledge search.
- **Submits an answer** – Gets scored on accuracy, speed, and budget efficiency.
- **Self-improves** – If the score is low, it modifies its strategy and retries.
This loop is the "self-evolving" core of SelfEvo.
## 🔁 How It Works – The Loop
```
┌────────────────────────────────────────────────┐
│                 EPISODE START                  │
│           Task assigned + Budget set           │
└───────────────────────┬────────────────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 1. Observe Task State   │
           └────────────┬────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 2. Modify Strategy?     │◄──── Low score on prev attempt
           │   (prompt style, depth) │
           └────────────┬────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 3. Use a Tool?          │
           │   Calculator / Python   │
           │   / Search KB           │
           └────────────┬────────────┘
                        ▼
           ┌─────────────────────────┐
           │ 4. Submit Answer        │
           │   → Score + Reward      │
           └────────────┬────────────┘
                        ▼
               Score ≥ 0.9? ──Yes──► DONE 🎉
                        │
                        No
                        ▼
               Budget left? ──No───► FAIL ❌
                        │
                       Yes
                        ▼
               Back to Step 2 🔁
```
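The control flow above can be sketched in a few lines of Python. This is a minimal illustration, not the actual SelfEvo implementation – names like `env`, `agent.decide`, `agent.adapt`, and `StepResult` are assumptions chosen to mirror the diagram:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    score: float        # grade for the submitted answer (0.0–1.0)
    budget_left: float  # remaining episode budget

def run_episode(env, agent, pass_score=0.9):
    """Drive one episode: act, submit, and retry until pass or budget runs out."""
    env.reset()                                  # task assigned, budget set
    while True:
        result = env.step(agent.decide())        # agent submits an answer
        if result.score >= pass_score:           # Score >= 0.9 -> DONE
            return "DONE"
        if result.budget_left <= 0:              # budget exhausted -> FAIL
            return "FAIL"
        agent.adapt(result)                      # low score: modify strategy, retry
```

The key design point the diagram encodes: strategy modification only happens *after* a scored attempt, so every retry is informed by feedback rather than blind resampling.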
## 🏗️ Project Structure
```
SelfEvo/
├── app/
│   └── app.py              # Gradio UI – run this to launch the web app
├── agent/
│   ├── baseline_agent.py   # Core agent logic: strategy, tools, answer gen
│   └── strategy.py         # Strategy recommendation engine
├── env/
│   ├── environment.py      # OpenEnv protocol: reset(), step(), state()
│   ├── tasks.py            # 15 predefined tasks (Easy / Medium / Hard)
│   ├── tools.py            # Calculator, Python sandbox, Search tool
│   ├── state.py            # Typed state: Strategy, Attempt, ToolSpec
│   ├── actions.py          # Action types: MODIFY_STRATEGY, USE_TOOL, SUBMIT
│   ├── reward.py           # Reward formula computation
│   └── grader.py           # Grading: numeric, fuzzy, composite modes
├── configs/                # Config YAML files
├── scripts/                # Utility / benchmark scripts
├── requirements.txt
└── Dockerfile
```
## ⚖️ Reward Formula
```
reward = accuracy_score
       + improvement_bonus   (beat your previous best)
       - tool_cost_penalty   (budget consumed)
       + efficiency_bonus    (finished in fewer steps)
```
The agent is rewarded for being accurate, improving, and efficient – not just for getting the right answer.
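One plausible way to turn the formula above into code is shown below. The weights, the budget-normalized penalty, and the function signature are illustrative assumptions; the actual coefficients live in `env/reward.py`:

```python
def compute_reward(accuracy: float, prev_best: float, tool_cost: float,
                   steps_used: int, budget: float,
                   improvement_weight: float = 0.2,
                   efficiency_weight: float = 0.1) -> float:
    """Combine the four reward terms (weights are illustrative, not SelfEvo's)."""
    # Bonus only when this attempt beats the agent's previous best score
    improvement_bonus = improvement_weight if accuracy > prev_best else 0.0
    # Penalize tool usage as a fraction of the episode budget consumed
    tool_cost_penalty = tool_cost / budget
    # Reward finishing in fewer steps relative to the budget
    efficiency_bonus = efficiency_weight * max(0.0, 1.0 - steps_used / budget)
    return accuracy + improvement_bonus - tool_cost_penalty + efficiency_bonus
```

Normalizing the penalty by the budget keeps Easy (budget 10) and Hard (budget 30) tasks on a comparable reward scale.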
## 🛠️ Tools Available
| Tool | Cost | What It Does |
|---|---|---|
| `calculator` | 0.5 | Safe math expression evaluator |
| `python_code_executor` | 1.5 | Sandboxed Python runner (stdout captured) |
| `search_tool` | 1.0 | Knowledge-base lookup for facts |
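"Safe math expression evaluator" typically means evaluating arithmetic via the AST rather than `eval()`. Here is a minimal sketch of how such a calculator tool could work – the actual `env/tools.py` implementation may differ:

```python
import ast
import operator

# Whitelist of permitted operators; anything else is rejected
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calc(expr: str) -> float:
    """Evaluate a pure arithmetic expression without exposing eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))
```

Because only whitelisted node types are walked, expressions like `__import__('os')` fail with `ValueError` instead of executing, which is why the calculator can be priced so much cheaper (0.5) than the full Python sandbox (1.5).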
## 📊 Task Difficulties
| Level | Budget | Description |
|---|---|---|
| Easy | 10 | Single-step arithmetic and simple logic |
| Medium | 20 | Multi-step word problems and algebra |
| Hard | 30 | Cross-domain reasoning + algorithms + planning |
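A task record tied to the budget table above might look like the following. The `Task` schema and the `budget_for` helper are hypothetical; the real definitions live in `env/tasks.py`:

```python
from dataclasses import dataclass

# Budgets from the difficulty table above
DIFFICULTY_BUDGET = {"easy": 10, "medium": 20, "hard": 30}

@dataclass(frozen=True)
class Task:
    prompt: str       # the question shown to the agent
    answer: str       # expected answer used by the grader
    difficulty: str   # "easy" | "medium" | "hard"

def budget_for(task: Task) -> int:
    """Look up the episode budget implied by a task's difficulty."""
    return DIFFICULTY_BUDGET[task.difficulty]
```

Deriving the budget from difficulty (rather than storing it per task) keeps the 15 predefined tasks consistent with the table by construction.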
## 🚀 Running Locally
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch the web UI
python app/app.py

# 3. Open your browser at
#    http://localhost:7860
```
## 🎯 Using the UI
### Predefined Task
- Select from 15 built-in tasks (Easy → Hard)
- Set a random seed for reproducibility
- Hit Run Episode and watch the agent think step-by-step
### Custom Task (your own input)
- Switch to Custom Task mode
- Type any question, set the expected answer
- Choose Task Type: `REASONING` / `MATH` / `CODING`
- Choose Output Type: `String` (fuzzy match) or `Number` (exact/tolerance)
- Set budget and run
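The two output types correspond to two grading modes. A sketch of what they could look like is below – thresholds, tolerances, and function names are illustrative assumptions, not the actual `env/grader.py` API:

```python
import difflib

def grade_number(predicted, expected, tol: float = 1e-6) -> float:
    """Number mode: exact match within a small numeric tolerance."""
    try:
        return 1.0 if abs(float(predicted) - float(expected)) <= tol else 0.0
    except (TypeError, ValueError):
        return 0.0  # unparseable answers score zero

def grade_string(predicted: str, expected: str, threshold: float = 0.8) -> float:
    """String mode: fuzzy match via normalized similarity ratio."""
    ratio = difflib.SequenceMatcher(
        None, predicted.strip().lower(), expected.strip().lower()).ratio()
    return ratio if ratio >= threshold else 0.0
```

Fuzzy string grading tolerates casing and minor wording differences, while numeric grading catches answers like `"4.0"` vs `"4"` that a plain string comparison would reject.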
## 📈 Scoring
| Score | Meaning |
|---|---|
| 🟢 ≥ 0.90 | Excellent – agent nailed it |
| ⚠️ ≥ 0.50 | Partial – agent got close |
| ❌ < 0.50 | Failed – wrong or inefficient |
## 👥 Team
Group: LowIQBoys
| Role | Name |
|---|---|
| 👑 Leader | Akhilesh Adam |
| 🤝 Member | Pavav Sandhuptla |
*SelfEvo – Because good agents don't just answer. They evolve.*