Upload README.md
README.md
CHANGED
@@ -1,15 +1,35 @@
 ---
 title: Agent Eval Lab
-emoji:
-colorFrom:
-colorTo:
+emoji: "🧪"
+colorFrom: blue
+colorTo: indigo
 sdk: gradio
-sdk_version:
-python_version: '3.13'
+sdk_version: 5.29.0
 app_file: app.py
 pinned: false
 license: apache-2.0
-
+tags:
+- agents
+- evaluation
+- software-engineering
+- tool-use
+models: []
+datasets:
+- mukunda1729/agent-eval-scenarios
+- mukunda1729/premium-agent-repo-landscape
 ---
 
-
+# Agent Eval Lab
+
+Agent Eval Lab is a small public demo for turning rough agent workflows into practical evaluation scenarios.
+
+It helps builders generate:
+
+- a scenario title
+- task setup
+- expected behavior
+- likely failure modes
+- scoring dimensions
+- next-step follow-up tests
+
+The Space is intentionally lightweight and portfolio-friendly: fast to inspect, easy to extend, and aligned with public artifacts on Kaggle, Codeberg, and other AI platforms.
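
The six outputs the new README lists (scenario title, task setup, expected behavior, likely failure modes, scoring dimensions, follow-up tests) could be held in a small record type. The sketch below is only an illustration of that shape. The names `EvalScenario` and `to_markdown` and all sample values are hypothetical and are not taken from the Space's `app.py`, which this commit does not show.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class EvalScenario:
    """Hypothetical container mirroring the six items listed in the README."""

    title: str
    task_setup: str
    expected_behavior: str
    failure_modes: list[str] = field(default_factory=list)
    scoring_dimensions: list[str] = field(default_factory=list)
    follow_up_tests: list[str] = field(default_factory=list)


def _bullets(items: list[str]) -> str:
    """Format items as a markdown bullet list, with a placeholder when empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def to_markdown(scenario: EvalScenario) -> str:
    """Render a scenario as a short markdown report (assumed output format)."""
    sections = [
        f"## {scenario.title}",
        f"**Task setup:** {scenario.task_setup}",
        f"**Expected behavior:** {scenario.expected_behavior}",
        "**Likely failure modes:**\n" + _bullets(scenario.failure_modes),
        "**Scoring dimensions:**\n" + _bullets(scenario.scoring_dimensions),
        "**Follow-up tests:**\n" + _bullets(scenario.follow_up_tests),
    ]
    return "\n\n".join(sections)


# Illustrative values only; not drawn from the linked datasets.
example = EvalScenario(
    title="Fix a failing unit test",
    task_setup="Repository with one failing test and a short bug report.",
    expected_behavior="Agent edits the faulty function and the test passes.",
    failure_modes=["Edits the test instead of the code"],
    scoring_dimensions=["Correctness", "Minimal diff"],
    follow_up_tests=["Same bug with a misleading error message"],
)

if __name__ == "__main__":
    print(to_markdown(example))
```

A record like this could feed a Gradio text output via `to_markdown`, or be serialized with `asdict` for storage as a dataset row, though whether the Space does either is not stated in the commit.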