---
title: Agent Eval Lab
emoji: "🧪"
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- agents
- evaluation
- software-engineering
- tool-use
models: []
datasets:
- mukunda1729/agent-eval-scenarios
- mukunda1729/premium-agent-repo-landscape
---

# Agent Eval Lab

Agent Eval Lab is a small public demo for turning rough agent workflows into practical evaluation scenarios.

It helps builders generate:

- a scenario title
- task setup
- expected behavior
- likely failure modes
- scoring dimensions
- next-step follow-up tests
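
As a rough illustration of how the six outputs listed above might fit together, here is a minimal sketch of a scenario record in Python. The `EvalScenario` class, its field names, and the sample values are all hypothetical, chosen only to mirror the list. They are not taken from `app.py` or from the associated datasets.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class EvalScenario:
    """Hypothetical container for one generated evaluation scenario."""

    title: str
    task_setup: str
    expected_behavior: str
    failure_modes: list[str] = field(default_factory=list)
    scoring_dimensions: list[str] = field(default_factory=list)
    follow_up_tests: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() keeps the field declaration order, so the JSON keys
        # appear in the same order as the list in this README.
        return json.dumps(asdict(self), indent=2)


# Sample values are invented for illustration only.
scenario = EvalScenario(
    title="Refund lookup with a flaky tool",
    task_setup="The agent must find an order and report its refund status.",
    expected_behavior="Retries the lookup once, then reports the status.",
    failure_modes=["invents a refund status", "retries without limit"],
    scoring_dimensions=["correctness", "tool-use discipline"],
    follow_up_tests=["the same task with a malformed order ID"],
)

print(scenario.to_json())
```

Keeping each scenario as a flat record like this makes it easy to serialize to JSON, though the Space itself may structure its output differently.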

The Space is intentionally lightweight and portfolio-friendly: fast to inspect, easy to extend, and aligned with related public artifacts on Kaggle, Codeberg, and other platforms.

## Associated Papers

- Primary paper: [Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents](https://doi.org/10.5281/zenodo.20034550)
- Paper landing page: [lightweight-agent-eval-paper](https://mukundakatta.github.io/lightweight-agent-eval-paper/)
- Artifact repo: [MukundaKatta/lightweight-agent-eval-paper](https://github.com/MukundaKatta/lightweight-agent-eval-paper)
- Companion evaluation harness paper: [AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows](https://doi.org/10.5281/zenodo.20044318)

## Related Public Artifacts

- Hugging Face dataset: [mukunda1729/agent-eval-scenarios](https://huggingface.co/datasets/mukunda1729/agent-eval-scenarios)
- Hugging Face dataset: [mukunda1729/premium-agent-repo-landscape](https://huggingface.co/datasets/mukunda1729/premium-agent-repo-landscape)
- Hugging Face collection: [Agent Labs Portfolio](https://huggingface.co/collections/mukunda1729/agent-labs-portfolio)