metadata

title: Agent Eval Lab
emoji: 🧪
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - agents
  - evaluation
  - software-engineering
  - tool-use
models: []
datasets:
  - mukunda1729/agent-eval-scenarios
  - mukunda1729/premium-agent-repo-landscape

Agent Eval Lab

Agent Eval Lab is a small public demo for turning rough agent workflows into practical evaluation scenarios.

It helps builders generate:

a scenario title
task setup
expected behavior
likely failure modes
scoring dimensions
follow-up tests to run next
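The six outputs listed above can be pictured as a single scenario record. The sketch below is a minimal, hypothetical Python illustration under stated assumptions. `EvalScenario`, `draft_scenario`, `weighted_score`, the placeholder text, and the dimension weights are invented for this example and are not taken from the Space's `app.py`. The generator here is rule-based only so that it runs offline; a real version would produce task-specific content.

```python
from dataclasses import dataclass, field


@dataclass
class EvalScenario:
    """Hypothetical container for the six fields the demo produces."""
    title: str
    task_setup: str
    expected_behavior: list[str]
    failure_modes: list[str]
    scoring_dimensions: dict[str, float]  # dimension name -> weight (assumed)
    follow_up_tests: list[str] = field(default_factory=list)


def draft_scenario(workflow: str) -> EvalScenario:
    """Turn a rough workflow description into a scenario skeleton.

    Rule-based placeholder (an assumption): every field is filled from
    fixed templates, not generated from the workflow's content.
    """
    workflow = workflow.strip()
    return EvalScenario(
        title=f"Eval: {workflow[:60]}",
        task_setup=f"Give the agent this task with its usual tools: {workflow}",
        expected_behavior=[
            "Plans before acting",
            "Calls tools with valid arguments",
            "Reports the final result and any remaining uncertainty",
        ],
        failure_modes=[
            "Skips a required tool call",
            "Claims success without verifying the result",
        ],
        scoring_dimensions={"correctness": 0.5, "tool_use": 0.3, "communication": 0.2},
        follow_up_tests=[f"Repeat with one tool returning an error: {workflow}"],
    )


def weighted_score(scenario: EvalScenario, ratings: dict[str, float]) -> float:
    """Combine per-dimension ratings (expected in [0, 1]) into one weighted score."""
    missing = set(scenario.scoring_dimensions) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(scenario.scoring_dimensions.values())
    weighted_sum = sum(
        weight * ratings[name] for name, weight in scenario.scoring_dimensions.items()
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    scenario = draft_scenario("Triage a failing CI build and open a fix PR")
    print(scenario.title)
    print(weighted_score(scenario, {"correctness": 1.0, "tool_use": 0.5, "communication": 1.0}))
```

Keeping the scoring dimensions as named weights makes it easy to add or reweight a dimension per scenario, and rejecting missing ratings avoids silently scoring an incomplete review as a pass.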

The Space is intentionally lightweight and portfolio-friendly: it is fast to inspect, easy to extend, and consistent with related public artifacts on Kaggle, Codeberg, and other platforms.

Associated Papers

Primary paper: Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents
Paper landing page: lightweight-agent-eval-paper
Artifact repo: MukundaKatta/lightweight-agent-eval-paper
Companion evaluation harness paper: AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows

Related Public Artifacts

Hugging Face dataset: mukunda1729/agent-eval-scenarios
Hugging Face dataset: mukunda1729/premium-agent-repo-landscape
Hugging Face collection: Agent Labs Portfolio