---
title: Agent Eval Lab
emoji: "🧪"
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- agents
- evaluation
- software-engineering
- tool-use
models: []
datasets:
- mukunda1729/agent-eval-scenarios
- mukunda1729/premium-agent-repo-landscape
---

# Agent Eval Lab

Agent Eval Lab is a small public demo for turning rough agent workflows into practical evaluation scenarios.

It helps builders generate:

- a scenario title
- task setup
- expected behavior
- likely failure modes
- scoring dimensions
- next-step follow-up tests

The Space is intentionally lightweight and portfolio-friendly: fast to inspect, easy to extend, and aligned with public artifacts on Kaggle, Codeberg, and other AI platforms.

## Associated Papers

- Primary paper: [Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents](https://doi.org/10.5281/zenodo.20034550)
- Paper landing page: [lightweight-agent-eval-paper](https://mukundakatta.github.io/lightweight-agent-eval-paper/)
- Artifact repo: [MukundaKatta/lightweight-agent-eval-paper](https://github.com/MukundaKatta/lightweight-agent-eval-paper)
- Companion evaluation harness paper: [AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows](https://doi.org/10.5281/zenodo.20044318)

## Related Public Artifacts

- Hugging Face dataset: [mukunda1729/agent-eval-scenarios](https://huggingface.co/datasets/mukunda1729/agent-eval-scenarios)
- Hugging Face dataset: [mukunda1729/premium-agent-repo-landscape](https://huggingface.co/datasets/mukunda1729/premium-agent-repo-landscape)
- Hugging Face collection: [Agent Labs Portfolio](https://huggingface.co/collections/mukunda1729/agent-labs-portfolio)
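
## Scenario Shape (Illustrative Sketch)

The six outputs this README says the Space generates (scenario title, task setup, expected behavior, likely failure modes, scoring dimensions, follow-up tests) could be held in one small record. The sketch below is an assumption for illustration only: the `EvalScenario` class, its field names, the `to_markdown` helper, and the sample values are hypothetical and are not taken from `app.py` or from the linked datasets.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EvalScenario:
    """Hypothetical record mirroring the six outputs listed in this README."""

    title: str
    task_setup: str
    expected_behavior: str
    failure_modes: List[str] = field(default_factory=list)
    scoring_dimensions: List[str] = field(default_factory=list)
    follow_up_tests: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the scenario as a short Markdown card."""

        def bullets(items: List[str]) -> str:
            # Fall back to a placeholder so empty lists stay visible.
            return "\n".join(f"- {item}" for item in items) or "- (none listed)"

        sections = [
            f"## {self.title}",
            f"**Task setup:** {self.task_setup}",
            f"**Expected behavior:** {self.expected_behavior}",
            "**Likely failure modes:**\n" + bullets(self.failure_modes),
            "**Scoring dimensions:**\n" + bullets(self.scoring_dimensions),
            "**Follow-up tests:**\n" + bullets(self.follow_up_tests),
        ]
        return "\n\n".join(sections)


# Made-up sample values, not drawn from any published dataset.
example = EvalScenario(
    title="Fix a failing unit test",
    task_setup="The agent receives a small repository with one failing test.",
    expected_behavior="The agent edits the source, not the test, until the suite passes.",
    failure_modes=["Deletes or skips the failing test", "Edits unrelated files"],
    scoring_dimensions=["Correctness", "Minimality of the change"],
    follow_up_tests=["Repeat with two failing tests that interact"],
)

print(example.to_markdown())
```

A flat record like this is easy to serialize with `dataclasses.asdict` or to display in a Gradio text component, which fits the lightweight intent described above; a real implementation may structure its output differently.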