agent-eval-lab / README.md

	---
	title: Agent Eval Lab
	emoji: "🧪"
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.29.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	tags:
	- agents
	- evaluation
	- software-engineering
	- tool-use
	models: []
	datasets:
	- mukunda1729/agent-eval-scenarios
	- mukunda1729/premium-agent-repo-landscape
	---

	# Agent Eval Lab

	Agent Eval Lab is a small public demo for turning rough agent workflows into practical evaluation scenarios.

	It helps builders generate:

	- scenario title
	- task setup
	- expected behavior
	- likely failure modes
	- scoring dimensions
	- follow-up tests
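
As a rough illustration of how the generated fields above could fit together, here is a minimal Python sketch. The `EvalScenario` class, the `mean_score` helper, the 0-to-1 rating scale, and the example values are all assumptions made for this illustration; they are not taken from the Space's `app.py`.

```python
from dataclasses import dataclass, field


@dataclass
class EvalScenario:
    """Hypothetical container with one attribute per generated field."""

    title: str
    task_setup: str
    expected_behavior: str
    failure_modes: list[str] = field(default_factory=list)
    scoring_dimensions: list[str] = field(default_factory=list)
    follow_up_tests: list[str] = field(default_factory=list)


def mean_score(scenario: EvalScenario, ratings: dict[str, float]) -> float:
    """Average the ratings over the scenario's scoring dimensions.

    Assumes each rating lies in [0, 1]; an unrated dimension counts as 0.
    """
    dims = scenario.scoring_dimensions
    if not dims:
        raise ValueError("scenario has no scoring dimensions")
    return sum(ratings.get(dim, 0.0) for dim in dims) / len(dims)


# Illustrative values only; not drawn from the published datasets.
scenario = EvalScenario(
    title="Refund lookup with a flaky orders API",
    task_setup="The agent must find an order and report its refund status.",
    expected_behavior="Retry the lookup once, then report the status or escalate.",
    failure_modes=["invents an order ID", "retries without limit"],
    scoring_dimensions=["correctness", "tool_use"],
    follow_up_tests=["Same task with the orders API fully offline"],
)

print(mean_score(scenario, {"correctness": 1.0, "tool_use": 0.5}))
```

An unweighted mean keeps the sketch simple; per-dimension weights would be a natural extension if some dimensions matter more than others.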

	The Space is intentionally lightweight and portfolio-friendly: fast to inspect, easy to extend, and aligned with public artifacts on Kaggle, Codeberg, and other platforms.

	## Associated Papers

	- Primary paper: [Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents](https://doi.org/10.5281/zenodo.20034550)
	- Paper landing page: [lightweight-agent-eval-paper](https://mukundakatta.github.io/lightweight-agent-eval-paper/)
	- Artifact repo: [MukundaKatta/lightweight-agent-eval-paper](https://github.com/MukundaKatta/lightweight-agent-eval-paper)
	- Companion evaluation harness paper: [AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows](https://doi.org/10.5281/zenodo.20044318)

	## Related Public Artifacts

	- Hugging Face dataset: [mukunda1729/agent-eval-scenarios](https://huggingface.co/datasets/mukunda1729/agent-eval-scenarios)
	- Hugging Face dataset: [mukunda1729/premium-agent-repo-landscape](https://huggingface.co/datasets/mukunda1729/premium-agent-repo-landscape)
	- Hugging Face collection: [Agent Labs Portfolio](https://huggingface.co/collections/mukunda1729/agent-labs-portfolio)