metadata
title: Ops Scorecard Lab
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- agents
- operations
- evaluation
- scorecards
models: []
datasets:
- mukunda1729/agent-eval-scenarios
Ops Scorecard Lab
Ops Scorecard Lab turns a rough agent workflow into an operator-facing scorecard.
It helps builders outline:
- the operating surface
- review priority
- verification expectations
- rollout notes
- next action items
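As a rough illustration of the outline above, the sketch below assembles the five sections into a plain-text scorecard. It is not the Space's actual `app.py`: the names `Scorecard` and `build_scorecard`, the `touches_production` flag, and the wording of each section are all hypothetical, chosen only to show the shape of the output.

```python
from dataclasses import dataclass, field

# Section names taken from the list above. Everything else in this sketch
# is an assumption and may not match how the Space works.
SECTIONS = (
    "operating surface",
    "review priority",
    "verification expectations",
    "rollout notes",
    "next action items",
)


@dataclass
class Scorecard:
    """Hypothetical container for one operator-facing scorecard."""

    workflow: str
    sections: dict = field(default_factory=dict)

    def render(self) -> str:
        # One header line, then one bullet per section in a fixed order.
        lines = [f"Scorecard: {self.workflow}"]
        for name in SECTIONS:
            lines.append(f"- {name}: {self.sections.get(name, 'TODO')}")
        return "\n".join(lines)


def build_scorecard(workflow: str, tools: list, touches_production: bool) -> Scorecard:
    """Fill each section from a rough workflow description (illustrative rules only)."""
    card = Scorecard(workflow=workflow)
    card.sections["operating surface"] = ", ".join(tools) if tools else "no tools listed"
    card.sections["review priority"] = "high" if touches_production else "normal"
    card.sections["verification expectations"] = (
        "human sign-off before each run" if touches_production
        else "spot-check sampled runs"
    )
    card.sections["rollout notes"] = (
        "stage behind a dry-run mode first" if touches_production
        else "safe to enable directly"
    )
    card.sections["next action items"] = "confirm owners for each tool"
    return card


if __name__ == "__main__":
    card = build_scorecard("ticket triage agent", ["search", "email"], True)
    print(card.render())
```

Keeping the section names in a single tuple means the rendered order always matches the outline, and any section a builder has not filled in shows up as `TODO` rather than being silently dropped.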
The Space is intentionally lightweight and portfolio-friendly: it is fast to inspect, easy to extend, and aligned with the public evaluation dataset published on Hugging Face and Kaggle.
Associated Papers
- Primary paper: Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents
- Paper landing page: lightweight-agent-eval-paper
- Companion evaluation harness paper: AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows
- Artifact repo: MukundaKatta/ai-eval-forge-paper
Related Public Artifacts
- Hugging Face dataset: mukunda1729/agent-eval-scenarios
- Hugging Face collection: Agent Labs Portfolio