---
title: Ops Scorecard Lab
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - agents
  - operations
  - evaluation
  - scorecards
models: []
datasets:
  - mukunda1729/agent-eval-scenarios
---

Ops Scorecard Lab

Ops Scorecard Lab turns a rough agent workflow into an operator-facing scorecard.

It helps builders outline:

- the operating surface
- review priority
- verification expectations
- rollout notes
- next action items
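
As an illustration of how these five items could be held together, here is a minimal sketch of a scorecard record and a plain-text renderer. It is not taken from the Space's `app.py`: the `Scorecard` class, its field names, `render_scorecard`, and the sample values are all hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical structure: the field names mirror the five outline items in
# this README, not the Space's actual implementation.
@dataclass
class Scorecard:
    operating_surface: str
    review_priority: str
    verification_expectations: List[str] = field(default_factory=list)
    rollout_notes: List[str] = field(default_factory=list)
    next_actions: List[str] = field(default_factory=list)


def _bullets(items: List[str]) -> List[str]:
    # Render each item as a "- item" line, or a placeholder when the list is empty.
    return [f"- {item}" for item in items] or ["- (none)"]


def render_scorecard(card: Scorecard) -> str:
    # Produce an operator-facing plain-text view of the scorecard.
    lines = [
        f"Operating surface: {card.operating_surface}",
        f"Review priority: {card.review_priority}",
        "Verification expectations:",
        *_bullets(card.verification_expectations),
        "Rollout notes:",
        *_bullets(card.rollout_notes),
        "Next action items:",
        *_bullets(card.next_actions),
    ]
    return "\n".join(lines)


# Sample values are invented for illustration only.
card = Scorecard(
    operating_surface="Ticket triage agent with read-only CRM access",
    review_priority="high",
    verification_expectations=["Spot-check 20 routed tickets per week"],
    next_actions=["Define rollback owner"],
)
print(render_scorecard(card))
```

Keeping the record separate from the renderer means the same scorecard could later be emitted as Markdown or JSON without touching the data model.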

The Space is intentionally lightweight and portfolio-friendly: fast to inspect, easy to extend, and aligned with the public eval dataset on Hugging Face and Kaggle.
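
To give a sense of what "easy to extend" could look like in a Gradio Space, the sketch below wires a pure text-to-Markdown function into `gr.Interface`. It is an assumption about the general shape of such an app, not the contents of this Space's `app.py`; `outline_scorecard`, `build_demo`, and the `TODO` placeholders are hypothetical.

```python
# Section names follow the outline items listed in this README.
SECTIONS = [
    "Operating surface",
    "Review priority",
    "Verification expectations",
    "Rollout notes",
    "Next action items",
]


def outline_scorecard(workflow: str) -> str:
    """Return a Markdown scorecard skeleton for a free-text workflow description."""
    stripped = workflow.strip()
    # Use the first line of the description as a short title.
    summary = stripped.splitlines()[0].strip() if stripped else "(no workflow provided)"
    lines = [f"### Scorecard for: {summary}"]
    for name in SECTIONS:
        lines.append(f"- **{name}:** TODO")
    return "\n".join(lines)


def build_demo():
    # Imported lazily so outline_scorecard stays usable without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=outline_scorecard,
        inputs=gr.Textbox(lines=6, label="Agent workflow"),
        outputs=gr.Markdown(),
        title="Ops Scorecard Lab",
    )
```

In an `app.py`, calling `build_demo().launch()` would start the interface. Because the scoring logic lives in a plain function, adding a section is a one-line change to `SECTIONS`.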

Associated Papers

- Primary paper: Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents
- Paper landing page: lightweight-agent-eval-paper
- Companion evaluation harness paper: AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows
- Artifact repo: MukundaKatta/ai-eval-forge-paper

Related Public Artifacts

- Hugging Face dataset: mukunda1729/agent-eval-scenarios
- Hugging Face collection: Agent Labs Portfolio