Spaces:
Running
Running
| title: "PrefixGuard Demo - Agent Failure Detection" | |
| emoji: 🛡️ | |
| colorFrom: blue | |
| colorTo: red | |
| sdk: gradio | |
| sdk_version: 4.36.0 | |
| app_file: app.py | |
| pinned: false | |
| # PrefixGuard Demo | |
| A minimal implementation demonstrating the core concept from "PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors" (Huang et al., 2026). | |
| ## What it demonstrates | |
| PrefixGuard shows that we can predict agent task failures from early execution traces, not just final outcomes. This demo implements: | |
| - **Trace encoding**: Convert agent step sequences to feature vectors | |
| - **Prefix risk scoring**: Score risk from partial trace prefixes | |
| - **Early warning**: Alert before task completion when failure is likely | |
| ## Hypothesis | |
| Agent execution traces contain early signals of eventual failure. A lightweight prefix-based monitor can predict failures with >0.7 AUPRC, enabling intervention before resources are wasted. | |
| ## Key findings from paper | |
| - Best monitors achieve 0.900/0.710/0.533/0.557 AUPRC across WebArena, τ²-Bench, SkillsBench, TerminalBench | |
| - +0.137 AUPRC improvement over raw-text baselines | |
| - LLM judges are substantially weaker at prefix-time prediction | |
| ## Implementation | |
| This demo uses synthetic agent traces to illustrate the approach. In production, this would integrate with actual agent frameworks (LangGraph, AutoGPT, etc.). | |