--- title: "PrefixGuard Demo - Agent Failure Detection" emoji: 🛡️ colorFrom: blue colorTo: red sdk: gradio sdk_version: 4.36.0 app_file: app.py pinned: false --- # PrefixGuard Demo A minimal implementation demonstrating the core concept from "PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors" (Huang et al., 2026). ## What it demonstrates PrefixGuard shows that we can predict agent task failures from early execution traces, not just final outcomes. This demo implements: - **Trace encoding**: Convert agent step sequences to feature vectors - **Prefix risk scoring**: Score risk from partial trace prefixes - **Early warning**: Alert before task completion when failure is likely ## Hypothesis Agent execution traces contain early signals of eventual failure. A lightweight prefix-based monitor can predict failures with >0.7 AUPRC, enabling intervention before resources are wasted. ## Key findings from paper - Best monitors achieve 0.900/0.710/0.533/0.557 AUPRC across WebArena, τ²-Bench, SkillsBench, TerminalBench - +0.137 AUPRC improvement over raw-text baselines - LLM judges are substantially weaker at prefix-time prediction ## Implementation This demo uses synthetic agent traces to illustrate the approach. In production, this would integrate with actual agent frameworks (LangGraph, AutoGPT, etc.).