# Universal SENTINEL Oversight Harness

## Idea

SENTINEL should be able to supervise not only this repo's worker fleet, but any
OpenEnv-compatible environment or agent exposed through a Hugging Face Space.

The demo story:

> Paste an OpenEnv Space URL. SENTINEL wraps the environment, intercepts proposed
> actions, blocks or redirects risky behavior, then produces a cross-team
> oversight report.

This should be built last, after the core SENTINEL training proof is stable.

## Why It Wins Demo Attention

Most submissions show one agent solving one task. Universal SENTINEL shows a
meta-agent supervising many agents across many task domains.

This turns SENTINEL from "an SRE oversight environment" into a general pattern:

- calendar agents: block bulk destructive calendar edits without confirmation
- SRE agents: block hallucinated services and premature severity claims
- IoT agents: block actuator changes without sensor evidence
- negotiation agents: flag loop exploitation and repeated offers
- finance agents: block irreversible actions without risk evidence

## Safe Framing

Do not say "we judged all other teams." That sounds adversarial and may depend on
private agent APIs.

Say:

> SENTINEL is an opt-in universal oversight harness for OpenEnv-compatible
> agents and environments.

## Three Integration Levels

### Level 1: OpenEnv Environment Wrapper

Works when the Space exposes:

- `GET /health`
- `GET /tasks`
- `POST /reset`
- `POST /step`
- `POST /grader`

Flow:

```text
Remote OpenEnv Space
  -> observation
  -> local baseline/LLM action generator
  -> SENTINEL intercept
  -> approved/blocked/redirected action
  -> remote /step
  -> oversight report
```

This is the safest live demo path.
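
A minimal sketch of the Level 1 compatibility check, assuming the harness is handed a probe callable that reports whether a given endpoint answered. The names `Probe`, `CompatibilityReport`, `check_compatibility`, and `fake_probe` are hypothetical and not part of any existing module; only the endpoint list comes from this document. A real probe would issue the HTTP request with a timeout, while the stand-in below keeps the example runnable offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Endpoints named in the Level 1 list above.
REQUIRED_ENDPOINTS: List[Tuple[str, str]] = [
    ("GET", "/health"),
    ("GET", "/tasks"),
    ("POST", "/reset"),
    ("POST", "/step"),
    ("POST", "/grader"),
]

# Hypothetical probe signature: (method, path) -> True if the Space answered.
Probe = Callable[[str, str], bool]


@dataclass
class CompatibilityReport:
    available: List[str] = field(default_factory=list)
    missing: List[str] = field(default_factory=list)

    @property
    def level1_compatible(self) -> bool:
        # Level 1 requires every endpoint in the list to be present.
        return not self.missing


def check_compatibility(probe: Probe) -> CompatibilityReport:
    """Probe each required endpoint; a probe error counts as missing."""
    report = CompatibilityReport()
    for method, path in REQUIRED_ENDPOINTS:
        label = f"{method} {path}"
        try:
            ok = probe(method, path)
        except Exception:
            # Timeouts and connection errors are treated as "not available".
            ok = False
        (report.available if ok else report.missing).append(label)
    return report


def fake_probe(method: str, path: str) -> bool:
    # Offline stand-in (assumption): a Space that lacks the grader endpoint.
    return path != "/grader"


report = check_compatibility(fake_probe)
print(report.level1_compatible, report.missing)
```

Passing the probe in as a callable lets the same check run against a live Space or against local fixtures, which fits the offline fixture mode this document asks for.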

### Level 2: External Agent Oversight

Works when another team exposes an agent endpoint, for example:

- `POST /act`
- input: observation
- output: proposed action

Flow:

```text
Remote env observation
  -> remote agent /act
  -> proposed action
  -> SENTINEL intercept
  -> remote env /step if approved
```

This is the strongest proof, but requires cooperation or public agent APIs.

### Level 3: Browser/UI Scraping

Avoid for the judged demo. It is fragile, slow, and can look like we are
attacking other submissions. Keep the product-level version API-first.

## Proposed Files

```text
universal/
  adapters.py          # OpenEnv/HF Space compatibility checks
  action_generator.py  # baseline or model action proposal provider
  harness.py           # run multiple Spaces with retry/backoff/circuit breaker
  policy_mapper.py     # map domain actions to universal safety categories
  report.py            # aggregate cross-environment oversight report
```

## Universal Safety Categories

SENTINEL should normalize arbitrary environment actions into these categories:

- invalid target or hallucinated entity
- irreversible/destructive action
- external communication
- escalation or broad notification
- actuator/control action
- repeated loop action
- action before evidence
- cross-domain authority violation

## Example Report

```text
SENTINEL OVERSIGHT REPORT - Grand Finale 2026

Sources monitored: 8
Compatible OpenEnv spaces: 6
Unavailable or timed out: 2

Team A - Calendar Assistant
  Caught: escalation bombing, step 4
  Redirected: destructive bulk update -> request confirmation, step 7
  Risk prevented: 3.2

Team B - SRE Environment
  Caught: hallucination - service "auth-proxy-v2" does not exist, step 2
  Caught: reward hacking - classified P1 before investigation, step 1

Team C - IoT Environment
  All actions approved - clean agent

Team D - Negotiation
  Caught: loop exploitation - same offer repeated 4 times

Totals:
  Actions audited: 82
  Blocks: 14
  Redirects: 6
  Flags: 9
  Prevented risk: 11.7
```

## Reliability Requirements

The harness must never
depend on a remote Space being healthy.

Required protections:

- 5-10 second request timeout per remote call
- exponential backoff for transient failures
- per-Space circuit breaker after repeated failures
- compatibility report when `/tasks` or schemas are missing
- offline fixture mode for the live pitch
- no false precision for unknown labels

For unknown external environments, say "estimated false positives" unless the
remote Space provides labels or grader feedback.

## Build Order

1. Keep this as a finale extension until core training proof is complete.
2. Implement OpenEnv compatibility checker.
3. Implement one local action generator.
4. Run 3-5 known Spaces or local fixtures.
5. Add aggregate report generation.
6. Add paste-a-Space-URL field to `/sentinel/dashboard`.
7. Only then attempt external agent `/act` integration.

## Demo Principle

Prepared mode must always work. Bring-your-own-link mode is a bonus.

The judged demo should show:

1. SENTINEL core environment.
2. Reward curve / before-after training proof.
3. Zero-shot confidence washing via `/sentinel/intercept`.
4. Universal oversight report as the final "this scales beyond our environment"
   moment.
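
Two of the protections listed under Reliability Requirements, exponential backoff and a per-Space circuit breaker, could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned `harness.py`: `SpaceCircuitBreaker`, `CircuitOpenError`, `call_with_backoff`, and the default thresholds are all hypothetical. The 5-10 second request timeout is assumed to be enforced inside the `call` itself, for example by passing `timeout=` to the HTTP client.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a Space has failed too often and the call is skipped."""


class SpaceCircuitBreaker:
    """Tracks consecutive failures for one Space (hypothetical helper)."""

    def __init__(self, failure_threshold: int = 5) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1


def call_with_backoff(
    call: Callable[[], T],
    breaker: SpaceCircuitBreaker,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with delays of base_delay * 2**attempt between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if breaker.is_open:
        raise CircuitOpenError("circuit open; skipping remote call")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            result = call()
        except Exception as exc:
            # Every failed attempt counts toward the breaker threshold.
            last_error = exc
            breaker.record_failure()
            if breaker.is_open or attempt == max_attempts - 1:
                break
            sleep(base_delay * (2 ** attempt))
        else:
            breaker.record_success()
            return result
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable so that offline fixture runs and tests can skip the real delays. One breaker instance per Space keeps a single unhealthy Space from slowing down the rest of the report.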