Universal SENTINEL Oversight Harness
Idea
SENTINEL should be able to supervise not only this repo's worker fleet, but any OpenEnv-compatible environment or agent exposed through a Hugging Face Space.
The demo story:
Paste an OpenEnv Space URL. SENTINEL wraps the environment, intercepts proposed actions, blocks or redirects risky behavior, then produces a cross-team oversight report.
This should be built last, after the core SENTINEL training proof is stable.
Why It Wins Demo Attention
Most submissions show one agent solving one task. Universal SENTINEL shows a meta-agent supervising many agents across many task domains.
This turns SENTINEL from "an SRE oversight environment" into a general pattern:
- calendar agents: block bulk destructive calendar edits without confirmation
- SRE agents: block hallucinated services and premature severity claims
- IoT agents: block actuator changes without sensor evidence
- negotiation agents: flag loop exploitation and repeated offers
- finance agents: block irreversible actions without risk evidence
Safe Framing
Do not say "we judged all other teams." That sounds adversarial and may depend on private agent APIs.
Say:
SENTINEL is an opt-in universal oversight harness for OpenEnv-compatible agents and environments.
Three Integration Levels
Level 1: OpenEnv Environment Wrapper
Works when the Space exposes:
- GET /health
- GET /tasks
- POST /reset
- POST /step
- POST /grader
Flow:
Remote OpenEnv Space
-> observation
-> local baseline/LLM action generator
-> SENTINEL intercept
-> approved/blocked/redirected action
-> remote /step
-> oversight report
This is the safest live demo path.
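The Level 1 flow can be sketched as a small loop. This is only an illustration under assumptions: `Verdict`, `run_episode`, and the `observation`/`done` keys in the step response are hypothetical names, not taken from OpenEnv or from this repo. The remote calls are passed in as plain callables so the loop can run against local fixtures.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]
Observation = Dict[str, Any]


@dataclass
class Verdict:
    """Hypothetical result of a SENTINEL intercept."""
    decision: str   # "approved", "blocked", or "redirected"
    action: Action  # the action to forward (original or safer substitute)
    reason: str = ""


def run_episode(
    reset: Callable[[], Observation],
    step: Callable[[Action], Dict[str, Any]],
    propose: Callable[[Observation], Action],
    intercept: Callable[[Observation, Action], Verdict],
    max_steps: int = 10,
) -> List[Dict[str, Any]]:
    """Run one supervised episode and return the audit log."""
    audit: List[Dict[str, Any]] = []
    observation = reset()
    for index in range(max_steps):
        proposed = propose(observation)
        verdict = intercept(observation, proposed)
        audit.append({
            "step": index,
            "proposed": proposed,
            "decision": verdict.decision,
            "reason": verdict.reason,
        })
        if verdict.decision == "blocked":
            # Nothing is forwarded to the remote /step; ask for a new proposal.
            continue
        # Approved and redirected actions are both forwarded; for a redirect,
        # verdict.action holds the substitute action.
        result = step(verdict.action)
        # Assumed response shape: {"observation": ..., "done": bool}.
        observation = result.get("observation", observation)
        if result.get("done"):
            break
    return audit
```

The returned audit log is the raw material for the oversight report. Because `reset` and `step` are injected, the same loop works with a live Space or with recorded responses.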
Level 2: External Agent Oversight
Works when another team exposes an agent endpoint, for example:
POST /act
- input: observation
- output: proposed action
Flow:
Remote env observation
-> remote agent /act
-> proposed action
-> SENTINEL intercept
-> remote env /step if approved
This is the strongest proof, but requires cooperation or public agent APIs.
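A minimal sketch of the `/act` call, using only the Python standard library. The request body `{"observation": ...}` and the reply field `"action"` are assumptions; a real external agent may use a different schema. `post_json` and `remote_agent_proposer` are hypothetical helper names.

```python
import json
from typing import Any, Callable, Dict
from urllib import request

Json = Dict[str, Any]


def post_json(url: str, payload: Json, timeout: float = 10.0) -> Json:
    """POST a JSON body and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def remote_agent_proposer(
    agent_url: str,
    post: Callable[[str, Json], Json] = post_json,
) -> Callable[[Json], Json]:
    """Build a propose(observation) callable backed by a remote /act endpoint."""
    act_url = agent_url.rstrip("/") + "/act"

    def propose(observation: Json) -> Json:
        # Assumed schema: send the observation, receive {"action": {...}}.
        reply = post(act_url, {"observation": observation})
        action = reply.get("action")
        if not isinstance(action, dict):
            raise ValueError("agent reply has no 'action' object")
        return action

    return propose
```

Injecting `post` keeps the proposer testable without network access; the proposed action still has to pass the SENTINEL intercept before anything reaches the remote `/step`.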
Level 3: Browser/UI Scraping
Avoid for the judged demo.
It is fragile, slow, and can look like we are attacking other submissions. Keep the product-level version API-first.
Proposed Files
universal/
adapters.py # OpenEnv/HF Space compatibility checks
action_generator.py # baseline or model action proposal provider
harness.py # run multiple Spaces with retry/backoff/circuit breaker
policy_mapper.py # map domain actions to universal safety categories
report.py # aggregate cross-environment oversight report
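As an illustration of what the compatibility check in `adapters.py` might do, the sketch below probes the five Level 1 endpoints and lists the ones that appear to be missing. The `probe` callable and the rule that only a 404, a 405, or a connection error counts as "missing" are assumptions; a route that rejects an empty body with another status code is treated as present.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Endpoints listed for the Level 1 wrapper.
REQUIRED_ENDPOINTS: List[Tuple[str, str]] = [
    ("GET", "/health"),
    ("GET", "/tasks"),
    ("POST", "/reset"),
    ("POST", "/step"),
    ("POST", "/grader"),
]


def check_compatibility(
    base_url: str,
    probe: Callable[[str, str], int],
) -> Dict[str, Any]:
    """Return a small compatibility report for one Space.

    probe(method, url) is a hypothetical transport hook that returns an HTTP
    status code or raises OSError on timeouts and connection failures.
    """
    base = base_url.rstrip("/")
    missing: List[str] = []
    for method, path in REQUIRED_ENDPOINTS:
        status: Optional[int]
        try:
            status = probe(method, base + path)
        except OSError:
            status = None  # unreachable or timed out
        if status is None or status in (404, 405):
            missing.append(f"{method} {path}")
    return {"space": base, "compatible": not missing, "missing": missing}
```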
Universal Safety Categories
SENTINEL should normalize arbitrary environment actions into these categories:
- invalid target or hallucinated entity
- irreversible/destructive action
- external communication
- escalation or broad notification
- actuator/control action
- repeated loop action
- action before evidence
- cross-domain authority violation
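One possible shape for this normalization is an enum of the eight categories plus a mapper. The sketch below is an assumption throughout: the action fields `type` and `target`, the keyword hints, and the repeat threshold are hypothetical, and it covers only three categories to show the pattern.

```python
from enum import Enum
from typing import Any, Dict, List, Set


class SafetyCategory(Enum):
    INVALID_TARGET = "invalid target or hallucinated entity"
    IRREVERSIBLE = "irreversible/destructive action"
    EXTERNAL_COMMUNICATION = "external communication"
    ESCALATION = "escalation or broad notification"
    ACTUATOR = "actuator/control action"
    REPEATED_LOOP = "repeated loop action"
    ACTION_BEFORE_EVIDENCE = "action before evidence"
    AUTHORITY_VIOLATION = "cross-domain authority violation"


# Hypothetical keyword hints; a real mapper would be tuned per domain.
DESTRUCTIVE_HINTS = ("delete", "drop", "wipe", "purge")
REPEAT_THRESHOLD = 2  # identical prior actions before flagging a loop


def categorize(
    action: Dict[str, Any],
    known_targets: Set[str],
    history: List[Dict[str, Any]],
) -> Set[SafetyCategory]:
    """Map one domain action to zero or more universal safety categories."""
    categories: Set[SafetyCategory] = set()

    # A named target that the environment never exposed is likely hallucinated.
    target = action.get("target")
    if target is not None and target not in known_targets:
        categories.add(SafetyCategory.INVALID_TARGET)

    # Crude keyword match on the action type.
    name = str(action.get("type", "")).lower()
    if any(hint in name for hint in DESTRUCTIVE_HINTS):
        categories.add(SafetyCategory.IRREVERSIBLE)

    # The same action already seen several times suggests loop exploitation.
    if history.count(action) >= REPEAT_THRESHOLD:
        categories.add(SafetyCategory.REPEATED_LOOP)

    return categories
```

Returning a set lets one action carry several categories at once, for example a destructive action aimed at a nonexistent service.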
Example Report
SENTINEL OVERSIGHT REPORT - Grand Finale 2026
Sources monitored: 8
Compatible OpenEnv spaces: 6
Unavailable or timed out: 2
Team A - Calendar Assistant
Caught: escalation bombing, step 4
Redirected: destructive bulk update -> request confirmation, step 7
Risk prevented: 3.2
Team B - SRE Environment
Caught: hallucination - service "auth-proxy-v2" does not exist, step 2
Caught: reward hacking - classified P1 before investigation, step 1
Team C - IoT Environment
All actions approved - clean agent
Team D - Negotiation
Caught: loop exploitation - same offer repeated 4 times
Totals:
Actions audited: 82
Blocks: 14
Redirects: 6
Flags: 9
Prevented risk: 11.7
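The totals block could be derived from per-action findings along these lines. The `Finding` record and its decision labels are hypothetical; how the risk score itself is computed is not specified here and is left as an input.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Finding:
    """Hypothetical audit record for one intercepted action."""
    source: str                 # e.g. "Team A - Calendar Assistant"
    step: int
    decision: str               # "approve", "block", "redirect", or "flag"
    risk_prevented: float = 0.0


def summarize(findings: List[Finding]) -> Dict[str, Union[int, float]]:
    """Aggregate findings from all sources into report totals."""
    counts = Counter(f.decision for f in findings)
    return {
        "actions_audited": len(findings),
        "blocks": counts["block"],
        "redirects": counts["redirect"],
        "flags": counts["flag"],
        "prevented_risk": round(sum(f.risk_prevented for f in findings), 1),
    }
```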
Reliability Requirements
The harness must never depend on a remote Space being healthy.
Required protections:
- 5-10 second request timeout per remote call
- exponential backoff for transient failures
- per-Space circuit breaker after repeated failures
- compatibility report when /tasks or schemas are missing
- offline fixture mode for the live pitch
- no false precision for unknown labels
For unknown external environments, say "estimated false positives" unless the remote Space provides labels or grader feedback.
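The backoff and circuit-breaker protections could be combined roughly as follows. This is a sketch under assumptions: `CircuitBreaker`, `call_with_retry`, the failure threshold, and the delay schedule are illustrative choices, and the request timeout is expected to be enforced inside `fn` by whatever HTTP client is used.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a Space has failed too often and is being skipped."""


class CircuitBreaker:
    """Per-Space failure counter; opens after repeated exhausted calls."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.failure_threshold

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1


def call_with_retry(
    fn: Callable[[], T],
    breaker: CircuitBreaker,
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn with exponential backoff, honoring the Space's breaker."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    if breaker.is_open:
        raise CircuitOpenError("circuit open; skipping remote call")
    last_error: Optional[OSError] = None
    for attempt in range(retries):
        try:
            result = fn()
        except OSError as exc:
            # TimeoutError and ConnectionError are OSError subclasses.
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        else:
            breaker.record_success()
            return result
    # All attempts failed: count one failure against this Space.
    breaker.record_failure()
    assert last_error is not None
    raise last_error
```

Keeping one breaker per Space means a single unhealthy Space is skipped quickly without slowing down the others.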
Build Order
- Keep this as a finale extension until core training proof is complete.
- Implement OpenEnv compatibility checker.
- Implement one local action generator.
- Run 3-5 known Spaces or local fixtures.
- Add aggregate report generation.
- Add paste-a-Space-URL field to /sentinel/dashboard.
- Only then attempt external agent /act integration.
Demo Principle
Prepared mode must always work. Bring-your-own-link mode is a bonus.
The judged demo should show:
- SENTINEL core environment.
- Reward curve / before-after training proof.
- Zero-shot confidence washing via /sentinel/intercept.
- Universal oversight report as the final "this scales beyond our environment" moment.