Spaces:
Running
Running
| name: logtriage-env | |
| version: 1.0.0 | |
| description: > | |
| An OpenEnv environment where an AI agent acts as an on-call SRE. | |
| The agent receives live system logs from a simulated microservice cluster | |
| and must diagnose, prioritize, and resolve incidents across 3 tasks | |
| of increasing difficulty. | |
| author: Rohit Patil | |
| space_url: https://ogrohit-logtriage-env.hf.space | |
| tags: | |
| - openenv | |
| - sre | |
| - log-analysis | |
| - incident-response | |
| - reinforcement-learning | |
| tasks: | |
| - id: single_crash | |
| name: Single Service Crash | |
| difficulty: easy | |
| max_steps: 8 | |
| description: One service crashes with clear error logs. Classify, identify root cause, remediate. | |
| - id: cascading_failure | |
| name: Cascading Failure | |
| difficulty: medium | |
| max_steps: 12 | |
| description: Database slowdown causes upstream cascade. Find root cause, not just symptoms. | |
| - id: silent_degradation | |
| name: Silent Degradation with Noise | |
| difficulty: hard | |
| max_steps: 15 | |
| description: Slow degradation hidden in 60% noise. Nuanced severity judgment required. | |
| action_space: | |
| type: discrete | |
| description: SRE triage actions — classify, identify, escalate, remediate, resolve | |
| observation_space: | |
| type: structured | |
| description: Log batches + system state + incident metadata per step | |
| reward_range: [-0.5, 1.0] | |