---
title: CloudOps Optimizer
emoji: ☁️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# 🚀 Project Overview
CloudOps Optimizer is an OpenEnv simulation for Autonomous FinOps. It challenges AI agents to balance cloud infrastructure costs against performance SLAs, simulating real-world SRE tasks.
## The Problem It Simulates
Companies using AWS/Azure/GCP waste millions yearly on:
- Oversized servers - paying for capacity they don't need
- Undersized servers - causing performance issues
- Poor resource allocation - balancing cost vs performance
## The Agent's Job

- See current infrastructure (CPU usage, costs, latency)
- Choose actions such as `change srv-1 to t3.small`
- Get rewards/penalties based on cost savings + performance
- Learn to optimize cost vs. performance tradeoffs
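The loop above can be sketched in a few lines; note that `env_reset`, `env_step`, and `choose_action` are hypothetical stand-ins for the real client API, not functions from this repo:

```python
# Minimal agent-loop sketch; env_reset, env_step, and choose_action are
# hypothetical stand-ins, not this repo's actual API.
def run_episode(env_reset, env_step, choose_action, max_steps=10):
    obs = env_reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(obs)      # e.g. "change srv-1 to t3.small"
        obs, reward, done, info = env_step(action)
        total_reward += reward           # partial rewards accumulate
        if done:                         # episode ends on success or crash
            break
    return total_reward
```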
# CloudOps Optimizer Environment

## Overview
CloudOps Optimizer is a real-world simulation of cloud infrastructure cost and performance optimization. The agent acts as a Cloud Site Reliability Engineer (SRE) optimizing a fleet of virtual cloud instances to meet Service Level Agreement (SLA) requirements while minimizing monthly costs.
## Why This Matters
- Real-world utility: Every company using AWS/Azure/GCP struggles with "Cloud Waste". Training agents to right-size instances is a multi-million dollar problem.
- Not a toy: Unlike chatbots or simple games, this environment requires quantitative reasoning about cost vs performance tradeoffs.
## Environment Description

### Observation Space
The agent receives structured data including:
- **Inventory**: list of cloud resources (`id`, `type`, `cpu_usage`, `mem_usage`, `monthly_cost`)
- **Metrics**: real-time performance (`avg_latency_ms`, `error_rate`, `throughput_rps`)
- **SLA**: target constraints (`max_latency_ms`, `max_budget`, `min_uptime_pct`)
- **Task Info**: `task_id`, `task_name`, `difficulty`, current step
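For illustration, one observation might look like the Python dict below. The field names follow the list above, but the concrete nesting and values are assumptions, not the environment's exact schema:

```python
# Illustrative observation payload; field names follow this README, but the
# exact nesting and values are assumptions.
observation = {
    "inventory": [
        {"id": "srv-1", "type": "m5.large", "cpu_usage": 12.0,
         "mem_usage": 30.0, "monthly_cost": 70.00},
    ],
    "metrics": {"avg_latency_ms": 85.0, "error_rate": 0.01,
                "throughput_rps": 120.0},
    "sla": {"max_latency_ms": 200.0, "max_budget": 50.0,
            "min_uptime_pct": 99.9},
    "task": {"task_id": "right-sizing", "task_name": "Right-Sizing",
             "difficulty": "easy", "step": 0},
}
```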
### Action Space

The agent sends text commands in the format: `change [resource_id] to [instance_type]`

Available instance types:

| Instance Type | Monthly Cost | Capacity |
|---|---|---|
| t3.nano | $3.60 | 1.0 |
| t3.small | $11.50 | 2.0 |
| t3.medium | $23.00 | 4.0 |
| m5.large | $70.00 | 8.0 |
| m5.xlarge | $140.00 | 16.0 |
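Commands in this format can be parsed with a small regex; a sketch (rejecting unknown instance types is an assumption about how the server validates input):

```python
import re

# Parse "change [resource_id] to [instance_type]" commands as documented above.
ACTION_RE = re.compile(r"^change\s+(\S+)\s+to\s+(\S+)$")

# Instance types copied from the table above.
INSTANCE_TYPES = {"t3.nano", "t3.small", "t3.medium", "m5.large", "m5.xlarge"}

def parse_action(command: str):
    """Return (resource_id, instance_type), or None if malformed or unknown type."""
    m = ACTION_RE.match(command.strip())
    if not m or m.group(2) not in INSTANCE_TYPES:
        return None
    return m.group(1), m.group(2)
```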
## Tasks & Grading
| Task | Difficulty | Description | Grading |
|---|---|---|---|
| Right-Sizing | Easy | Reduce an overpriced server without breaking SLA | Score = reward value (0-1) |
| Latency Fix | Medium | Resolve performance bottleneck under budget | Score = reward value (0-1) |
| Balance Optimization | Hard | Optimize multi-server cluster with tight constraints | Score = reward value (0-1) |
## Reward Function

The reward provides continuous signals over the trajectory:

`R = cost_reward + performance_reward`

Where:

- **Cost Reward (0-0.5)**: higher as cost approaches the budget
- **Performance Reward (0-0.5)**: higher as latency stays under the SLA target

**Partial progress**: the agent receives incremental rewards for each improvement. **Penalties**: a system crash (CPU > 110%) yields 0 reward and ends the episode.
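As a rough sketch of how such a reward could be computed; the linear shaping below is an assumption, only the 0-0.5 component ranges and the CPU > 110% crash rule come from the description above (the repo's `env/core.py` holds the actual logic):

```python
# Hedged reward sketch; linear shaping is an assumption. Only the 0-0.5
# component ranges and the CPU > 110% crash rule come from the README.
def compute_reward(monthly_cost, budget, latency_ms, sla_latency_ms, cpu_pct):
    if cpu_pct > 110.0:                  # crash: 0 reward, episode ends
        return 0.0
    # Cost reward (0-0.5): more credit the further cost sits under budget.
    cost_r = 0.5 * max(0.0, min(1.0, 1.0 - monthly_cost / budget))
    # Performance reward (0-0.5): more credit the further latency sits under SLA.
    perf_r = 0.5 * max(0.0, min(1.0, 1.0 - latency_ms / sla_latency_ms))
    return cost_r + perf_r
```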
## Setup & Usage

### Prerequisites

- Python 3.10+
- A Hugging Face access token (`HF_TOKEN`)
### Local Installation

```bash
# Install dependencies
pip install -e .

# Run baseline inference
export HF_TOKEN=your_huggingface_token
python inference.py
```
### Docker Execution

```bash
docker build -t cloud-ops-env .
docker run -p 8000:8000 cloud-ops-env
```
## API Endpoints

- `POST /reset` - Reset the environment (optional `task_id`)
- `POST /step` - Execute an action
- `GET /state` - Get the current state
- `GET /health` - Health check
## Baseline Results

**Model**: `Qwen/Qwen2.5-72B-Instruct`
| Task | Score | Steps |
|---|---|---|
| Right-Sizing (Easy) | 0.125 | 1 |
| Latency Fix (Medium) | 0.000 | 1 |
| Balance (Hard) | 0.000 | 1 |
Average: 0.042
**Note**: The low baseline scores indicate the model needs better prompting to handle the optimization tradeoffs. The environment correctly penalizes overshooting the budget (easy task) and undersizing instances (medium/hard tasks, which cause crashes).
## Files

- `openenv.yaml` - OpenEnv specification
- `models.py` - Pydantic models (Observation, Action, Reward)
- `env/core.py` - Environment logic with state machine
- `server/app.py` - FastAPI server
- `inference.py` - Baseline inference script
- `Dockerfile` - Container build
## Spec Compliance

- Typed Pydantic models
- `reset()` returns `Observation`
- `step(action)` returns `(Observation, Reward, done, info)`
- `state()` returns the current state
- `openenv.yaml` with metadata
- `openenv validate` passes
- 3 tasks with deterministic graders (0.0-1.0)
- Partial reward signals
- Strict `[START]`/`[STEP]`/`[END]` log format in `inference.py`
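The log tags in the last item could be emitted like this; only the `[START]`/`[STEP]`/`[END]` markers come from the spec list above, the fields after each tag are illustrative assumptions about `inference.py`'s actual output:

```python
# Sketch of the strict log format; only the three tags are specified above,
# the fields after each tag are illustrative assumptions.
def log_start(task_name: str) -> str:
    return f"[START] task={task_name}"

def log_step(n: int, action: str, reward: float) -> str:
    return f"[STEP] n={n} action={action!r} reward={reward:.3f}"

def log_end(score: float) -> str:
    return f"[END] score={score:.3f}"
```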