Commit 4e608c3 · Initialize GPUClusterEnv boilerplate as per OpenEnv requirements

Files changed:
- .github/workflows/sync_to_hub.yml +23 -0
- .gitignore +5 -0
- Dockerfile +11 -0
- README.md +86 -0
- demo.py +15 -0
- openenv.yaml +10 -0
- requirements.txt +9 -0
- src/__init__.py +1 -0
- src/baseline.py +24 -0
- src/env.py +57 -0
- src/environment.py +0 -0
- src/main.py +55 -0
- src/models.py +25 -0
- src/tasks.py +7 -0
.github/workflows/sync_to_hub.yml
ADDED

```yaml
name: Sync to Hugging Face Hub

on:
  push:
    branches: [main]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  sync-to-hub:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          lfs: true

      - name: Push to Hugging Face Hub
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://hiitsesh:$HF_TOKEN@huggingface.co/spaces/hiitsesh/openenv-hackathon
          git push --force space main
```
.gitignore
ADDED

```
venv/
.env
.vscode/
__pycache__/
.git.old/
```
Dockerfile
ADDED

```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY . /app

RUN pip install --no-cache-dir fastapi uvicorn pydantic numpy requests

# Expose port for HF Spaces
EXPOSE 7860

CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md
ADDED

---
title: GPUClusterEnv
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# GPU Cluster Resource Management Environment (GPUClusterEnv)

A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.

Managing compute resources for incoming ML training jobs requires balancing strict budgets against Service Level Agreement (SLA) penalties for long queue times. This environment challenges agents to dynamically scale GPU resources to match fluctuating job arrival rates while maximizing overall reward.

## 🚀 Getting Started

### Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/GPUClusterEnv
cd GPUClusterEnv
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the FastAPI server:
```bash
uvicorn src.main:app --host 0.0.0.0 --port 7860
```

## 🧠 Environment Design

### Observation Space

The observation space is represented as a structured dictionary containing the current state of the GPU cluster:

| Feature | Description | Type |
| :--- | :--- | :--- |
| `time_step` | Current step in the episode. | `int` |
| `active_gpus` | Number of currently provisioned GPUs. | `int` |
| `queue_size` | Number of jobs waiting to be processed. | `int` |
| `current_budget` | Remaining budget for the episode. | `float` |
| `incoming_jobs` | Number of new jobs that arrived in the last step. | `int` |

### Action Space

The agent controls the scaling of the infrastructure by specifying how many GPUs to provision or de-provision:

| Feature | Description | Type | Notes |
| :--- | :--- | :--- | :--- |
| `gpus_to_provision` | Number of GPUs to spin up (positive) or spin down (negative). | `int` | Infrastructure scaling |

### Reward Function

Instead of a sparse reward, the environment uses a shaped reward function that continuously evaluates the agent's performance based on processing jobs while minimizing costs and SLA penalties:

$$Reward = (JobsProcessed \times 5.0) - (ActiveGPUs \times CostPerGPU) - (QueueSize \times Penalty)$$

* **CostPerGPU**: $2.50 per step per active GPU.
* **Penalty**: $1.00 SLA penalty per step for each waiting job in the queue.

### Terminal Conditions

An episode ends when:
1. The maximum number of `time_steps` for the task is reached.
2. The `current_budget` drops to $0 or below.

## 📋 Tasks

The environment provides three graded tasks with escalating difficulty:

1. **Easy** (`task_id: "easy"`): Low job arrival rate, generous budget. (Max Steps: 50)
2. **Medium** (`task_id: "medium"`): Moderate job arrival rate, standard budget. (Max Steps: 100)
3. **Hard** (`task_id: "hard"`): High, erratic job arrival rate, tight budget. (Max Steps: 200)

## 🤖 Baseline Agent

To evaluate the baseline agent performance:
```bash
python src/baseline.py
```
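As a sanity check on the shaped reward above, here is a small standalone sketch (constants taken from the README's reward table; the function name is illustrative, not part of the environment's API):

```python
def shaped_reward(jobs_processed, active_gpus, queue_size,
                  job_value=5.0, cost_per_gpu=2.5, sla_penalty=1.0):
    """Shaped reward: credit processed jobs, charge for GPUs and queue backlog."""
    return (jobs_processed * job_value
            - active_gpus * cost_per_gpu
            - queue_size * sla_penalty)

# 4 GPUs each finish a job while 2 jobs still wait:
# 4*5.0 - 4*2.5 - 2*1.0 = 20 - 10 - 2 = 8.0
print(shaped_reward(jobs_processed=4, active_gpus=4, queue_size=2))  # 8.0
```

Note that idle GPUs are pure loss: with an empty queue, each provisioned GPU costs 2.5 reward per step, which is what makes over-provisioning a losing strategy.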
demo.py
ADDED

```python
import gradio as gr

def demo_function(name):
    return f"Hello, {name if name else 'Developer'}! The OpenEnv Hackathon Demo is running successfully with the new updates!"

if __name__ == "__main__":
    print("Launching Gradio demo...")
    demo = gr.Interface(
        fn=demo_function,
        inputs=gr.Textbox(label="Enter your name", placeholder="Name..."),
        outputs=gr.Textbox(label="Message"),
        title="OpenEnv Hackathon Submission Demo (Updated v2 ✨)",
        description="A demo for your Hugging Face Space. This version has been updated to confirm your recent changes are now live!"
    )
    demo.launch()
```
openenv.yaml
ADDED

```yaml
name: GPUClusterEnv
version: 1.0.0
description: A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.
endpoints:
  reset: /reset
  step: /step
  state: /state
  baseline: /baseline
  grader: /grader
  tasks: /tasks
```
requirements.txt
ADDED

```
gradio
torch
numpy
gymnasium
fastapi
uvicorn
pydantic
requests
```
src/__init__.py
ADDED

```python
# Environment implementation goes here
```
src/baseline.py
ADDED

```python
import requests

BASE_URL = "http://localhost:7860"  # Default HF Space port

def evaluate_baseline(task_id):
    requests.post(f"{BASE_URL}/reset?task_id={task_id}")
    done = False

    while not done:
        state = requests.get(f"{BASE_URL}/state").json()["observation"]

        # Simple policy: if the queue is larger than active GPUs, provision more.
        gpus_needed = state["queue_size"] - state["active_gpus"]
        action = {"gpus_to_provision": max(-1, min(2, gpus_needed))}  # Throttle scaling

        step_res = requests.post(f"{BASE_URL}/step", json=action).json()
        done = step_res["done"]

    score = requests.get(f"{BASE_URL}/grader").json()["score"]
    print(f"Task: {task_id} | Final Score: {score:.3f}")

if __name__ == "__main__":
    for task in ["easy", "medium", "hard"]:
        evaluate_baseline(task)
```
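The clamp `max(-1, min(2, gpus_needed))` in the baseline is doing the real work: it turns a naive "match the queue exactly" target into small, rate-limited scaling moves. A standalone sketch of just that policy step (the function name and the `max_up`/`max_down` parameters are illustrative, not part of the environment's API):

```python
def throttled_scaling(queue_size, active_gpus, max_up=2, max_down=1):
    """Clamp the naive 'match queue size' target to small per-step scaling moves."""
    gpus_needed = queue_size - active_gpus
    return max(-max_down, min(max_up, gpus_needed))

print(throttled_scaling(queue_size=10, active_gpus=3))  # 2  (spin-up capped)
print(throttled_scaling(queue_size=0, active_gpus=5))   # -1 (gentle spin-down)
print(throttled_scaling(queue_size=4, active_gpus=3))   # 1  (small gap, small move)
```

Capping upward moves limits budget burn when a burst of jobs arrives, while the slow spin-down keeps some capacity around in case the burst continues.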
src/env.py
ADDED

```python
import numpy as np
from src.models import Observation, Action, StepResult, TaskConfig

class GPUClusterEnv:
    def __init__(self):
        self.config = None
        self.state = None
        self.total_reward = 0.0

    def reset(self, config: TaskConfig) -> Observation:
        self.config = config
        self.total_reward = 0.0
        self.state = Observation(
            time_step=0,
            active_gpus=1,
            queue_size=0,
            current_budget=config.initial_budget,
            incoming_jobs=0
        )
        return self.state

    def step(self, action: Action) -> StepResult:
        if self.state is None:
            raise ValueError("Environment must be reset before calling step.")

        # 1. Apply action (scale infrastructure)
        self.state.active_gpus = max(0, self.state.active_gpus + action.gpus_to_provision)

        # 2. Simulate incoming workloads
        new_jobs = np.random.poisson(self.config.job_arrival_rate)
        self.state.incoming_jobs = new_jobs
        self.state.queue_size += new_jobs

        # 3. Process jobs (1 GPU processes 1 job per step)
        jobs_processed = min(self.state.active_gpus, self.state.queue_size)
        self.state.queue_size -= jobs_processed

        # 4. Calculate costs & rewards
        gpu_cost = self.state.active_gpus * 2.5  # $2.50 per step per GPU
        sla_penalty = self.state.queue_size * 1.0  # $1 penalty per waiting job

        self.state.current_budget -= gpu_cost

        # Reward shaping
        reward = (jobs_processed * 5.0) - gpu_cost - sla_penalty
        self.total_reward += reward
        self.state.time_step += 1

        # 5. Terminal conditions
        done = self.state.time_step >= self.config.max_steps or self.state.current_budget <= 0

        return StepResult(
            observation=self.state,
            reward=reward,
            done=done,
            info={"jobs_processed": jobs_processed, "total_reward": self.total_reward}
        )
```
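To make the step dynamics above easy to trace by hand, here is a deterministic standalone sketch: a plain-dict state replaces the pydantic `Observation`, and a fixed `new_jobs` replaces the Poisson draw (both simplifications are mine, for illustration only):

```python
# One step of the GPUClusterEnv dynamics, with fixed arrivals instead of
# np.random.poisson so the numbers can be checked by hand.
def step(state, gpus_to_provision, new_jobs, cost_per_gpu=2.5, sla_penalty=1.0):
    state = dict(state)  # don't mutate the caller's state
    state["active_gpus"] = max(0, state["active_gpus"] + gpus_to_provision)
    state["queue_size"] += new_jobs
    jobs_processed = min(state["active_gpus"], state["queue_size"])
    state["queue_size"] -= jobs_processed
    gpu_cost = state["active_gpus"] * cost_per_gpu
    state["current_budget"] -= gpu_cost
    reward = jobs_processed * 5.0 - gpu_cost - state["queue_size"] * sla_penalty
    state["time_step"] += 1
    return state, reward

s = {"time_step": 0, "active_gpus": 1, "queue_size": 0, "current_budget": 100.0}
s, r = step(s, gpus_to_provision=1, new_jobs=3)
# 2 GPUs, 3 arrivals -> 2 processed, 1 queued; reward = 10 - 5 - 1 = 4.0
print(s["active_gpus"], s["queue_size"], s["current_budget"], r)  # 2 1 95.0 4.0
```

Note the ordering in the real `step` as well: the SLA penalty is charged on the queue *after* processing, so jobs that arrive and are served in the same step incur no penalty.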
src/environment.py
ADDED

(empty file)
src/main.py
ADDED

```python
from fastapi import FastAPI, HTTPException
from src.models import Action, TaskConfig
from src.env import GPUClusterEnv
from src.tasks import TASKS
import subprocess

app = FastAPI(title="GPU Cluster OpenEnv")
env = GPUClusterEnv()

@app.get("/")
def health_check():
    return {"status": "ok", "message": "GPUClusterEnv is running"}

@app.post("/reset")
def reset_env(task_id: str = "easy"):
    if task_id not in TASKS:
        raise HTTPException(status_code=404, detail="Task not found")
    obs = env.reset(TASKS[task_id])
    return {"observation": obs.dict()}

@app.post("/step")
def step_env(action: Action):
    try:
        result = env.step(action)
        return result.dict()
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

@app.get("/state")
def get_state():
    if env.state is None:
        raise HTTPException(status_code=400, detail="Environment not initialized")
    return {"observation": env.state.dict()}

@app.get("/tasks")
def list_tasks():
    return {
        "tasks": list(TASKS.keys()),
        "action_schema": Action.schema()
    }

@app.get("/grader")
def grader():
    # Normalizes total reward to a 0.0 - 1.0 score based on max possible baseline
    if env.state is None:
        return {"score": 0.0}
    max_expected_reward = env.config.max_steps * 10  # Arbitrary max for example
    score = max(0.0, min(1.0, env.total_reward / max_expected_reward))
    return {"score": score}

@app.post("/baseline")
def run_baseline():
    # Trigger the baseline script and return results
    result = subprocess.run(["python", "src/baseline.py"], capture_output=True, text=True)
    return {"output": result.stdout}
```
src/models.py
ADDED

```python
from pydantic import BaseModel
from typing import Dict

class Observation(BaseModel):
    time_step: int
    active_gpus: int
    queue_size: int
    current_budget: float
    incoming_jobs: int

class Action(BaseModel):
    gpus_to_provision: int  # Can be positive (spin up) or negative (spin down)

class StepResult(BaseModel):
    observation: Observation
    reward: float
    done: bool
    info: Dict

class TaskConfig(BaseModel):
    task_id: str
    difficulty: str
    max_steps: int
    initial_budget: float
    job_arrival_rate: float  # Lambda for Poisson distribution
```
src/tasks.py
ADDED

```python
from src.models import TaskConfig

TASKS = {
    "easy": TaskConfig(task_id="easy", difficulty="easy", max_steps=50, initial_budget=1000.0, job_arrival_rate=2.0),
    "medium": TaskConfig(task_id="medium", difficulty="medium", max_steps=100, initial_budget=800.0, job_arrival_rate=5.0),
    "hard": TaskConfig(task_id="hard", difficulty="hard", max_steps=200, initial_budget=500.0, job_arrival_rate=12.0),
}
```
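A quick back-of-envelope check on these configs: since the budget in `src/env.py` only ever decreases (by $2.50 per GPU per step), the budget bounds how many GPUs can stay up for a full episode, while the Poisson rate is the steady-state GPU demand (one job per GPU per step). A standalone sketch with the task values inlined:

```python
# Sustainable GPU count per task: initial_budget / (cost_per_gpu * max_steps),
# compared to mean arrivals per step (the steady-state GPU demand).
COST_PER_GPU = 2.5
TASK_PARAMS = {
    "easy":   {"max_steps": 50,  "initial_budget": 1000.0, "job_arrival_rate": 2.0},
    "medium": {"max_steps": 100, "initial_budget": 800.0,  "job_arrival_rate": 5.0},
    "hard":   {"max_steps": 200, "initial_budget": 500.0,  "job_arrival_rate": 12.0},
}
for name, cfg in TASK_PARAMS.items():
    sustainable = cfg["initial_budget"] / (COST_PER_GPU * cfg["max_steps"])
    print(f"{name}: ~{sustainable:.1f} GPUs sustainable vs {cfg['job_arrival_rate']} jobs/step")
```

So "easy" can afford roughly 8 always-on GPUs against 2 jobs/step, while "hard" can sustain only about 1 GPU against 12 jobs/step: the agent must keep the queue (and its penalties) bounded through short provisioning bursts rather than standing capacity.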