hiitsesh committed on
Commit 4e608c3 · 0 Parent(s)

Initialize GPUClusterEnv boilerplate as per OpenEnv requirements

.github/workflows/sync_to_hub.yml ADDED
```yaml
name: Sync to Hugging Face Hub

on:
  push:
    branches: [main]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  sync-to-hub:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          lfs: true

      - name: Push to Hugging Face Hub
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://hiitsesh:$HF_TOKEN@huggingface.co/spaces/hiitsesh/openenv-hackathon
          git push --force space main
```
.gitignore ADDED
```
venv/
.env
.vscode/
__pycache__/
.git.old/
```
Dockerfile ADDED
```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY . /app

RUN pip install --no-cache-dir fastapi uvicorn pydantic numpy requests

# Expose port for HF Spaces
EXPOSE 7860

CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md ADDED
---
title: GPUClusterEnv
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# GPU Cluster Resource Management Environment (GPUClusterEnv)

A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.

Managing compute resources for incoming ML training jobs requires balancing strict budgets against Service Level Agreement (SLA) penalties for long queue times. This environment challenges agents to dynamically scale GPU resources to match fluctuating job arrival rates while maximizing overall reward.

## 🚀 Getting Started

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/GPUClusterEnv
   cd GPUClusterEnv
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Run the FastAPI server:
   ```bash
   uvicorn src.main:app --host 0.0.0.0 --port 7860
   ```

## 🧠 Environment Design

### Observation Space

The observation space is a structured dictionary describing the current state of the GPU cluster:

| Feature | Description | Type |
| :--- | :--- | :--- |
| `time_step` | Current step in the episode. | `int` |
| `active_gpus` | Number of currently provisioned GPUs. | `int` |
| `queue_size` | Number of jobs waiting to be processed. | `int` |
| `current_budget` | Remaining budget for the episode. | `float` |
| `incoming_jobs` | Number of new jobs that arrived in the last step. | `int` |

### Action Space

The agent controls the scaling of the infrastructure by specifying how many GPUs to provision or de-provision:

| Feature | Description | Type | Notes |
| :--- | :--- | :--- | :--- |
| `gpus_to_provision` | Number of GPUs to spin up (positive) or spin down (negative). | `int` | Infrastructure scaling |
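As a concrete illustration of these two schemas (values hypothetical), a `POST /step` response combines the updated observation with the step reward:

```json
{
  "observation": {
    "time_step": 4,
    "active_gpus": 3,
    "queue_size": 1,
    "current_budget": 962.5,
    "incoming_jobs": 2
  },
  "reward": 6.5,
  "done": false,
  "info": {"jobs_processed": 3, "total_reward": 18.0}
}
```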
### Reward Function

Instead of a sparse reward, the environment uses a shaped reward function that continuously evaluates the agent's performance based on processing jobs while minimizing costs and SLA penalties:

$$Reward = (JobsProcessed \times 5.0) - (ActiveGPUs \times CostPerGPU) - (QueueSize \times Penalty)$$

* **CostPerGPU**: $2.50 per step per active GPU.
* **Penalty**: $1.00 SLA penalty per step for each waiting job in the queue.
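The shaped reward is straightforward to compute directly; a minimal sketch using the constants listed above:

```python
COST_PER_GPU = 2.5   # $ per step per active GPU
SLA_PENALTY = 1.0    # $ per step per job left waiting
JOB_VALUE = 5.0      # reward per processed job

def shaped_reward(jobs_processed: int, active_gpus: int, queue_size: int) -> float:
    """Reward = value of processed jobs minus GPU cost and SLA penalties."""
    return (jobs_processed * JOB_VALUE
            - active_gpus * COST_PER_GPU
            - queue_size * SLA_PENALTY)

# e.g. 3 jobs processed on 3 GPUs with 1 job still queued:
# 3*5.0 - 3*2.5 - 1*1.0 = 6.5
```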
### Terminal Conditions

An episode ends when:
1. The maximum number of `time_steps` for the task is reached.
2. The `current_budget` drops to $0 or below.

## 📋 Tasks

The environment provides 3 graded tasks with escalating difficulty:

1. **Easy** (`task_id: "easy"`): Low job arrival rate, generous budget. (Max Steps: 50)
2. **Medium** (`task_id: "medium"`): Moderate job arrival rate, standard budget. (Max Steps: 100)
3. **Hard** (`task_id: "hard"`): High, erratic job arrival rate, tight budget. (Max Steps: 200)
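To get a feel for the load each task implies, the Poisson arrival process can be sampled with the `job_arrival_rate` values defined in `src/tasks.py` (a quick sketch; the environment itself draws arrivals with `np.random.poisson` each step):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
rates = {"easy": 2.0, "medium": 5.0, "hard": 12.0}   # lambda per step
steps = {"easy": 50, "medium": 100, "hard": 200}     # episode length

for task, lam in rates.items():
    # Sample one full episode's worth of job arrivals
    arrivals = rng.poisson(lam, size=steps[task])
    print(f"{task}: mean arrivals/step = {arrivals.mean():.2f}, "
          f"total jobs = {arrivals.sum()}")
```

The hard task arrives at roughly six times the easy rate over four times as many steps, so the total job volume (and hence GPU spend) differs by well over an order of magnitude between tasks.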
## 🤖 Baseline Agent

To evaluate the baseline agent performance:
```bash
python src/baseline.py
```
demo.py ADDED
```python
import gradio as gr

def demo_function(name):
    return f"Hello, {name if name else 'Developer'}! The OpenEnv Hackathon Demo is running successfully with the new updates!"

if __name__ == "__main__":
    print("Launching Gradio demo...")
    demo = gr.Interface(
        fn=demo_function,
        inputs=gr.Textbox(label="Enter your name", placeholder="Name..."),
        outputs=gr.Textbox(label="Message"),
        title="OpenEnv Hackathon Submission Demo (Updated v2 ✨)",
        description="A demo for your Hugging Face Space. This version has been updated to confirm your recent changes are now live!"
    )
    demo.launch()
```
openenv.yaml ADDED
```yaml
name: GPUClusterEnv
version: 1.0.0
description: A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.
endpoints:
  reset: /reset
  step: /step
  state: /state
  baseline: /baseline
  grader: /grader
  tasks: /tasks
```
requirements.txt ADDED
```
gradio
torch
numpy
gymnasium
fastapi
uvicorn
pydantic
requests
```
src/__init__.py ADDED
```python
# Environment implementation goes here
```
src/baseline.py ADDED
```python
import requests

BASE_URL = "http://localhost:7860"  # Default HF Space port

def evaluate_baseline(task_id):
    requests.post(f"{BASE_URL}/reset?task_id={task_id}")
    done = False

    while not done:
        state = requests.get(f"{BASE_URL}/state").json()["observation"]

        # Simple policy: if the queue is larger than active GPUs, provision more.
        gpus_needed = state["queue_size"] - state["active_gpus"]
        action = {"gpus_to_provision": max(-1, min(2, gpus_needed))}  # Throttle scaling

        step_res = requests.post(f"{BASE_URL}/step", json=action).json()
        done = step_res["done"]

    score = requests.get(f"{BASE_URL}/grader").json()["score"]
    print(f"Task: {task_id} | Final Score: {score:.3f}")

if __name__ == "__main__":
    for task in ["easy", "medium", "hard"]:
        evaluate_baseline(task)
```
src/env.py ADDED
```python
import numpy as np
from src.models import Observation, Action, StepResult, TaskConfig

class GPUClusterEnv:
    def __init__(self):
        self.config = None
        self.state = None
        self.total_reward = 0.0

    def reset(self, config: TaskConfig) -> Observation:
        self.config = config
        self.total_reward = 0.0
        self.state = Observation(
            time_step=0,
            active_gpus=1,
            queue_size=0,
            current_budget=config.initial_budget,
            incoming_jobs=0
        )
        return self.state

    def step(self, action: Action) -> StepResult:
        if self.state is None:
            raise ValueError("Environment must be reset before calling step.")

        # 1. Apply action (scale infrastructure)
        self.state.active_gpus = max(0, self.state.active_gpus + action.gpus_to_provision)

        # 2. Simulate incoming workloads
        new_jobs = np.random.poisson(self.config.job_arrival_rate)
        self.state.incoming_jobs = new_jobs
        self.state.queue_size += new_jobs

        # 3. Process jobs (1 GPU processes 1 job per step)
        jobs_processed = min(self.state.active_gpus, self.state.queue_size)
        self.state.queue_size -= jobs_processed

        # 4. Calculate costs and rewards
        gpu_cost = self.state.active_gpus * 2.5    # $2.50 per step per GPU
        sla_penalty = self.state.queue_size * 1.0  # $1 penalty per waiting job

        self.state.current_budget -= gpu_cost

        # Reward shaping
        reward = (jobs_processed * 5.0) - gpu_cost - sla_penalty
        self.total_reward += reward
        self.state.time_step += 1

        # 5. Terminal conditions
        done = self.state.time_step >= self.config.max_steps or self.state.current_budget <= 0

        return StepResult(
            observation=self.state,
            reward=reward,
            done=done,
            info={"jobs_processed": jobs_processed, "total_reward": self.total_reward}
        )
```
src/environment.py ADDED
File without changes
src/main.py ADDED
```python
from fastapi import FastAPI, HTTPException
from src.models import Action, TaskConfig
from src.env import GPUClusterEnv
from src.tasks import TASKS
import subprocess

app = FastAPI(title="GPU Cluster OpenEnv")
env = GPUClusterEnv()

@app.get("/")
def health_check():
    return {"status": "ok", "message": "GPUClusterEnv is running"}

@app.post("/reset")
def reset_env(task_id: str = "easy"):
    if task_id not in TASKS:
        raise HTTPException(status_code=404, detail="Task not found")
    obs = env.reset(TASKS[task_id])
    return {"observation": obs.dict()}

@app.post("/step")
def step_env(action: Action):
    try:
        result = env.step(action)
        return result.dict()
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

@app.get("/state")
def get_state():
    if env.state is None:
        raise HTTPException(status_code=400, detail="Environment not initialized")
    return {"observation": env.state.dict()}

@app.get("/tasks")
def list_tasks():
    return {
        "tasks": list(TASKS.keys()),
        "action_schema": Action.schema()
    }

@app.get("/grader")
def grader():
    # Normalizes total reward to a 0.0 - 1.0 score based on max possible baseline
    if env.state is None:
        return {"score": 0.0}
    max_expected_reward = env.config.max_steps * 10  # Arbitrary max for example
    score = max(0.0, min(1.0, env.total_reward / max_expected_reward))
    return {"score": score}

@app.post("/baseline")
def run_baseline():
    # Trigger the baseline script and return results
    result = subprocess.run(["python", "src/baseline.py"], capture_output=True, text=True)
    return {"output": result.stdout}
```
src/models.py ADDED
```python
from pydantic import BaseModel
from typing import Dict

class Observation(BaseModel):
    time_step: int
    active_gpus: int
    queue_size: int
    current_budget: float
    incoming_jobs: int

class Action(BaseModel):
    gpus_to_provision: int  # Can be positive (spin up) or negative (spin down)

class StepResult(BaseModel):
    observation: Observation
    reward: float
    done: bool
    info: Dict

class TaskConfig(BaseModel):
    task_id: str
    difficulty: str
    max_steps: int
    initial_budget: float
    job_arrival_rate: float  # Lambda for Poisson distribution
```
src/tasks.py ADDED
```python
from src.models import TaskConfig

TASKS = {
    "easy": TaskConfig(task_id="easy", difficulty="easy", max_steps=50, initial_budget=1000.0, job_arrival_rate=2.0),
    "medium": TaskConfig(task_id="medium", difficulty="medium", max_steps=100, initial_budget=800.0, job_arrival_rate=5.0),
    "hard": TaskConfig(task_id="hard", difficulty="hard", max_steps=200, initial_budget=500.0, job_arrival_rate=12.0),
}
```