iitian committed on
Commit b23936a · 1 Parent(s): 30134ef

Prepare and optimize for Hugging Face Spaces deployment
Files changed (6)
  1. .vscode/settings.json +1 -0
  2. Dockerfile +3 -3
  3. README.md +57 -47
  4. inference.py +65 -70
  5. openenv.yaml +2 -2
  6. server/app.py +1 -1
.vscode/settings.json ADDED
@@ -0,0 +1 @@
+ {}
Dockerfile CHANGED
@@ -14,8 +14,8 @@ COPY openenv.yaml .
  COPY README.md .
  COPY DOCUMENTATION.md .
 
- # Expose the API port
- EXPOSE 8000
+ # Expose the API port (Hugging Face default)
+ EXPOSE 7860
 
  # Start server
- CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,64 +1,74 @@
- # CloudSecurityAuditor OpenEnv
-
- A standardized AI agent environment for simulating real-world cloud security audits. Built using the **OpenEnv** specification, it allows agents to interact with a mock cloud infrastructure to identify and remediate vulnerabilities.
-
+ ---
+ title: Cloud Security Auditor
+ emoji: 🛡️
+ colorFrom: blue
+ colorTo: indigo
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ license: apache-2.0
+ ---
+
+ # 🛡️ CloudSecurityAuditor OpenEnv
+
+ **CloudSecurityAuditor** is a high-fidelity, standardized AI agent environment designed to simulate real-world cloud security audit scenarios. Built upon the **OpenEnv** specification, it provides a safe, reproducible sandbox where autonomous agents can practice identifying, analyzing, and remediating critical security vulnerabilities in a mock cloud infrastructure.
+
+ This environment is specifically engineered for benchmarking LLM-based security agents, offering a structured API and deterministic evaluation metrics.
+
  ## 🌟 Key Features
- - **Typed Models**: Full Pydantic support for actions and observations.
- - **Three Task Tiers**: Includes Easy (Information Gathering), Medium (Remediation), and Hard (Forensic Analysis).
- - **Gymnasium-Compatible API**: Implements `step()`, `reset()`, and `state()` methods.
- - **Reward-Driven**: Scalar rewards from 0.0 to 1.0 based on task completion.
-
- ## 🛠 Action Space
- The agent can perform the following actions via the `step()` method:
-
- - **`list`**: Lists resources of a specific type (`s3`, `ec2`).
- - **`describe`**: Fetches detailed configuration for a specific resource ID.
- - **`modify`**: Updates resource configurations (e.g., security groups).
- - **`logs`**: Retrieves logs for a specific resource or service.
- - **`submit`**: Submits the final answer for the evaluation tasks.
-
- ## 📊 Observation Space
- Each step returns a `CloudObservation` containing:
- - `resources`: A list of discovered resource records.
- - `details`: Metadata for a specific resource.
- - `logs`: Relevant log entries.
- - `status`: Human-readable status message.
- - `info`: Additional environment metadata.
-
- ## 📋 Tasks
-
- 1. **Easy (S3 Public Audit)**: Identify all public S3 buckets in the 'prod' region.
- 2. **Medium (EC2 Security Patch)**: Find an EC2 instance with RDP port open to the internet and close it.
- 3. **Hard (IAM Log Forensic)**: Trace unauthorized actions in `auth-logs` to identify a rogue IP address.
-
- ## 🚀 Setup & Installation
-
- ### Local Installation
+
+ - **Standardized API**: Fully compliant with the `openenv-core` specification, featuring Gymnasium-style `step()`, `reset()`, and `state()` methods.
+ - **Realistic Cloud Mocking**: Simulates S3 bucket configurations, EC2 security groups, and IAM audit logs with high precision.
+ - **Multi-Tiered Evaluation**:
+   - **Easy (Audit)**: Focuses on information gathering and resource tagging.
+   - **Medium (Remediation)**: Requires active patching and configuration changes.
+   - **Hard (Forensics)**: Demands log analysis and pattern matching to identify rogue actors.
+ - **Typed Observations**: Robust Pydantic-based action and observation models ensure reliable agent-environment interactions.
+ - **Automated Grading**: Scalar reward functions (0.0 to 1.0) provide immediate, granular feedback on agent performance.
+
+ ## 🛠 Action & Observation Space
+
+ ### Actions
+ - `list`: Inventory resources (`s3`, `ec2`).
+ - `describe`: Deep-dive into resource metadata.
+ - `modify`: Apply security patches and rule updates.
+ - `logs`: Extract forensic evidence from authentication logs.
+ - `submit`: Finalize the task with a structured answer.
+
+ ### Observations
+ - `resources`: Comprehensive resource records.
+ - `details`: Metadata for specific entities.
+ - `logs`: Event-based log entries.
+ - `status`: Execution status and helper messages.
+
+ ## 📊 Available Tasks
+
+ | ID | Name | Objective | Difficulty |
+ |:---|:---|:---|:---|
+ | `easy` | **S3 Public Audit** | Identify public 'prod' buckets. | Auditing |
+ | `medium` | **EC2 Security Patch** | Remediate open RDP ports (3389). | Remediation |
+ | `hard` | **IAM Log Forensic** | Trace 'DeleteStorage' actions in logs. | Forensics |
+
+ ## 🚀 Quick Start (Hugging Face)
+
+ If you are running this in a **Hugging Face Space**:
+
+ 1. **Examine the API**: The environment is hosted as a FastAPI server. Use the `/ui` endpoint for a visual dashboard.
+ 2. **Inference**: Run the `inference.py` script locally, pointing the `ENV_URL` to your Space's URL.
+ 3. **Evaluate**: The system will emit standardized logs for automated leaderboard tracking.
+
+ ## 🐳 Local Deployment
+
  ```bash
+ # Clone and Install
  pip install -r requirements.txt
- ```
-
- ### Running the Server
- ```bash
+
+ # Run Server
  python -m server.app
- ```
- The server will start on `http://localhost:8000`.
-
- ### Running the Baseline Agent
- ```bash
- python scripts/baseline_inference.py
- ```
-
- ## 🐳 Docker Deployment
- To build and run the containerized environment:
- ```bash
- docker build -t cloud-security-auditor-env .
- docker run -p 8000:8000 cloud-security-auditor-env
+
+ # Run Baseline
+ python inference.py
  ```
-
- ## 🤗 Hugging Face Spaces
- This environment is designed to be deployed as an **OpenEnv Space**.
- 1. Create a new Space on Hugging Face.
- 2. Select **Docker** as the SDK.
- 3. Upload the repository contents (including `openenv.yaml` and `Dockerfile`).
- 4. Set the `entrypoint` to match the `uvicorn` command in `openenv.yaml`.
+
+ ---
+ Built with ❤️ for the AI Security community.
inference.py CHANGED
@@ -21,11 +21,13 @@ from openai import OpenAI
  # ──────────────────────────────────────────────
  # Configuration from environment variables
  # ──────────────────────────────────────────────
- API_BASE_URL = os.environ.get("API_BASE_URL", "https://openrouter.ai/api/v1")
- MODEL_NAME = os.environ.get("MODEL_NAME", "openai/gpt-4o-mini")
- HF_TOKEN = os.environ.get("HF_TOKEN", "")
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://openrouter.ai/api/v1")
+ MODEL_NAME = os.getenv("MODEL_NAME", "openai/gpt-4o-mini")
+ HF_TOKEN = os.getenv("HF_TOKEN", "")
+ LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME", "")
 
- ENV_URL = os.environ.get("ENV_URL", "http://localhost:8000")
+ ENV_URL = os.getenv("ENV_URL", "http://localhost:8000")
+ BENCHMARK_NAME = "cloud-security-auditor"
 
  # Initialize OpenAI-compatible client
  client = OpenAI(
@@ -119,7 +121,13 @@ def ask_llm(system_prompt: str, conversation: list) -> dict:
      # Strip markdown code fences if present
      if raw.startswith("```"):
          lines = raw.split("\n")
-         raw = "\n".join(lines[1:-1]) if len(lines) > 2 else raw
+         # Handle cases where the JSON block is the only content
+         if "{" in raw:
+             start = raw.find("{")
+             end = raw.rfind("}") + 1
+             raw = raw[start:end]
+         else:
+             raw = "\n".join(lines[1:-1]) if len(lines) > 2 else raw
 
      try:
          return json.loads(raw)
@@ -135,81 +143,91 @@ def ask_llm(system_prompt: str, conversation: list) -> dict:
  # ──────────────────────────────────────────────
  # Structured logging helpers
  # ──────────────────────────────────────────────
- def log_start(task_id: str, task_name: str):
-     """Emit [START] log."""
-     print(f"[START] task_id={task_id} task_name={task_name} timestamp={datetime.now(timezone.utc).isoformat()}")
+ def log_start(task_name: str):
+     """
+     [START] task=<task_name> env=<benchmark> model=<model_name>
+     """
+     print(f"[START] task={task_name} env={BENCHMARK_NAME} model={MODEL_NAME}")
      sys.stdout.flush()
 
 
- def log_step(task_id: str, step_num: int, action: dict, observation: dict, reward: float, done: bool):
-     """Emit [STEP] log."""
-     print(
-         f"[STEP] task_id={task_id} step={step_num} "
-         f"action={json.dumps(action)} "
-         f"observation={json.dumps(observation)} "
-         f"reward={reward} done={done} "
-         f"timestamp={datetime.now(timezone.utc).isoformat()}"
-     )
+ def log_step(step_num: int, action: dict, reward: float, done: bool, error: str = None):
+     """
+     [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+     """
+     error_str = "null" if not error else error
+     # Remove newlines from action for single-line requirement
+     action_str = json.dumps(action).replace("\n", " ")
+     done_str = "true" if done else "false"
+     print(f"[STEP] step={step_num} action={action_str} reward={reward:.2f} done={done_str} error={error_str}")
      sys.stdout.flush()
 
 
- def log_end(task_id: str, task_name: str, final_score: float, total_steps: int):
-     """Emit [END] log."""
-     print(
-         f"[END] task_id={task_id} task_name={task_name} "
-         f"score={final_score} steps={total_steps} "
-         f"timestamp={datetime.now(timezone.utc).isoformat()}"
-     )
+ def log_end(success: bool, total_steps: int, score: float, rewards: list):
+     """
+     [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+     """
+     success_str = "true" if success else "false"
+     rewards_str = ",".join([f"{r:.2f}" for r in rewards])
+     print(f"[END] success={success_str} steps={total_steps} score={score:.2f} rewards={rewards_str}")
      sys.stdout.flush()
 
 
  # ──────────────────────────────────────────────
  # Main task runner
  # ──────────────────────────────────────────────
- def run_task(task: dict) -> float:
-     """Run a single task using the LLM agent. Returns the final reward score."""
+ def run_task(task: dict):
+     """Run a single task using the LLM agent."""
      task_id = task["id"]
      task_name = task["name"]
      system_prompt = task["system_prompt"]
 
-     log_start(task_id, task_name)
+     log_start(task_id)
 
      # Reset environment
-     reset_data = env_reset(task_id)
-     obs = reset_data.get("observation", {})
-     info = obs.get("info", "")
+     try:
+         reset_data = env_reset(task_id)
+         obs = reset_data.get("observation", {})
+         info = obs.get("info", "")
+     except Exception as e:
+         log_end(success=False, total_steps=0, score=0.0, rewards=[])
+         return
 
      conversation = [
          {"role": "user", "content": f"Task started. Environment says: {info}\nDecide your first action."}
      ]
 
-     cumulative_reward = 0.0
+     rewards = []
      step_num = 0
+     success = False
+     last_error = None
 
      for step_num in range(1, MAX_STEPS_PER_TASK + 1):
          try:
             # Ask LLM for next action
             action = ask_llm(system_prompt, conversation)
         except Exception as e:
-             print(f"[ERROR] LLM call failed at step {step_num}: {e}", file=sys.stderr)
+             last_error = f"LLM error: {str(e)}"
+             log_step(step_num, {"error": "LLM failed"}, 0.0, True, error=last_error)
             break
 
         # Execute the action in the environment
         try:
             result = env_step(action)
+             obs = result.get("observation", {})
+             reward = result.get("reward", 0.0)
+             done = result.get("done", False)
+             last_error = obs.get("last_action_error")
         except Exception as e:
-             print(f"[ERROR] Environment step failed at step {step_num}: {e}", file=sys.stderr)
+             last_error = f"Env error: {str(e)}"
+             log_step(step_num, action, 0.0, True, error=last_error)
             break
 
-         obs = result.get("observation", {})
-         reward = result.get("reward", 0.0)
-         done = result.get("done", False)
-         cumulative_reward += reward
-
-         # Log the step
-         log_step(task_id, step_num, action, obs, reward, done)
+         rewards.append(reward)
+         log_step(step_num, action, reward, done, error=last_error)
 
         if done:
+             success = (reward >= 1.0)  # Assume 1.0 is full success
             break
 
         # Build observation summary for the LLM
@@ -231,44 +249,21 @@ def run_task(task: dict) -> float:
      conversation.append({"role": "assistant", "content": json.dumps(action)})
      conversation.append({"role": "user", "content": f"Observation from environment:\n{obs_text}\n\nDecide your next action."})
 
-     log_end(task_id, task_name, cumulative_reward, step_num)
-     return cumulative_reward
+     # Calculate final score (normalized to [0, 1])
+     final_score = max(0.0, min(1.0, sum(rewards)))
+
+     log_end(success=success, total_steps=step_num, score=final_score, rewards=rewards)
 
 
  # ──────────────────────────────────────────────
  # Entry point
  # ──────────────────────────────────────────────
  def main():
-     print("=" * 60)
-     print("CloudSecurityAuditor — OpenEnv Inference")
-     print(f"Model: {MODEL_NAME}")
-     print(f"API: {API_BASE_URL}")
-     print(f"Env: {ENV_URL}")
-     print("=" * 60)
-     sys.stdout.flush()
-
-     total_score = 0.0
-     results = []
-
      for task in TASKS:
         try:
-             score = run_task(task)
-             results.append({"task_id": task["id"], "task_name": task["name"], "score": score})
-             total_score += score
-         except Exception as e:
-             print(f"[ERROR] Task {task['id']} failed: {e}", file=sys.stderr)
-             results.append({"task_id": task["id"], "task_name": task["name"], "score": 0.0})
-
-     # Final summary
-     print("\n" + "=" * 60)
-     print("FINAL RESULTS")
-     print("=" * 60)
-     for r in results:
-         status = "✅ PASS" if r["score"] >= 1.0 else "❌ FAIL"
-         print(f" {r['task_name']:25s} → score={r['score']:.2f} {status}")
-     print(f"\n Total Score: {total_score:.2f} / {len(TASKS)}.00")
-     print("=" * 60)
-     sys.stdout.flush()
+             run_task(task)
+         except Exception:
+             pass
 
 
  if __name__ == "__main__":
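The new fence-stripping branch in `ask_llm` can be exercised standalone. A minimal sketch of the same logic (the `extract_json` wrapper name is hypothetical; the branch body mirrors the committed code):

```python
import json

def extract_json(raw: str) -> dict:
    """Mirror the commit's fence handling: prefer slicing from the
    first '{' to the last '}', falling back to dropping fence lines."""
    raw = raw.strip()
    if raw.startswith("```"):
        lines = raw.split("\n")
        if "{" in raw:
            start = raw.find("{")
            end = raw.rfind("}") + 1
            raw = raw[start:end]
        else:
            raw = "\n".join(lines[1:-1]) if len(lines) > 2 else raw
    return json.loads(raw)

print(extract_json('```json\n{"action": "list", "resource": "s3"}\n```'))
# {'action': 'list', 'resource': 's3'}
```

Slicing from the first `{` to the last `}` is more forgiving than dropping fence lines, since models often emit prose before or after the code block.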
openenv.yaml CHANGED
@@ -5,8 +5,8 @@ hardware:
  tier: "cpu-small"
  vCPU: 2
  RAM: 4Gi
- port: 8000
- entrypoint: "uvicorn server.app:app --host 0.0.0.0 --port 8000"
+ port: 7860
+ entrypoint: "uvicorn server.app:app --host 0.0.0.0 --port 7860"
  tags:
  - security
  - cloud
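After this commit the port must agree across `Dockerfile` (`EXPOSE`/`CMD`), `openenv.yaml` (`port`/`entrypoint`), and `server/app.py`. A stdlib-only consistency check could catch a missed spot; a sketch (the `extract_ports` helper and its regex are illustrative, not part of the repo):

```python
import re

def extract_ports(text: str) -> set:
    """Collect every port number mentioned in the common forms used here:
    EXPOSE 7860, port: 7860, --port 7860 / "--port", "7860", port=7860."""
    pattern = r"(?:EXPOSE\s+|port:\s*|--port[\s\"',]+|port=)(\d{2,5})"
    return {int(m) for m in re.findall(pattern, text)}

# Fragments of the three files touched by this commit
dockerfile = 'EXPOSE 7860\nCMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]'
openenv = 'port: 7860\nentrypoint: "uvicorn server.app:app --host 0.0.0.0 --port 7860"'
app_py = 'uvicorn.run(app, host="0.0.0.0", port=7860)'

ports = extract_ports(dockerfile) | extract_ports(openenv) | extract_ports(app_py)
assert ports == {7860}, f"port mismatch: {ports}"
```

Run against the real files, a mismatch (say, a leftover 8000) would surface as a two-element set.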
server/app.py CHANGED
@@ -39,4 +39,4 @@ async def get_state():
 
  if __name__ == "__main__":
      import uvicorn
-     uvicorn.run(app, host="0.0.0.0", port=8000)
+     uvicorn.run(app, host="0.0.0.0", port=7860)
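The `[START]`/`[STEP]`/`[END]` records introduced in `inference.py` are single-line `key=value` strings that downstream leaderboard tooling has to parse back. A sketch for the `[END]` line (the parser is hypothetical and assumes exactly the format emitted by `log_end`):

```python
import re

def parse_end_line(line: str) -> dict:
    """Parse an [END] record like:
    [END] success=true steps=4 score=1.00 rewards=0.10,0.90
    Returns typed fields; raises ValueError on non-END lines."""
    m = re.match(
        r"\[END\] success=(true|false) steps=(\d+) score=([\d.]+) rewards=([\d.,]*)",
        line,
    )
    if not m:
        raise ValueError(f"not an [END] line: {line!r}")
    success, steps, score, rewards = m.groups()
    return {
        "success": success == "true",
        "steps": int(steps),
        "score": float(score),
        "rewards": [float(r) for r in rewards.split(",")] if rewards else [],
    }

print(parse_end_line("[END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00"))
# {'success': True, 'steps': 3, 'score': 1.0, 'rewards': [0.0, 0.0, 1.0]}
```

Keeping each record on one line (as `log_step` does by stripping newlines from the action JSON) is what makes this kind of line-oriented parsing reliable.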