Spaces: h1manshu/code_review

h1manshu committed 18 days ago

Commit 09ec238 (verified)

1 Parent(s): 0f13ee5

Upload folder using huggingface_hub


Files changed (14)

Dockerfile +81 -0
README.md +250 -5
__init__.py +17 -0
client.py +142 -0
dataset/dataset.json +129 -0
inference.py +271 -0
models.py +56 -0
openenv.yaml +7 -0
pyproject.toml +45 -0
server/__init__.py +11 -0
server/app.py +84 -0
server/code_review_environment.py +338 -0
server/requirements.txt +6 -0
uv.lock +0 -0

Dockerfile ADDED

	@@ -0,0 +1,81 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=code_review
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

README.md CHANGED

@@ -1,10 +1,255 @@
 ---
-title: Code Review
-emoji: 🌖
-colorFrom: gray
-colorTo: indigo
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Code Review Environment Server
+emoji: 🎳
+colorFrom: green
+colorTo: gray
 sdk: docker
 pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
 ---
+# Code Review Environment
+An OpenEnv environment in which an agent reviews pull requests: it reads the PR diffs, comments on the issues it finds, suggests a fix, and makes a final approve/reject decision.
+## Quick Start
+The simplest way to use the Code Review environment is through the `CodeReviewEnv` class:
+```python
+from code_review import CodeReviewAction, CodeReviewEnv
+# Create environment from Docker image
+code_reviewenv = CodeReviewEnv.from_docker_image("code_review-env:latest")
+try:
+    # Reset and inspect the pull request under review
+    result = code_reviewenv.reset()
+    print(f"Reviewing: {result.observation.pr.title}")
+    # Comment, suggest a fix, then decide
+    actions = [
+        CodeReviewAction(action_type="comment", comment="missing import datetime"),
+        CodeReviewAction(action_type="suggest_fix", suggested_code="from datetime import datetime"),
+        CodeReviewAction(action_type="final_decision", decision="reject"),
+    ]
+    for action in actions:
+        result = code_reviewenv.step(action)
+        print(f"Sent: {action.action_type}")
+        print(f"  → Step: {result.observation.step_count}/{result.observation.max_steps}")
+        print(f"  → Reward: {result.reward}")
+finally:
+    # Always clean up
+    code_reviewenv.close()
+```
+That's it! The `CodeReviewEnv.from_docker_image()` method handles:
+- Starting the Docker container
+- Waiting for the server to be ready
+- Connecting to the environment
+- Container cleanup when you call `close()`
+## Building the Docker Image
+Before using the environment, you need to build the Docker image:
+```bash
+# From project root
+docker build -t code_review-env:latest -f Dockerfile .
+```
+## Deploying to Hugging Face Spaces
+You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
+```bash
+# From the environment directory (where openenv.yaml is located)
+openenv push
+# Or specify options
+openenv push --namespace my-org --private
+```
+The `openenv push` command will:
+1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
+2. Prepare a custom build for Hugging Face Docker space (enables web interface)
+3. Upload to Hugging Face (ensuring you're logged in)
+### Prerequisites
+- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
+### Options
+- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
+- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
+- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
+- `--private`: Deploy the space as private (default: public)
+### Examples
+```bash
+# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
+openenv push
+# Push to a specific repository
+openenv push --repo-id my-org/my-env
+# Push with a custom base image
+openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
+# Push as a private space
+openenv push --private
+# Combine options
+openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
+```
+After deployment, your space will be available at:
+`https://huggingface.co/spaces/<repo-id>`
+The deployed space includes:
+- **Web Interface** at `/web` - Interactive UI for exploring the environment
+- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
+- **Health Check** at `/health` - Container health monitoring
+- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
+## Environment Details
+### Action
+**CodeReviewAction**: Describes one review step
+- `action_type` (str) - One of `comment`, `suggest_fix`, or `final_decision`
+- `comment` (str, optional) - Review comment listing the issues found
+- `suggested_code` (str, optional) - Corrected code for a suggested fix
+- `decision` (str, optional) - `approve` or `reject` for a final decision
+### Observation
+**CodeReviewObservation**: Contains the pull request under review and episode progress
+- `pr` (CodeReviewPullRequest) - The pull request: `id`, `title`, `description`, `language`, and `diffs`
+- `previous_comments` (list of str) - Comments made earlier in the episode
+- `step_count` (int) - Number of steps taken so far
+- `max_steps` (int) - Maximum number of steps per episode
+### Reward
+**CodeReviewReward**: Returned with each step
+- `score` (float) - Score for the action
+- `feedback` (str) - Textual feedback on the action
+## Advanced Usage
+### Connecting to an Existing Server
+If you already have a Code Review environment server running, you can connect directly:
+```python
+from code_review import CodeReviewAction, CodeReviewEnv
+# Connect to existing server
+code_reviewenv = CodeReviewEnv(base_url="<ENV_HTTP_URL_HERE>")
+# Use as normal
+result = code_reviewenv.reset()
+result = code_reviewenv.step(CodeReviewAction(action_type="comment", comment="No input validation before the lookup."))
+```
+Note: When connecting to an existing server, `code_reviewenv.close()` will NOT stop the server.
+### Using the Context Manager
+The client supports context manager usage for automatic connection management:
+```python
+from code_review import CodeReviewAction, CodeReviewEnv
+# Connect with context manager (auto-connects and closes)
+with CodeReviewEnv(base_url="http://localhost:8000") as env:
+    result = env.reset()
+    print(f"Reviewing: {result.observation.pr.title}")
+    # Multiple steps with low latency
+    for note in ["division by zero", "no input validation"]:
+        result = env.step(CodeReviewAction(action_type="comment", comment=note))
+        print(f"Comments so far: {result.observation.previous_comments}")
+```
+The client uses WebSocket connections for:
+- **Lower latency**: No HTTP connection overhead per request
+- **Persistent session**: Server maintains your environment state
+- **Efficient for episodes**: Better for many sequential steps
+### Concurrent WebSocket Sessions
+The server supports multiple concurrent WebSocket connections. To enable this,
+modify `server/app.py` to use factory mode:
+```python
+# In server/app.py - use factory mode for concurrent sessions
+app = create_app(
+    CodeReviewEnvironment,  # Pass class, not instance
+    CodeReviewAction,
+    CodeReviewObservation,
+    max_concurrent_envs=4,  # Allow 4 concurrent sessions
+)
+```
+Then multiple clients can connect simultaneously:
+```python
+from code_review import CodeReviewAction, CodeReviewEnv
+from concurrent.futures import ThreadPoolExecutor
+def run_episode(client_id: int):
+    with CodeReviewEnv(base_url="http://localhost:8000") as env:
+        result = env.reset()
+        for i in range(3):
+            result = env.step(CodeReviewAction(action_type="comment", comment=f"Client {client_id}, step {i}"))
+        return client_id, result.observation.step_count
+# Run 4 episodes concurrently
+with ThreadPoolExecutor(max_workers=4) as executor:
+    results = list(executor.map(run_episode, range(4)))
+```
+## Development & Testing
+### Direct Environment Testing
+Test the environment logic directly without starting the HTTP server:
+```bash
+# From the project root
+python3 server/code_review_environment.py
+```
+This verifies that:
+- Environment resets correctly
+- Step executes actions properly
+- State tracking works
+- Rewards are calculated correctly
+### Running Locally
+Run the server locally for development:
+```bash
+uvicorn server.app:app --reload
+```
+## Project Structure
+```
+code_review/
+├── .dockerignore         # Docker build exclusions
+├── Dockerfile             # Container image definition
+├── __init__.py            # Module exports
+├── README.md              # This file
+├── openenv.yaml           # OpenEnv manifest
+├── pyproject.toml         # Project metadata and dependencies
+├── uv.lock                # Locked dependencies (generated)
+├── client.py              # CodeReviewEnv client
+├── inference.py           # Example LLM inference script
+├── models.py              # Action and Observation models
+├── dataset/
+│   └── dataset.json       # Pull request tasks with ground truth
+└── server/
+    ├── __init__.py        # Server module exports
+    ├── app.py             # FastAPI application (HTTP + WebSocket endpoints)
+    ├── code_review_environment.py  # Core environment logic
+    └── requirements.txt   # Server dependencies
+```

__init__.py ADDED

	@@ -0,0 +1,17 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Code Review Environment."""
+from .client import CodeReviewEnv
+from .models import CodeReviewAction, CodeReviewObservation
+from .server.code_review_environment import CodeReviewEnvironment
+__all__ = [
+    "CodeReviewAction",
+    "CodeReviewEnv",
+    "CodeReviewObservation",
+    "CodeReviewEnvironment",
+]

client.py ADDED

	@@ -0,0 +1,142 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Code Review Environment Client."""
+from typing import Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+from openenv.core.env_server.types import State
+from .models import CodeReviewAction, CodeReviewObservation, CodeReviewReward, CodeReviewPullRequest
+class CodeReviewEnv(
+    EnvClient[CodeReviewAction, CodeReviewObservation, State]
+):
+    """
+    Client for the Code Review Environment.
+    This client maintains a persistent WebSocket connection to the environment server,
+    enabling efficient multi-step interactions with lower latency.
+    Each client instance has its own dedicated environment session on the server.
+    Example:
+        >>> # Connect to a running server
+        >>> with CodeReviewEnv(base_url="http://localhost:8000") as client:
+        ...     result = client.reset()
+        ...     print(result.observation.pr.title)
+        ...
+        ...     result = client.step(CodeReviewAction(action_type="comment", comment="Hello!"))
+        ...     print(result.observation.previous_comments)
+    Example with Docker:
+        >>> # Automatically start container and connect
+        >>> client = CodeReviewEnv.from_docker_image("code_review-env:latest")
+        >>> try:
+        ...     result = client.reset()
+        ...     result = client.step(CodeReviewAction(action_type="comment", comment="Test"))
+        ... finally:
+        ...     client.close()
+    """
+    def _step_payload(self, action: CodeReviewAction) -> Dict:
+        # print("Action == ", action)
+        # Handle dict input
+        if isinstance(action, dict):
+            act = {
+                "action_type": action.get("action_type"),
+                "comment": action.get("comment"),
+                "suggested_code": action.get("suggested_code"),
+                "decision": action.get("decision"),
+            }
+        else:
+            act = {
+                "action_type": action.action_type,
+                "comment": action.comment,
+                "suggested_code": action.suggested_code,
+                "decision": action.decision,
+            }
+        # print("Act == ", act)
+        return act
+    def _parse_result(self, payload: Dict) -> StepResult[CodeReviewObservation]:
+        """
+        Parse server response into StepResult[CodeReviewObservation].
+        Args:
+            payload: JSON response data from server
+        Returns:
+            StepResult with CodeReviewObservation
+        """
+        # print("Payload ====== ", payload)
+        obs_data = payload.get("observation") or {}
+        if "observation" in obs_data:  # nested case
+            obs_data = obs_data["observation"]
+        if not obs_data or "pr" not in obs_data:
+            raise ValueError(f"Invalid observation payload: {payload}")
+        pr_data = obs_data["pr"]
+        observation = CodeReviewObservation(
+            pr=CodeReviewPullRequest(**pr_data),
+            previous_comments=obs_data.get("previous_comments") or [],
+            step_count=obs_data.get("step_count", 0),
+            max_steps=obs_data.get("max_steps", 3),
+        )
+        # Handle reward (reset vs step)
+        reward_data = payload.get("reward")
+        reward = None
+        if isinstance(reward_data, dict):
+            reward = CodeReviewReward(**reward_data)
+        # else: float/None → ignore (reset case)
+        return StepResult(
+            observation=observation,
+            reward=reward,
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict) -> State:
+        """
+        Parse server response into State object.
+        Args:
+            payload: JSON response from state request
+        Returns:
+            State object with episode_id and step_count
+        """
+        return State(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+        )
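
The unwrapping logic in `_parse_result` can be exercised without a running server. The following is a minimal sketch, assuming the two payload shapes that the client code above handles (flat and one level of nesting); `extract_observation` is a hypothetical helper name, not part of the commit.

```python
from typing import Any, Dict


def extract_observation(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the observation dict, unwrapping one level of nesting if present."""
    obs = payload.get("observation") or {}
    if "observation" in obs:  # nested case: {"observation": {"observation": {...}}}
        obs = obs["observation"]
    if not obs or "pr" not in obs:
        raise ValueError(f"Invalid observation payload: {payload}")
    return obs


# Hypothetical payloads illustrating the flat and nested shapes.
flat = {"observation": {"pr": {"id": "2"}, "step_count": 0}}
nested = {"observation": {"observation": {"pr": {"id": "2"}}, "reward": None}}

print(extract_observation(flat)["pr"]["id"])
print(extract_observation(nested)["pr"]["id"])
```

Keeping this step as a pure function makes it easy to unit-test the error path (a missing `pr` key) separately from the model construction.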

dataset/dataset.json ADDED

	@@ -0,0 +1,129 @@

+[
+  {
+    "task_type": "easy",
+    "pr": {
+      "id": "2",
+      "title": "Missing import",
+      "description": "Forgot to import module",
+      "language": "python",
+      "diffs": [
+        {
+          "file_name": "main.py",
+          "diff": "print(datetime.now())"
+        }
+      ]
+    },
+    "ground_truth": {
+      "issues": ["missing import datetime"],
+      "decision": "reject",
+      "fix": "from datetime import datetime"
+    }
+  },
+  {
+    "task_type": "medium",
+    "pr": {
+      "id": "3",
+      "title": "Division function",
+      "description": "Handles division",
+      "language": "python",
+      "diffs": [
+        {
+          "file_name": "math.py",
+          "diff": "def divide(a,b): return a/b"
+        }
+      ]
+    },
+    "ground_truth": {
+      "issues": ["division by zero"],
+      "decision": "reject",
+      "fix": "if b == 0: return None"
+    }
+  },
+  {
+    "task_type": "medium",
+    "pr": {
+      "id": "4",
+      "title": "Inefficient loop",
+      "description": "Optimizing search",
+      "language": "python",
+      "diffs": [
+        {
+          "file_name": "search.py",
+          "diff": "for i in range(len(arr)):\n    if arr[i] == target:\n        return True"
+        }
+      ]
+    },
+    "ground_truth": {
+      "issues": ["inefficient loop"],
+      "decision": "approve",
+      "fix": "use 'if target in arr'"
+    }
+  },
+  {
+    "task_type": "hard",
+    "pr": {
+      "id": "6",
+      "title": "Authentication logic",
+      "description": "Adds login system",
+      "language": "python",
+      "diffs": [
+        {
+          "file_name": "auth.py",
+          "diff": "def login(password):\n    if password == 'admin123':\n        return True"
+        }
+      ]
+    },
+    "ground_truth": {
+      "issues": ["hardcoded password", "security vulnerability"],
+      "decision": "reject",
+      "fix": "use hashed password comparison"
+    }
+  },
+  {
+    "task_type": "hard",
+    "pr": {
+      "id": "7",
+      "title": "SQL query issue",
+      "description": "Fetch user data",
+      "language": "python",
+      "diffs": [
+        {
+          "file_name": "db.py",
+          "diff": "query = \"SELECT * FROM users WHERE id = \" + user_id"
+        }
+      ]
+    },
+    "ground_truth": {
+      "issues": ["sql injection"],
+      "decision": "reject",
+      "fix": "use parameterized queries"
+    }
+  },
+  {
+    "task_type": "hard",
+    "pr": {
+      "id": "8",
+      "title": "Cross-file null bug",
+      "description": "User fetch logic",
+      "language": "python",
+      "diffs": [
+        {
+          "file_name": "service.py",
+          "diff": "def get_user(id):\n    return db[id]"
+        },
+        {
+          "file_name": "controller.py",
+          "diff": "user = get_user(None)"
+        }
+      ]
+    },
+    "ground_truth": {
+      "issues": ["invalid input", "null handling"],
+      "decision": "reject",
+      "fix": "validate id before calling get_user"
+    }
+  }
+]
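
Every entry in the dataset follows the same shape (`task_type`, `pr`, `ground_truth`), so a small structural check can catch malformed entries before the environment loads them. This is a sketch only: `validate_task` and the allowed-value sets are assumptions inferred from the six entries above, not code from the commit.

```python
import json
from typing import Any, Dict, List

REQUIRED_PR_KEYS = {"id", "title", "description", "language", "diffs"}
REQUIRED_TRUTH_KEYS = {"issues", "decision", "fix"}


def validate_task(task: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one dataset entry (empty if none)."""
    problems: List[str] = []
    if task.get("task_type") not in {"easy", "medium", "hard"}:
        problems.append(f"unexpected task_type: {task.get('task_type')!r}")
    missing_pr = REQUIRED_PR_KEYS - set(task.get("pr", {}))
    if missing_pr:
        problems.append(f"pr is missing keys: {sorted(missing_pr)}")
    truth = task.get("ground_truth", {})
    missing_truth = REQUIRED_TRUTH_KEYS - set(truth)
    if missing_truth:
        problems.append(f"ground_truth is missing keys: {sorted(missing_truth)}")
    if truth.get("decision") not in {"approve", "reject"}:
        problems.append(f"unexpected decision: {truth.get('decision')!r}")
    return problems


# One entry copied from dataset/dataset.json above.
SAMPLE = json.loads("""
{
  "task_type": "medium",
  "pr": {
    "id": "3",
    "title": "Division function",
    "description": "Handles division",
    "language": "python",
    "diffs": [{"file_name": "math.py", "diff": "def divide(a,b): return a/b"}]
  },
  "ground_truth": {
    "issues": ["division by zero"],
    "decision": "reject",
    "fix": "if b == 0: return None"
  }
}
""")

print(validate_task(SAMPLE))
```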

inference.py ADDED

	@@ -0,0 +1,271 @@

+"""
+Inference Script Example
+===================================
+MANDATORY
+- Before submitting, ensure the following variables are defined in your environment configuration:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Your Hugging Face / API key.
+- The inference script must be named `inference.py` and placed in the root directory of the project
+- Participants must use OpenAI Client for all LLM calls using above variables
+"""
+import asyncio
+import json
+import os
+import re
+import textwrap
+from typing import Any, Dict, List, Optional
+from openai import OpenAI
+from code_review import CodeReviewAction, CodeReviewObservation
+from code_review.client import CodeReviewEnv
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+API_KEY = os.getenv("HF_TOKEN")
+MODEL_NAME = os.getenv("MODEL_NAME")
+MAX_STEPS = 3
+TEMPERATURE = 0.2
+MAX_TOKENS = 512
+DEBUG = True
+ACTION_PREFIX_RE = re.compile(
+    r"^(action|next action)\s*[:\-]\s*",
+    re.IGNORECASE,
+)
+ACTION_PATTERN = re.compile(r"[A-Za-z_]+\s*\(.*\)", re.DOTALL)
+SYSTEM_PROMPT = textwrap.dedent(
+    """
+You are a senior software engineer reviewing a pull request.
+You MUST follow this workflow:
+Step 1:
+Identify all issues in the code.
+List them clearly in the comment.
+Step 2:
+Provide a suggested fix with corrected code.
+Step 3:
+Make a final decision:
+- reject if any bug, security risk, or incorrect logic exists
+- approve only if the code is safe and correct
+Rules:
+- Mention every issue explicitly
+- Use precise technical language
+- Write detailed comments (>30 characters)
+Return ONLY JSON:
+{
+  "action_type": "comment | suggest_fix | final_decision",
+  "comment": "...",
+  "suggested_code": "...",
+  "decision": "approve | reject | null"
+}
+    """
+).strip()
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(
+    step: int, action: str, reward: float, done: bool, error: Optional[str]
+) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+def build_history_lines(history: List[str]) -> str:
+    if not history:
+        return "None"
+    return "\n".join(history[-4:])
+def safe_completion(client, messages):
+    for _ in range(3):
+        try:
+            return client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=messages,
+                temperature=TEMPERATURE,
+                max_tokens=MAX_TOKENS,
+            )
+        except Exception as e:
+            print("Error during completion, retrying...")
+            print(e)
+            continue
+    return None
+def build_prompt(step: int, max_steps: int, observation) -> str:
+    if step == 1:
+        instruction = (
+            "Carefully analyze the diff. List EVERY issue you find in the comment field. "
+            "Use exact technical terms (e.g. 'sql injection', 'null handling', 'hardcoded password'). "
+            "Set action_type to 'comment'. "
+            "If the code looks correct with no issues, still output a comment like: 'No issues found. Code is clean.' and prepare to approve."
+        )
+    elif step == 2:
+        instruction = (
+            "Now provide the fix. Set action_type to 'suggest_fix'. "
+            "Write the corrected code in suggested_code. "
+            "Also repeat the issues in the comment field."
+        )
+    else:
+        instruction = (
+            "Make your final decision. Set action_type to 'final_decision'. "
+            "Set decision to 'reject' if any bug, security issue, or bad logic exists. "
+            "Set decision to 'approve' only if the code is clean and correct."
+        )
+    diff_text = "\n\n".join(
+        f"File: {d.file_name}\n{d.diff}" for d in observation.pr.diffs
+    )
+    return textwrap.dedent(
+        f"""
+        Step {step}/{max_steps}
+        Title: {observation.pr.title}
+        Description: {observation.pr.description}
+        Code Diffs:
+        {diff_text}
+        Previous Comments:
+        {build_history_lines(observation.previous_comments)}
+        Your task: {instruction}
+        Return ONLY valid JSON:
+        {{
+          "action_type": "...",
+          "comment": "...",
+          "suggested_code": "...",
+          "decision": "approve | reject | null"
+        }}
+    """
+    ).strip()
+def fallback_action():
+    return {
+        "action_type": "comment",
+        "comment": "fallback: invalid response",
+        "suggested_code": None,
+        "decision": None,
+    }
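+def parse_action(text: str) -> Dict[str, Any]:
+    if not text:
+        return fallback_action()
+    # Strip Markdown code fences the model may wrap around the JSON
+    text = text.strip().replace("```json", "").replace("```", "")
+    try:
+        parsed = json.loads(text)
+    except json.JSONDecodeError as e:
+        print(e)
+        return fallback_action()
+    # The caller uses .get(), so anything other than a JSON object is invalid
+    return parsed if isinstance(parsed, dict) else fallback_action()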
+async def run_episode(client, env):
+    result = await env.reset()
+    obs = result.observation
+    final_score = 0.0
+    for step in range(1, MAX_STEPS + 1):
+        prompt = build_prompt(step, MAX_STEPS, obs)
+        messages = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": prompt},
+        ]
+        completion = safe_completion(client, messages)  # still sync
+        # print(completion)
+        if completion is None:
+            action_dict = fallback_action()
+        else:
+            response_text = completion.choices[0].message.content or ""
+            action_dict = parse_action(response_text)
+            # print(response_text)
+        # Build a typed action in both cases; action_type is a required field
+        action = CodeReviewAction(
+            action_type=action_dict.get("action_type") or "comment",
+            comment=action_dict.get("comment"),
+            suggested_code=action_dict.get("suggested_code"),
+            decision=action_dict.get("decision"),
+        )
+        result = await env.step(action)
+        # print("Result === " , result)
+        obs = result.observation
+        reward = result.reward
+        done = result.done
+        final_score = max(final_score, reward.score if reward else 0.0)
+        print(f"Step {step} | Action: {action} | Reward: {reward}")
+        if done:
+            print(f"Done in {step} steps")
+            break
+    return final_score
+async def main():
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    scores = []
+    # log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+    async with CodeReviewEnv(base_url="http://localhost:8000") as env:
+        NUM_EPISODES = 6
+        for i in range(NUM_EPISODES):
+            print(f"\n=== Episode {i+1} ===")
+            env.task_index = i
+            score = await run_episode(client, env)
+            scores.append(score)
+            print(f"Scores so far: {scores}")
+            # return 0
+    print("\nFinished all episodes")
+    print(f"Final Scores: {scores}")
+if __name__ == "__main__":
+    asyncio.run(main())
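
`parse_action` only succeeds when the model reply is pure JSON once code fences are stripped. If the model adds prose around the object, one possible alternative is to scan for the first decodable JSON object using the standard library's `json.JSONDecoder.raw_decode`. This is a sketch under that assumption; `extract_json_object` is a hypothetical helper, not part of the commit.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


reply = 'Sure, here is the review: {"action_type": "comment", "comment": "sql injection"} Hope this helps.'
print(extract_json_object(reply))
```

A caller could fall back to `fallback_action()` when this returns `None`, which keeps the retry and logging behavior of the script unchanged.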

models.py ADDED

	@@ -0,0 +1,56 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the Code Review Environment.
+The code_review environment presents pull requests for an agent to review.
+"""
+from openenv.core.env_server.types import Action, Observation
+from pydantic import BaseModel
+from typing import Any, Dict, List, Optional
+class CodeReviewAction(Action):
+    """Action for the Code Review environment: a comment, a suggested fix, or a final decision."""
+    action_type: str  # comment / suggest_fix / final_decision
+    comment: Optional[str] = None
+    suggested_code: Optional[str] = None
+    decision: Optional[str] = None
+class CodeDiff(BaseModel):
+    file_name: str
+    diff: str
+class CodeReviewPullRequest(BaseModel):
+    id: str
+    title: str
+    description: str
+    diffs: List[CodeDiff]
+    language: str
+class CodeReviewObservation(Observation):
+    """Observation from the Code Review environment: the pull request and review progress."""
+    pr: CodeReviewPullRequest
+    previous_comments: List[str]
+    step_count: int
+    max_steps: int
+class CodeReviewReward(BaseModel):
+    score: float
+    feedback: str
+class CodeReviewStepResponse(BaseModel):
+    observation: CodeReviewObservation
+    reward: CodeReviewReward
+    done: bool
+    info: Dict[str, Any] = {}
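
`CodeReviewAction` types `action_type` and `decision` as plain strings, with the allowed values listed only in a comment. The sketch below shows one way such values could be checked at construction time. It uses a standard-library dataclass so it runs without dependencies; `ReviewAction` is a hypothetical stand-in, and the rule that a `final_decision` must carry a decision is an assumption, not something the commit states.

```python
from dataclasses import dataclass
from typing import Optional

ACTION_TYPES = ("comment", "suggest_fix", "final_decision")
DECISIONS = ("approve", "reject")


@dataclass(frozen=True)
class ReviewAction:
    """Hypothetical stand-in for CodeReviewAction with value checks."""

    action_type: str
    comment: Optional[str] = None
    suggested_code: Optional[str] = None
    decision: Optional[str] = None

    def __post_init__(self) -> None:
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"action_type must be one of {ACTION_TYPES}")
        if self.decision is not None and self.decision not in DECISIONS:
            raise ValueError(f"decision must be one of {DECISIONS} or None")
        # Assumption: a final decision without a verdict is not meaningful.
        if self.action_type == "final_decision" and self.decision is None:
            raise ValueError("final_decision requires a decision")


print(ReviewAction(action_type="final_decision", decision="reject"))
```

In the pydantic models themselves, annotating the fields with `typing.Literal` would likely achieve a similar effect with less code.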

openenv.yaml ADDED

	@@ -0,0 +1,7 @@

+spec_version: 1
+name: code_review
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000

pyproject.toml ADDED

	@@ -0,0 +1,45 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-code_review"
+version = "0.1.0"
+description = "Code Review environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+    # install from github
+    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+    "openenv-core[core]>=0.2.1",
+    # Environment-specific dependencies
+    # Add all dependencies needed for your environment here
+    # Examples:
+    # "numpy>=1.19.0",
+    # "torch>=2.0.0",
+    # "gymnasium>=0.29.0",
+    # "openspiel>=1.0.0",
+    # "smolagents>=1.22.0,<2",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m code_review.server.app
+server = "code_review.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["code_review", "code_review.server"]
+package-dir = { "code_review" = ".", "code_review.server" = "server" }

server/__init__.py ADDED

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Code Review environment server components."""
+from .code_review_environment import CodeReviewEnvironment
+__all__ = ["CodeReviewEnvironment"]

server/app.py ADDED

	@@ -0,0 +1,84 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+FastAPI application for the Code Review Environment.
+This module creates an HTTP server that exposes the CodeReviewEnvironment
+over HTTP and WebSocket endpoints, compatible with EnvClient.
+Endpoints:
+    - POST /reset: Reset the environment
+    - POST /step: Execute an action
+    - GET /state: Get current environment state
+    - GET /schema: Get action/observation schemas
+    - WS /ws: WebSocket endpoint for persistent sessions
+Usage:
+    # Development (with auto-reload):
+    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+    # Production:
+    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+    # Or run directly:
+    python -m server.app
+"""
+try:
+    from openenv.core.env_server.http_server import create_app
+except Exception as e:  # pragma: no cover
+    raise ImportError(
+        "openenv is required for the web interface. Install dependencies with '\n    uv sync\n'"
+    ) from e
+try:
+    from ..models import CodeReviewAction, CodeReviewObservation
+    from .code_review_environment import CodeReviewEnvironment
+except ModuleNotFoundError:
+    from models import CodeReviewAction, CodeReviewObservation
+    from server.code_review_environment import CodeReviewEnvironment
+# Create the app with web interface and README integration
+app = create_app(
+    CodeReviewEnvironment,
+    CodeReviewAction,
+    CodeReviewObservation,
+    env_name="code_review",
+    max_concurrent_envs=1,  # increase this number to allow more concurrent WebSocket sessions
+)
+def main(host: str = "0.0.0.0", port: int = 8000):
+    """
+    Entry point for direct execution via uv run or python -m.
+    This function enables running the server without Docker:
+        uv run --project . server
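+        python -m code_review.server.app
+        python -m code_review.server.app --port 8001
+    Note: --port is parsed only in the __main__ block below, so it takes
+    effect when the module is run with python -m.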
+    Args:
+        host: Host address to bind to (default: "0.0.0.0")
+        port: Port number to listen on (default: 8000)
+    For production deployments, consider using uvicorn directly with
+    multiple workers:
+        uvicorn code_review.server.app:app --workers 4
+    """
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+    main(port=args.port)
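
The `server` console script declared in pyproject.toml calls `main()` with no arguments, while the argparse block above runs only under `__main__`. A minimal sketch of one way to make both paths honor CLI flags, assuming a hypothetical `parse_cli` helper and a `--host` flag (neither is in the original file); `"server.app:app"` is the import string from the usage notes and may need to be `"code_review.server.app:app"` depending on how the package is installed:

```python
import argparse
from typing import Optional, Sequence


def parse_cli(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse --host/--port. With argv=None, argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Run the Code Review server")
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    """Entry point usable both as a console script and via python -m."""
    args = parse_cli(argv)
    import uvicorn  # imported lazily, as in the original main()

    # Passing an import string lets uvicorn locate the app itself.
    uvicorn.run("server.app:app", host=args.host, port=args.port)


if __name__ == "__main__":
    main()
```

Because parsing lives inside `main()`, a console-script invocation such as `server --port 8001` and a direct `python -m` run would go through the same code path.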

server/code_review_environment.py ADDED Viewed

	@@ -0,0 +1,338 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Code Review Environment Implementation.
+An environment that serves pull requests from dataset/dataset.json and grades
+the agent's review comments, suggested fixes, and final decision against each
+sample's ground truth.
+"""
+from uuid import uuid4
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import State
+try:
+    from ..models import (
+        CodeReviewAction,
+        CodeReviewObservation,
+        CodeReviewReward,
+        CodeReviewPullRequest,
+        CodeReviewStepResponse
+    )
+except ImportError:
+    from models import (
+        CodeReviewAction,
+        CodeReviewObservation,
+        CodeReviewReward,
+        CodeReviewPullRequest,
+        CodeReviewStepResponse,
+    )
+import json
+from pathlib import Path
+dataset_path = Path(__file__).parent.parent / "dataset" / "dataset.json"
+class CodeReviewEnvironment(Environment):
+    """
+    A multi-step code review environment.
+    Each episode presents one pull request from the dataset. The agent can
+    comment, suggest a fix, or give a final decision, and every action is
+    graded against the sample's ground truth. An episode ends on a
+    "final_decision" action or after max_steps steps.
+    Example:
+        >>> env = CodeReviewEnvironment()
+        >>> obs = env.reset()
+        >>> print(obs.step_count)  # 0
+        >>>
+        >>> action = CodeReviewAction(action_type="comment", comment="Possible SQL injection")
+        >>> result = env.step(action)
+        >>> print(result.reward.score)
+        >>> print(result.done)
+    """
+    # Enable concurrent WebSocket sessions.
+    # Set to True if your environment isolates state between instances.
+    # When True, multiple WebSocket clients can connect simultaneously, each
+    # getting their own environment instance (when using factory mode in app.py).
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+    def __init__(self):
+        """Initialize the code_review environment."""
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._reset_count = 0
+        self.max_steps = 5
+        self.task_index = 0
+        with open(dataset_path) as f:
+            self.dataset = json.load(f)
+        self.reset()
+    def reset(self) -> CodeReviewObservation:
+        """
+        Reset the environment and advance to the next dataset sample.
+        Returns:
+            CodeReviewObservation for the new pull request, with an empty
+            comment history and step_count 0
+        """
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._reset_count += 1
+        self.task_index += 1
+        self.sample = self.dataset[self.task_index % len(self.dataset)]
+        self.pr = CodeReviewPullRequest(**self.sample["pr"])
+        self.gt = self.sample["ground_truth"]
+        self.task_type = self.sample.get("task_type", "unknown")
+        self.history = []
+        self.step_count = 0
+        self.done = False
+        # State evolution variables
+        self.issues_identified = []
+        self.fix_attempted = False
+        return CodeReviewObservation(
+            #echoed_message="Code Review environment ready!",
+            pr=self.pr,
+            previous_comments=self.history,
+            step_count=self.step_count,
+            max_steps=self.max_steps,
+            reward=0.0,
+            done=False,
+        )
+    def step(self, action: CodeReviewAction) -> CodeReviewObservation:  # type: ignore[override]
+        """
+        Apply one review action and grade it against the ground truth.
+        Args:
+            action: CodeReviewAction, or a dict / list / tuple that can be
+                coerced into one
+        Returns:
+            CodeReviewStepResponse with the observation, reward, done flag,
+            and info dict
+        """
+        self._state.step_count += 1
+        # print("RAW ACTION TYPE:", type(action))
+        # print("RAW ACTION:", action)
+        try:
+            if isinstance(action, dict):
+                action = CodeReviewAction(**action)
+            elif isinstance(action, (list, tuple)):
+                action = CodeReviewAction(
+                    action_type=action[0],
+                    comment=action[1] if len(action) > 1 else None,
+                    suggested_code=action[2] if len(action) > 2 else None,
+                    decision=action[3] if len(action) > 3 else None,
+                )
+            elif isinstance(action, CodeReviewAction):
+                pass
+            else:
+                raise ValueError(f"Unsupported action type: {type(action)}")
+        except Exception as e:
+            print(f"Error occurred while processing action: {e}")
+            return self._invalid_step()
+        self.step_count += 1
+        self.history.append(action)
+        if action.action_type == "comment" and action.comment:
+            self.issues_identified.append(action.comment)
+        if action.action_type == "suggest_fix":
+            self.fix_attempted = True
+        score = self.grade_action(action, self.gt)
+        print(f"Step {self.step_count} - Score: {score:.4f}")
+        bonus = 0.0
+        # Encourage meaningful comments
+        if action.comment and len(action.comment) > 30:
+            bonus += 0.1
+        # Encourage early decisions (correctness is scored in grade_action, not here)
+        if action.action_type == "final_decision" and self.step_count <= 2:
+            bonus += 0.1
+        # Penalize useless steps
+        if not action.comment and action.action_type != "final_decision":
+            bonus -= 0.1
+        # Penalize long trajectories
+        if self.step_count > 3:
+            bonus -= 0.05
+        score += bonus
+        score = max(0.0, min(score, 1.0))
+        # print("Final Score == " , score)
+        done = (
+            action.action_type == "final_decision" or self.step_count >= self.max_steps
+        )
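+        if done:
+            # Final reward is the best raw grade over the episode; it replaces
+            # the shaped per-step score computed above.
+            score = max([self.grade_action(a, self.gt) for a in self.history] or [0.0])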
+        obs = CodeReviewObservation(
+            pr=self.pr,
+            previous_comments=[a.comment for a in self.history if a.comment],
+            step_count=self.step_count,
+            max_steps=self.max_steps,
+        )
+        rew = CodeReviewReward(score=score, feedback="graded")
+        return CodeReviewStepResponse(
+            observation=obs,
+            reward=rew,
+            done=done,
+            info={
+                "task_type": self.task_type,
+                "issues_identified": len(self.issues_identified),
+                "fix_attempted": self.fix_attempted,
+            },
+        )
+    @property
+    def state(self) -> State:
+        """
+        Get the current environment state.
+        Returns:
+            Current State with episode_id and step_count
+        """
+        return self._state
+    def _invalid_step(self):
+        rew = CodeReviewReward(score=0.0, feedback="invalid action")
+        obs = CodeReviewObservation(
+            pr=self.pr,
+            previous_comments=[a.comment for a in self.history if a.comment],
+            step_count=self.step_count,
+            max_steps=self.max_steps,
+        )
+        return CodeReviewStepResponse(
+            observation=obs,
+            reward=rew,
+            done=True,
+            info={"error": "invalid_action"},
+        )
+    def grade_action(self, action, ground_truth):
+        score = 0.0
+        print("Action === ", action)
+        print("Ground truth === ", ground_truth)
+        # ------------------------------
+        # ISSUE DETECTION (40%)
+        # ------------------------------
+        issue_score = self.score_issues(action.comment, ground_truth)
+        score += 0.4 * issue_score
+        print("After Issue Score == ", issue_score)
+        # ------------------------------
+        # FIX QUALITY (30%)
+        # ------------------------------
+        fix_score = self.score_fix(action.suggested_code, ground_truth)
+        score += 0.3 * fix_score
+        print("After Fix Score == ", fix_score)
+        # ------------------------------
+        # DECISION (30%)
+        # ------------------------------
+        decision_score = self.score_decision(action, ground_truth)
+        score += 0.3 * decision_score
+        print("After Decision Score == ", decision_score)
+        # ------------------------------
+        # CLAMP SCORE
+        # ------------------------------
+        score = max(0.0, min(score, 1.0))
+        return score
+    def normalize(self, text):
+        return (text or "").lower().strip()
+    # ==============================
+    # ISSUE MATCH (PARTIAL CREDIT)
+    # ==============================
+    def score_issues(self, comment, ground_truth):
+        issues = ground_truth.get("issues", [])
+        if not comment or not issues:
+            return 0.0
+        comment = self.normalize(comment)
+        matches = sum(1 for issue in issues if self.normalize(issue) in comment)
+        return matches / len(issues)
+    # ==============================
+    # FIX MATCH (FUZZY)
+    # ==============================
+    def score_fix(self, suggested_code, ground_truth):
+        if not suggested_code:
+            return 0.0
+        expected_fix = self.normalize(ground_truth.get("fix", ""))
+        # No reference fix: "" is a substring of every string, so return
+        # early instead of awarding full credit in the check below.
+        if not expected_fix:
+            return 0.0
+        suggested_code = self.normalize(suggested_code)
+        # direct match
+        if expected_fix in suggested_code:
+            return 1.0
+        # partial keyword match (whitespace-separated tokens of the expected fix)
+        keywords = expected_fix.split()
+        matches = sum(1 for word in keywords if word in suggested_code)
+        return matches / len(keywords)
+    # ==============================
+    # DECISION MATCH
+    # ==============================
+    def score_decision(self, action, ground_truth):
+        expected = ground_truth.get("decision")
+        # Not a decision step → no contribution
+        if action.action_type != "final_decision":
+            return 0.0
+        # Missing decision → no credit
+        if not action.decision:
+            return 0.0
+        # Correct decision
+        if action.decision == expected:
+            return 1.0
+        # Wrong decision → small partial credit (never negative)
+        return 0.2
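
The grading rubric above weights issue detection at 40%, fix quality at 30%, and the decision at 30%. A self-contained sketch of that arithmetic follows, written as free functions rather than methods. The ground-truth entry and the `grade` signature are hypothetical (the real dataset.json contents are not shown in this diff), and returning 0.0 when no reference fix exists is a choice made for this sketch:

```python
from typing import Optional

# Hypothetical ground-truth entry, for illustration only.
GROUND_TRUTH = {
    "issues": ["sql injection", "missing input validation"],
    "fix": "cursor.execute(query, params)",
    "decision": "request_changes",
}


def normalize(text: Optional[str]) -> str:
    return (text or "").lower().strip()


def score_issues(comment: Optional[str], gt: dict) -> float:
    """Fraction of ground-truth issues that appear verbatim in the comment."""
    issues = gt.get("issues", [])
    if not comment or not issues:
        return 0.0
    comment = normalize(comment)
    return sum(1 for issue in issues if normalize(issue) in comment) / len(issues)


def score_fix(suggested_code: Optional[str], gt: dict) -> float:
    """1.0 on substring match, otherwise the fraction of expected tokens found."""
    expected = normalize(gt.get("fix", ""))
    if not suggested_code or not expected:
        return 0.0
    code = normalize(suggested_code)
    if expected in code:
        return 1.0
    keywords = expected.split()
    return sum(1 for word in keywords if word in code) / len(keywords)


def score_decision(action_type: str, decision: Optional[str], gt: dict) -> float:
    """Only final_decision actions earn decision credit."""
    if action_type != "final_decision" or not decision:
        return 0.0
    return 1.0 if decision == gt.get("decision") else 0.2


def grade(action_type, comment=None, suggested_code=None, decision=None,
          gt=GROUND_TRUTH) -> float:
    score = (
        0.4 * score_issues(comment, gt)
        + 0.3 * score_fix(suggested_code, gt)
        + 0.3 * score_decision(action_type, decision, gt)
    )
    return max(0.0, min(score, 1.0))


# One of two issues named (0.4 * 0.5), exact fix (0.3), correct decision (0.3).
total = grade(
    "final_decision",
    comment="This query is open to SQL injection.",
    suggested_code="cursor.execute(query, params)",
    decision="request_changes",
)
print(round(total, 2))  # 0.8
```

Issue matching is a plain substring test, so a comment has to contain the ground-truth phrase verbatim (after lowercasing) to earn credit; paraphrases score zero under this scheme.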

server/requirements.txt ADDED Viewed

	@@ -0,0 +1,6 @@

+openenv[core]>=0.2.0
+fastapi>=0.115.0
+uvicorn>=0.24.0

uv.lock ADDED Viewed

The diff for this file is too large to render. See raw diff