# PR Review Negotiation Environment — V2 Learnings This document summarizes the critical challenges identified during V1 development and the specific architectural improvements implemented in V2 to satisfy the Hackathon's OpenEnv Benchmark requirements. ## 🏁 Critical Learnings & Problems Identified ### 1. The "Surface vs. Root" Problem * **The Problem**: Many AI agents provide generic feedback like "This code has a bug" or "Fix the syntax." In V1, these agents would pass simply by identifying the location of the error. * **V2 Fix**: We implemented a **Root-Cause Grader**. Agents now get penalized (-0.05) if they only describe the symptom and are only rewarded (+0.25) if they identify the underlying logic flaw or security vector. ### 2. The "Fake Fix" Deception * **The Problem**: In iterative negotiation, a developer (author) might push a cosmetic "fix" (e.g., adding `.strip()` to a SQL injection) that looks correct but leaves the vulnerability active. * **V2 Fix**: Added specific `false_fix_keywords` detection. V2 agents are now tested on their ability to resist "social engineering" from the author and persist with the review until a genuine parameterized fix is provided. ### 3. Escalate vs. Request Changes * **The Problem**: Critical security failures (like hardcoded JWT secrets) are often treated as normal "Request Changes" items, wasting precious time. * **V2 Fix**: Implemented **Escalation Logic**. V2 requires the agent to correctly identify High-Severity breaches and transition the decision to `escalate`, rewarding immediate reporting over standard iterative feedback. ### 4. State Persistence & UI Synchronization * **The Problem**: Frontend state inconsistently mapped to the backend's "done" flag, leading to "nothing visible" errors when reviews concluded. * **V2 Fix**: Moved to a **Stateless Component Architecture** driven entirely by the `PRObservation` returned from the API, ensuring the UI 1:1 reflects the environment's internal truth. ## 🏆 Hackathon Compliance Improvements - **OpenEnv 0.2.1 Schema**: V2 uses strictly validated Pydantic models for every endpoint. - **Root-Level `inference.py`**: Automated benchmarks can now locate and execute the agent's reasoning loop directly. - **Docker SDK SDK for Hugging Face**: Containerized deployment on port 7860 with an Nginx reverse proxy for high availability.