PR Review Negotiation Environment — V2 Learnings

This document summarizes the critical challenges identified during V1 development and the specific architectural improvements implemented in V2 to satisfy the Hackathon's OpenEnv Benchmark requirements.

🏁 Critical Learnings & Problems Identified

1. The "Surface vs. Root" Problem

The Problem: Many AI agents provide generic feedback like "This code has a bug" or "Fix the syntax." In V1, these agents would pass simply by identifying the location of the error.
V2 Fix: We implemented a Root-Cause Grader. An agent is now penalized (-0.05) when it describes only the symptom, and it earns the reward (+0.25) only when it identifies the underlying logic flaw or security vector.
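A minimal sketch of how such a grader could score a review comment. Only the two reward values come from the text above; the keyword list, the function name grade_comment, and the substring-matching approach are assumptions for illustration, not the environment's actual grading logic.

```python
# Reward values taken from the text above.
SYMPTOM_ONLY_PENALTY = -0.05
ROOT_CAUSE_REWARD = 0.25

# Hypothetical phrases that indicate the reviewer named the underlying flaw.
# The real grader's criteria are not described in this document.
ROOT_CAUSE_KEYWORDS = (
    "sql injection",
    "string concatenation",
    "unsanitized input",
)


def grade_comment(comment, root_cause_keywords=ROOT_CAUSE_KEYWORDS):
    """Return the reward for one review comment.

    A comment earns the root-cause reward only if it mentions at least one
    root-cause phrase; anything else is treated as symptom-only feedback.
    """
    text = comment.lower()
    if any(keyword in text for keyword in root_cause_keywords):
        return ROOT_CAUSE_REWARD
    return SYMPTOM_ONLY_PENALTY
```

Under these assumptions, "This code has a bug" scores -0.05, while a comment that names SQL injection through string concatenation scores +0.25.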

2. The "Fake Fix" Deception

The Problem: In iterative negotiation, a developer (author) might push a cosmetic "fix" (e.g., calling .strip() on the input of a query that is vulnerable to SQL injection) that looks correct but leaves the vulnerability active.
V2 Fix: We added false_fix_keywords detection. V2 agents are now tested on their ability to resist "social engineering" from the author and to persist with the review until a genuine parameterized fix is provided.
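The check described above might look like the following sketch. The name false_fix_keywords comes from the text; the specific keywords, the placeholder markers used to recognize a parameterized query, and the function is_false_fix are hypothetical.

```python
# Hypothetical cosmetic changes that do not remove an injection risk.
FALSE_FIX_KEYWORDS = (".strip()", ".replace(", ".lower()")

# Hypothetical markers of a parameterized query (placeholder syntax).
GENUINE_FIX_MARKERS = ("= %s", "= ?")


def is_false_fix(patch,
                 false_fix_keywords=FALSE_FIX_KEYWORDS,
                 genuine_fix_markers=GENUINE_FIX_MARKERS):
    """Return True if the patch looks cosmetic and lacks a real fix.

    A patch counts as a false fix when it contains a cosmetic keyword but
    no marker of a parameterized query.
    """
    has_cosmetic_change = any(k in patch for k in false_fix_keywords)
    has_genuine_fix = any(m in patch for m in genuine_fix_markers)
    return has_cosmetic_change and not has_genuine_fix
```

In this sketch, a patch that only wraps the input in .strip() is flagged, while a patch that switches to a placeholder such as `name = %s` is not, even if it also strips the input.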

3. Escalate vs. Request Changes

The Problem: Critical security failures (like hardcoded JWT secrets) are often treated as normal "Request Changes" items, which delays the response to them.
V2 Fix: We implemented Escalation Logic. V2 requires the agent to recognize high-severity breaches and set its decision to escalate, rewarding immediate reporting over standard iterative feedback.
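As a rough illustration of the escalation rule, the sketch below maps detected issues to a decision. The decision names mirror the ones mentioned above; the issue labels, the HIGH_SEVERITY_ISSUES set, and the choose_decision function are assumptions and may not match the environment's real action schema.

```python
from enum import Enum


class Decision(str, Enum):
    # Decision values are assumed; only "escalate" and "Request Changes"
    # are named in the text.
    APPROVE = "approve"
    REQUEST_CHANGES = "request_changes"
    ESCALATE = "escalate"


# Hypothetical labels for issues that must be escalated immediately.
HIGH_SEVERITY_ISSUES = frozenset({"hardcoded_jwt_secret", "leaked_credentials"})


def choose_decision(issue_types):
    """Pick a review decision from the issues found in a pull request."""
    issues = list(issue_types)
    # Any high-severity issue short-circuits the normal review loop.
    if any(issue in HIGH_SEVERITY_ISSUES for issue in issues):
        return Decision.ESCALATE
    if issues:
        return Decision.REQUEST_CHANGES
    return Decision.APPROVE
```

The high-severity check runs first so that a critical finding is never downgraded to an ordinary change request just because other, minor issues are also present.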

4. State Persistence & UI Synchronization

The Problem: Frontend state was mapped inconsistently to the backend's "done" flag, leading to "nothing visible" errors when reviews concluded.
V2 Fix: We moved to a Stateless Component Architecture driven entirely by the PRObservation returned from the API, so the UI reflects the environment's internal state one-to-one.
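The idea can be sketched as a pure function from observation to view, with no state kept on the frontend side. PRObservation and its "done" flag are named in the text; the other fields, the stand-in dataclass, and the render function are hypothetical, and the real frontend is likely not written in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PRObservation:
    """Simplified stand-in for the API's observation; fields are assumed."""
    done: bool
    message: str = ""
    reward: float = 0.0


def render(obs):
    """Derive everything the UI shows from the latest observation only."""
    if obs.done:
        # Always show a summary when the review ends, so a finished
        # episode never leaves the screen empty.
        return {
            "panel": "summary",
            "text": obs.message or "Review concluded.",
            "input_enabled": False,
        }
    return {"panel": "review", "text": obs.message, "input_enabled": True}
```

Because render keeps no state of its own, calling it twice with the same observation yields the same view, which is what removes the mismatch with the backend's "done" flag.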

🏆 Hackathon Compliance Improvements

OpenEnv 0.2.1 Schema: V2 uses strictly validated Pydantic models for every endpoint.
Root-Level inference.py: Automated benchmarks can now locate and execute the agent's reasoning loop directly.
Docker SDK for Hugging Face: Containerized deployment on port 7860 with an Nginx reverse proxy for high availability.