Chirag0123 committed on
Commit d505d26 · 1 Parent(s): d6551d3

docs: highly refine README formatting with technical badges and polished narrative structure

Files changed (1):
  1. README.md +63 -39

README.md CHANGED
@@ -13,65 +13,83 @@ tags:
  - coding-agent
  ---

- # 🔍 The Antidote to "Vibe Coding" — AI Reliability & Navigation Platform

- > **The system repairing the era of blind AI coding by making agents structural, testable, and deeply debuggable.**
-
- **Play with the live environment:** [Interactive Hugging Face Space](https://huggingface.co/spaces/Chirag0123/codebase-nav-env)

  ## 🚨 The End of "Vibe Coding"

- We are officially in the era of **Vibe Coding**. The amount of AI-generated code is exploding, yet developers and AI Agents (Copilot, Devin, Claude Code) are increasingly writing and submitting code *blindly*.
-
- Most agents don't actually know **where the issue exists**, what the **code flow** looks like, or how the **function dependencies** cascade. They simply guess edits based on the prompt until a test arbitrarily passes. When an AI agent claims "I fixed the bug," how do you verify *how* it did it? Did it actually navigate to the source of the crash, or did it randomly change syntax until the test turned green?

- Current benchmarks only evaluate the final outcome. **They don't evaluate cognition.**

- ## 💡 The Solution: 3D Visualization & Deep Analytic Execution

- This project is not just an environment benchmark—it is a **diagnostic platform**. It forces autonomous AI agents to explore an unknown Python repository file-by-file, and then exposes their **exact cognitive flow** using our bespoke state-of-the-art **3D Visualizer**.

- By tracking structural behavior instead of just outcomes, our platform gives researchers and operators complete visibility into the AI's actual thought processes and navigation flow.

- ### 🎬 See It In Action (Demo)

- ![3D Visualizer Architecture Trace](https://raw.githubusercontent.com/Chirag0096/Codebase-Navigation-Repair-OpenEnv/assets/assets/demo.webp)

- *(A live recording of the 3D agent visualizer tracking test files, source files, and resolving dependencies)*

- ## 🧠 Core Intelligence Modules

- Unlike existing models, we evaluate **how** the agent works, using several proprietary research-grade engines:

- | Module | What It Does (The Antidote to Vibe Coding) |
- |--------|--------------------------------------------|
- | **Causal Graph Probe** | Detects "Shortcut Learning". Did the agent actually read the test file, trace its imported module, and fix the root cause, or did it guess blindly? |
- | **Confidence Calibrator** | Infers agent behavioral confidence based on commit speed, rewrite hesitation, and test verification ratios. |
- | **Counterfactual Engine** | Analyzes the precise trace line to determine if the agent's strategy is brittle and heavily reliant on memorization of specific repository layouts. |
- | **Episodic Memory Bank** | A cross-episode RAG store that captures mistakes (like failing to run tests before commiting) and injects hard lessons into future iteration system prompts. |
- | **3D Trace Visualizer** | A seamless, fully-interpolated 3D environment engine that renders repos as geometric maps (Cubes for Source, Prisms for Tests) and visualizes the exact agent navigation traces with glowing Catmull-Rom tube curves. |

  ## ⚙️ How It Works (The OpenEnv Standard)

- 1. **Agent loads unfamiliar environment** sees repo file tree (NOT contents).
- 2. Agent reads files one at a time (costs strict exploration steps).
- 3. Agent identifies structural vulnerabilities via function-flow analysis.
- 4. Agent writes fixed code.
- 5. Agent verifies functionality through containerized `pytest` execution.
- 6. Environment scores agent completely dynamically across 6 separate research axes.

  ## 🚀 Quick Start

  ### 1. Run Locally (No Docker)
  ```bash
  pip install -r requirements.txt
- python app.py # Gradio UI + FastAPI at http://localhost:7860
  ```

  ### 2. Connect Your Custom LLM Agent
  ```bash
  export HF_TOKEN=hf_xxxxx
- # Configure your script to hit the local /step FASTApi environment
  python inference.py
  ```

@@ -81,14 +99,20 @@ docker build -t codebase-nav-env .
  docker run -p 7860:7860 codebase-nav-env
  ```

  ## 📊 Evaluation API Layers

- | Endpoint | Method | Description |
- |----------|--------|-------------|
- | `/step` | POST | Takes singular OpenEnv navigation action (`read_file`, `write_file`) |
- | `/evaluate` | GET | Baseline evaluation metrics |
- | `/causal-probe` | GET | Builds directed acyclic graphs resolving true root-cause logic mapping |
- | `/confidence` | GET | Returns behavior-time confidence estimation algorithms |
- | `/counterfactual` | POST | Subjects agent to 6 robustness ablation tests to detect hallucination |

- *Stop trusting the vibe. Force the cognition.*

  - coding-agent
  ---

+ <div align="center">
+ <a href="https://huggingface.co/spaces/Chirag0123/codebase-nav-env">
+ <img src="https://raw.githubusercontent.com/Chirag0096/Codebase-Navigation-Repair-OpenEnv/assets/assets/demo.webp" width="100%" alt="3D Visualizer Architecture Trace">
+ </a>
+
+ <br/>
+
+ <h1>🔍 Codebase Navigation Repair OpenEnv</h1>
+
+ <p><strong>The ultimate diagnostic environment to end "Vibe Coding." Making AI coding agents structural, testable, and deeply debuggable.</strong></p>
+
+ <p>
+ <a href="https://huggingface.co/spaces/Chirag0123/codebase-nav-env"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live%20Demo-blue" alt="Hugging Face Space"></a>
+ <img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python Version">
+ <img src="https://img.shields.io/badge/FastAPI-REST_API-009688.svg" alt="FastAPI">
+ <img src="https://img.shields.io/badge/Three.js-3D_Visualizer-black.svg" alt="Three.js">
+ <img src="https://img.shields.io/badge/Docker-Containerized_Scoring-2496ED.svg" alt="Docker">
+ </p>
+ </div>

+ ---

  ## 🚨 The End of "Vibe Coding"

+ We are officially in the era of **Vibe Coding**. The volume of AI-generated code is exploding, yet developers and top-tier AI Agents (Copilot, Devin, Claude Code) are increasingly writing and submitting code *blindly*.

+ Most agents don't actually know **where the issue exists**, what the **code flow** looks like, or how the **function dependencies** cascade. Current developer benchmarks only evaluate the final outcome. **They do not evaluate cognition.**

+ When an AI agent claims "I fixed the bug," how do you verify *how* it did it? Did it actually navigate to the source of the crash and trace the logical data flow, or did it just randomly change syntax until a test arbitrarily turned green?

+ ## 💡 Our Solution: 3D Visualization & Deep Analytic Execution

+ This project is not just another benchmark—it is a **Full-Stack Diagnostic Platform**. It forces autonomous AI agents to explore an unknown Python repository file-by-file through a strictly monitored API, and then exposes their **exact cognitive flow**.

+ By tracking structural behavior instead of just binary pass/fail outcomes, our platform gives researchers, engineers, and Hackathon judges unprecedented visibility into an AI's actual thought process and navigation footprint.

+ ---

+ ## 🧠 Core Intelligence Modules (v4.0)

+ Unlike standard environments, we evaluate **how** the agent works using proprietary, research-grade engines built specifically for this platform:

+ | 🧩 Module | 🎯 What It Does (The Cure to Vibe Coding) |
+ |-----------|--------------------------------------------|
+ | **`3D Trace Visualizer`** | A seamless, fully-interpolated 3D engine that renders repos as geometric maps (Cubes for Source, Prisms for Tests). Visualizes agent navigation traces via glowing Catmull-Rom tube paths. |
+ | **`Causal Graph Probe`** | Detects "Shortcut Learning". Maps a Directed Acyclic Graph to verify whether the agent actually read the test file, traced its imported module, and structurally fixed the root cause—or guessed blindly. |
+ | **`Confidence Calibrator`** | Infers the agent's behavioral confidence from real-time execution speeds, rewrite hesitation frequencies, and test verification ratios. |
+ | **`Counterfactual Engine`** | Subjects the agent to 6 robustness ablation tests (mutating the environment behind the scenes) to determine whether its strategy relies on brittle memorization. |
+ | **`Episodic Memory Bank`** | A cross-episode Retrieval-Augmented Generation (RAG) store that captures procedural mistakes (e.g., failing to run tests before committing) and auto-injects the hard lessons into future iteration system prompts. |

+ ---

  ## ⚙️ How It Works (The OpenEnv Standard)

+ 1. **Blind Start:** The agent loads an unfamiliar environment variant and sees the repository file tree (NOT contents).
+ 2. **Step Budgeting:** The agent reads files one at a time, each read consuming a strictly budgeted exploration step.
+ 3. **Flow Navigation:** The agent traces architecture dependencies and identifies structural vulnerabilities.
+ 4. **Execution:** The agent writes the architectural fix.
+ 5. **Verification:** The agent verifies functionality through containerized `pytest` execution loops, safely within the RL boundary.
+ 6. **Dynamic Scoring:** The environment scores the agent's complete step trajectory across 6 independent research axes.
+
+ ---

  ## 🚀 Quick Start

  ### 1. Run Locally (No Docker)
+ Spin up the backend and the 3D analytical dashboard.
  ```bash
  pip install -r requirements.txt
+ python app.py # Gradio UI + FastAPI starts at http://localhost:7860
  ```

  ### 2. Connect Your Custom LLM Agent
+ Wire up your own agent configuration.
  ```bash
  export HF_TOKEN=hf_xxxxx
+ # Point your script at the local /step FastAPI environment
  python inference.py
  ```

 
  docker run -p 7860:7860 codebase-nav-env
  ```

+ ---
+
  ## 📊 Evaluation API Layers

+ The environment communicates strictly via a standard RESTful architecture.
+
+ | Endpoint | Method | Operational Description |
+ |----------|--------|-------------------------|
+ | `/step` | `POST` | Takes a single OpenEnv navigation action (`read_file`, `write_file`) |
+ | `/evaluate` | `GET` | Fetches deterministic baseline evaluation metrics |
+ | `/causal-probe` | `GET` | Generates directed acyclic graphs resolving true root-cause logic mapping |
+ | `/confidence` | `GET` | Emits behavioral-time confidence calibration estimates |
+ | `/counterfactual` | `POST` | Triggers the 6-test robustness ablation engine for hallucination detection |
+
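After an episode, the `GET` diagnostics above can be polled together into one report. A minimal sketch with an injected HTTP fetcher — the endpoint paths come from the table, but the metric fields each endpoint returns are not assumed here, and the `requests` one-liner in the docstring is only an example client:

```python
from typing import Callable, Dict

# GET diagnostics from the Evaluation API table; /step and /counterfactual
# are POST endpoints exercised during the episode itself.
DIAGNOSTIC_ENDPOINTS = ("/evaluate", "/causal-probe", "/confidence")


def diagnostics_report(fetch: Callable[[str], dict]) -> Dict[str, dict]:
    """Collect every GET diagnostic into a single report keyed by endpoint.

    `fetch` is any callable mapping an endpoint path to its decoded JSON
    body, e.g. lambda p: requests.get(f"http://localhost:7860{p}").json()
    (illustrative -- any HTTP client works).
    """
    return {path.lstrip("/"): fetch(path) for path in DIAGNOSTIC_ENDPOINTS}


# Dry run with a stub fetcher (no server needed):
# report = diagnostics_report(lambda p: {"endpoint": p})
```

Injecting the fetcher keeps the aggregation logic testable without a running environment.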
+ <br/>

+ > *Stop trusting the vibe. Force the cognition.*