---
title: Codebase Navigation Repair OpenEnv
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 7860
license: mit
tags:
  - openenv
  - reinforcement-learning
  - coding-agent
---
*3D Visualizer Architecture Trace*

πŸ” Codebase Navigation Repair OpenEnv

The ultimate diagnostic environment to end "Vibe Coding." Making AI coding agents structural, testable, and deeply debuggable.



## 🚨 The End of "Vibe Coding"

We are officially in the era of Vibe Coding. The volume of AI-generated code is exploding, yet developers and top-tier AI Agents (Copilot, Devin, Claude Code) are increasingly writing and submitting code blindly.

Most agents don't actually know where the issue exists, what the code flow looks like, or how the function dependencies cascade. Current developer benchmarks only evaluate the final outcome. They do not evaluate cognition.

When an AI agent claims "I fixed the bug," how do you verify how it did it? Did it actually navigate to the source of the crash, trace the logical data flow, or did it just randomly change syntax until a test arbitrarily turned green?

## 💡 Our Solution: 3D Visualization & Deep Analytic Execution

This project is not just another benchmark: it is a Full-Stack Diagnostic Platform. It actively forces autonomous AI agents to explore an unknown Python repository file-by-file through a strictly monitored API, and then exposes their exact cognitive layout.

By tracking structural behavior instead of just binary pass/fail outcomes, our platform gives researchers, engineers, and Hackathon judges unprecedented visibility into an AI's actual thought process and navigation footprint.


## 🧠 Core Intelligence Modules (v4.0)

Unlike standard environments, we evaluate how the agent works using proprietary, research-grade engines built specifically for this platform:

| 🧩 Module | 🎯 What It Does (The Cure to Vibe Coding) |
| --- | --- |
| **3D Trace Visualizer** | A seamless, fully-interpolated 3D engine that renders repos as geometric maps (cubes for source files, prisms for tests). Visualizes agent navigation traces via glowing Catmull-Rom tube paths. |
| **Causal Graph Probe** | Detects "shortcut learning". Maps a directed acyclic graph to verify whether the agent actually read the test file, traced its imported module, and structurally fixed the root cause, or simply guessed blindly. |
| **Confidence Calibrator** | Infers the agent's behavioral confidence from real-time execution speeds, rewrite hesitation frequencies, and test verification ratios. |
| **Counterfactual Engine** | Subjects the agent to 6 robustness ablation tests (mutating the environment behind the scenes) to determine whether its strategy relies on brittle memorization. |
| **Episodic Memory Bank** | A cross-episode Retrieval-Augmented Generation (RAG) store that captures procedural mistakes (e.g., failing to run tests before committing) and auto-injects those hard lessons into future system prompts. |
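To make the Confidence Calibrator idea concrete, here is a minimal sketch of how behavioral signals could be combined into a single score. The function name, signal weights, and formula below are illustrative assumptions, not the platform's actual implementation:

```python
def calibrate_confidence(step_durations, rewrite_count, test_runs, total_actions):
    """Infer a 0-1 behavioral confidence score from an execution trace.

    Intuition (hypothetical weighting):
    - fast, steady step times suggest decisiveness;
    - frequent rewrites of the same file suggest hesitation;
    - a high test-verification ratio suggests the agent checks its work.
    """
    avg_duration = sum(step_durations) / max(len(step_durations), 1)
    speed = 1.0 / (1.0 + avg_duration)                 # faster -> closer to 1
    hesitation = rewrite_count / max(total_actions, 1)  # rewrites per action
    verification = test_runs / max(total_actions, 1)    # test runs per action
    score = 0.4 * speed + 0.4 * verification + 0.2 * (1.0 - hesitation)
    return max(0.0, min(1.0, score))  # clamp into [0, 1]
```

An agent that averages one second per step, rewrites twice, and runs tests on half its actions would land around 0.56 under these weights; the real engine presumably tunes such weights empirically.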

βš™οΈ How It Works (The OpenEnv Standard)

  1. **Blind Start:** the agent loads an unfamiliar environment variant and sees only the repository file tree, not file contents.
  2. **Step Budgeting:** the agent explores and reads files one at a time; each action spends from a strictly penalized exploration budget.
  3. **Flow Navigation:** the agent traces architecture dependencies and identifies structural vulnerabilities.
  4. **Execution:** the agent acts and writes the updated architectural fix.
  5. **Verification:** the agent verifies functionality through containerized pytest execution loops, safely within the RL boundary.
  6. **Dynamic Scoring:** the environment scores the agent's complete step trajectory across 6 independent research axes.
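The loop above can be sketched with a toy in-memory environment. The action names (`list_tree`, `read_file`, `write_file`, `run_tests`), payload shapes, and the seeded bug below are illustrative assumptions; the real environment serves the same contract over HTTP via `/step`:

```python
def make_toy_env(step_budget=10):
    """Toy stand-in for the environment: a repo with one seeded bug."""
    files = {
        "src/calc.py": "def add(a, b):\n    return a - b\n",  # seeded bug
        "tests/test_calc.py": "from src.calc import add\nassert add(1, 2) == 3\n",
    }
    state = {"steps": 0}

    def step(action):
        state["steps"] += 1
        if state["steps"] > step_budget:  # step budgeting: budget exhausted
            return {"done": True, "reward": -1.0, "info": "budget exhausted"}
        kind = action["type"]
        if kind == "list_tree":           # blind start: names only, no contents
            return {"tree": sorted(files), "done": False}
        if kind == "read_file":
            return {"content": files[action["path"]], "done": False}
        if kind == "write_file":
            files[action["path"]] = action["content"]
            return {"ok": True, "done": False}
        if kind == "run_tests":           # verification step
            passed = "a + b" in files["src/calc.py"]
            return {"passed": passed, "done": passed,
                    "reward": 1.0 if passed else 0.0}
        raise ValueError(f"unknown action: {kind}")

    return step

step = make_toy_env()
step({"type": "list_tree"})               # explore the tree
step({"type": "read_file", "path": "src/calc.py"})
step({"type": "write_file", "path": "src/calc.py",
      "content": "def add(a, b):\n    return a + b\n"})
step({"type": "run_tests"})               # passes once the fix lands
```

Each call to `step` consumes budget, so an agent that reads every file before acting pays for that exploration in its final trajectory score.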

## 🚀 Quick Start

### 1. Run Locally (No Docker)

Spin up the backend and the 3D analytical dashboard.

```bash
pip install -r requirements.txt
python app.py                    # Gradio UI + FastAPI starts at http://localhost:7860
```

### 2. Connect Your Custom LLM Agent

Wire up your own agent configuration.

```bash
export HF_TOKEN=hf_xxxxx
# Point your script at the local FastAPI /step endpoint
python inference.py
```
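If you are wiring up an agent from scratch rather than using `inference.py`, the request shape is simple. The `ENV_URL` variable, action payload fields, and use of `HF_TOKEN` as a bearer token below are assumptions about the local setup, shown here with only the standard library:

```python
import json
import os
import urllib.request

# Hypothetical override; the local default matches the Quick Start above.
BASE_URL = os.environ.get("ENV_URL", "http://localhost:7860")

def build_step_request(action_type, **fields):
    """Serialize one navigation action as a POST to the /step endpoint."""
    payload = {"type": action_type, **fields}
    return urllib.request.Request(
        f"{BASE_URL}/step",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; adjust to whatever inference.py uses.
            "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
        },
        method="POST",
    )

req = build_step_request("read_file", path="src/app.py")
# urllib.request.urlopen(req) would then send the action to the environment.
```

Your agent loop simply builds one such request per action and feeds the JSON response back into the model's context.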

### 3. Deploy via Docker

```bash
docker build -t codebase-nav-env .
docker run -p 7860:7860 codebase-nav-env
```

## 📊 Evaluation API Layers

The environment communicates exclusively through a standard RESTful API.

| Endpoint | Method | Operational Description |
| --- | --- | --- |
| `/step` | POST | Takes a single OpenEnv navigation action (`read_file`, `write_file`) |
| `/evaluate` | GET | Fetches deterministic baseline evaluation metrics |
| `/causal-probe` | GET | Generates a directed acyclic graph mapping the true root-cause logic |
| `/confidence` | GET | Returns behavioral confidence scores calibrated from execution timing |
| `/counterfactual` | POST | Runs the 6 robustness ablations to detect brittle or hallucinated strategies |
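As an example of consuming these layers, the `/causal-probe` DAG can be checked for shortcut learning client-side. The response field names here (`edges` as source/destination pairs) are an assumed payload shape, not the documented schema:

```python
def is_shortcut(edges, test_file, source_file):
    """Return True if there is no causal path from reading the test file
    to editing the fixed source file, i.e. the fix was not grounded in a
    genuine trace (shortcut learning)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)

    # Depth-first search for a path test_file -> ... -> source_file.
    stack, seen = [test_file], set()
    while stack:
        node = stack.pop()
        if node == source_file:
            return False  # genuine trace: the agent followed the causal chain
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, ()))
    return True  # no path: the edit was not causally connected to the test

# Agent read the test, traced its import, then fixed the module: not a shortcut.
edges = [("tests/test_calc.py", "src/calc.py")]
is_shortcut(edges, "tests/test_calc.py", "src/calc.py")  # False
```

A judge or researcher can run this over every episode's probe output to flag agents that "fixed" code they never causally reached.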

**Stop trusting the vibe. Force the cognition.**