AgentDebuggerEnv — Project Handover

What This Project Is

A GRPO-trained LLM (Qwen2.5-Coder-7B-Instruct) that learns to debug Python code through structured hypothesis-driven reasoning. Submitted to the Meta + PyTorch + HuggingFace OpenEnv Hackathon.

Repo & Remotes

Remote	URL
GitHub (source of truth)	https://github.com/shasshaank/meta_hackthon.git
HF Training Space	https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v2
HF Trained Model	https://huggingface.co/shashaank0707/AgentDebugger-trained

Push to GitHub first, then to HF Space if needed:

git push origin main
git push space main --force   # space remote = HF training space

The space remote URL embeds your HF token; replace YOUR_HF_TOKEN with your own:

https://shashaank0707:YOUR_HF_TOKEN@huggingface.co/spaces/shashaank0707/AgentDebugger-training-v2

Project Structure

meta_hackathon/
├── app.py                      # Gradio training monitor — launched by HF Space SDK
├── training/
│   └── train_grpo.py           # Main training script (GRPO via TRL)
├── server/
│   ├── reward_calculator.py    # Multi-component reward (format, hypothesis, fix, semantic)
│   ├── models.py               # parse_agent_output() — parses structured LLM output
│   └── app.py                  # FastAPI server (for the inference/env Space, not training)
├── data/
│   ├── bugs_tier1.jsonl        # 9 easy bugs (used steps 0–150)
│   ├── bugs_tier2.jsonl        # 31 medium bugs (added at step 150)
│   ├── bugs_tier3.jsonl        # 21 hard bugs (added at step 350, previously 600)
│   └── generate_bugs.py        # Script that generated the bug datasets
├── requirements.txt            # HF Space deps (gradio[oauth,mcp]==6.13.0, cu121 torch)
├── requirements_kaggle.txt     # Kaggle/RunPod deps (no torch pin, bitsandbytes==0.45.3)
├── inference.py                # Inference wrapper for evaluation
├── Dockerfile                  # For the inference/env Space (not the training space)
└── README.md                   # HF Space config header (sdk: gradio, app_file: app.py)

Dependency Versions (locked — do not change without testing)

Package	Version	Why pinned
`trl`	`0.14.0`	First version with `GRPOTrainer` + `GRPOConfig`
`pydantic`	`2.12.5`	Only version satisfying both gradio base AND gradio[mcp] constraints
`gradio[oauth,mcp]`	`6.13.0`	HF Space builder requires extras in one install pass
`bitsandbytes`	`0.45.3` (Kaggle) / `0.43.3` (HF Space cu121)	0.45.3 has CUDA 12.x binaries; 0.43.3 works with cu121
`transformers`	`4.46.3`	Tested with TRL 0.14.0
`torch`	`2.5.1+cu121` (HF Space) / pre-installed (Kaggle)

GRPOConfig parameter name: use max_completion_length, NOT max_new_tokens. Passing max_new_tokens raises a TypeError on TRL 0.14.0.
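
Because these pins are easy to drift from, a quick check before launching a run can help. The following is a minimal sketch using only the standard library; the `PINS` dict copies the table above (Kaggle variant for bitsandbytes), and the helper names `installed_version` and `find_mismatches` are hypothetical, not part of the repo.

```python
from importlib import metadata

# Pins copied from the table above (Kaggle variant for bitsandbytes).
# Adjust per environment, e.g. bitsandbytes 0.43.3 on the HF Space.
PINS = {
    "trl": "0.14.0",
    "pydantic": "2.12.5",
    "gradio": "6.13.0",
    "bitsandbytes": "0.45.3",
    "transformers": "4.46.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) tuple."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

The `lookup` argument exists only so the check can be exercised without the packages installed.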

Training Script — Key Design Decisions

GPU Auto-Detection (train_grpo.py ~line 260)

The script detects GPU at runtime and sets all hyperparams automatically:

GPU	dtype	batch	grad_accum	num_gen	max_comp	lora_r
A100 40GB+	bfloat16	2	4	8	256	16
V100 32GB	float16	1	8	6	220	12
T4 / ≤16GB	float16	1	8	4	160	8

Critical: P100 is NOT supported — PyTorch 2.x dropped sm_60 support. Use T4 instead.
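
The table above could be encoded roughly as follows. This is a sketch, not the code in train_grpo.py: the function name `select_profile` is hypothetical, and selecting by VRAM threshold is an assumption. The thresholds sit slightly below 40 and 32 because the memory a driver reports is usually a little under the nominal size.

```python
# Hyperparameter rows copied from the table above.
PROFILE_A100 = {"dtype": "bfloat16", "batch": 2, "grad_accum": 4,
                "num_gen": 8, "max_comp": 256, "lora_r": 16}
PROFILE_V100 = {"dtype": "float16", "batch": 1, "grad_accum": 8,
                "num_gen": 6, "max_comp": 220, "lora_r": 12}
PROFILE_T4 = {"dtype": "float16", "batch": 1, "grad_accum": 8,
              "num_gen": 4, "max_comp": 160, "lora_r": 8}


def select_profile(gpu_name, vram_gb):
    """Pick a hyperparameter row from the GPU name and its memory in GiB.

    With PyTorch available, the inputs could come from
    torch.cuda.get_device_name(0) and
    torch.cuda.get_device_properties(0).total_memory / 2**30.
    """
    if "P100" in gpu_name.upper():
        # P100 is sm_60, which the document marks as unsupported.
        raise RuntimeError("P100 is not supported; use a T4 instead")
    if vram_gb >= 38:   # assumed cutoff for the "A100 40GB+" row
        return dict(PROFILE_A100)
    if vram_gb >= 30:   # assumed cutoff for the "V100 32GB" row
        return dict(PROFILE_V100)
    return dict(PROFILE_T4)
```

Returning a copy of the row keeps callers from mutating the shared defaults.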

Curriculum

Steps 0–150: Tier 1 bugs only (9 bugs)
Steps 150–350: Tier 1 + Tier 2 (40 bugs)
Steps 350+: All tiers (61 bugs)
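
The schedule above can be expressed as a small lookup. This sketch assumes a boundary step belongs to the later phase (Tier 2 is described as "added at step 150"); the function names are hypothetical and the actual logic in train_grpo.py may differ.

```python
TIER2_START = 150   # step at which Tier 2 bugs are added
TIER3_START = 350   # step at which Tier 3 bugs are added
TIER_SIZES = {1: 9, 2: 31, 3: 21}   # bugs per tier, from the data/ listing


def active_tiers(step):
    """Return the list of tiers sampled from at a given training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    tiers = [1]
    if step >= TIER2_START:
        tiers.append(2)
    if step >= TIER3_START:
        tiers.append(3)
    return tiers


def pool_size(step):
    """Return how many bugs are in the sampling pool at a given step."""
    return sum(TIER_SIZES[tier] for tier in active_tiers(step))
```

For example, `pool_size(149)` gives 9, `pool_size(150)` gives 40, and `pool_size(350)` gives 61, matching the counts listed above.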

Reward Components (server/reward_calculator.py)

Component	Weight	What it measures
format_compliance	0.10	All 5 fields present (OBSERVATION/HYPOTHESIS/CONFIDENCE/ACTION/DETAIL)
hypothesis_quality	0.20	Length + references specific variable names
localization	0.15	Correct function/line identified
fix_quality	0.35	Tests pass on proposed fix
semantic_similarity	0.10	Similarity to canonical fix
efficiency_potential	0.10	Potential-based shaping (Ibrahim et al. 2024)
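
The weights above sum to 1.0. A minimal sketch of how they might combine, assuming each component is scored in [0, 1] and the total is a plain weighted sum (the real server/reward_calculator.py may clip, shape, or gate components differently):

```python
# Weights copied from the table above.
WEIGHTS = {
    "format_compliance": 0.10,
    "hypothesis_quality": 0.20,
    "localization": 0.15,
    "fix_quality": 0.35,
    "semantic_similarity": 0.10,
    "efficiency_potential": 0.10,
}


def total_reward(components):
    """Weighted sum of per-component scores.

    `components` maps every name in WEIGHTS to a score; a missing
    component is treated as an error rather than silently scored 0.
    """
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing reward components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

Under this assumption, an output whose fix passes the tests but scores zero everywhere else would earn 0.35, so fix_quality alone is worth more than format and hypothesis quality combined.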

Required Output Format

OBSERVATION: [specific observations with line numbers]
HYPOTHESIS: [2+ sentences explaining root cause with variable names]
CONFIDENCE: [low | medium | high]
ACTION: [inspect_lines | run_tests | propose_fix | request_context | give_up]
DETAIL: [complete fixed function code if propose_fix, else details]
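
To illustrate what a format check on this template involves, here is a regex-based sketch. It is not the repo's parse_agent_output(); the names `parse_fields` and `is_compliant` are hypothetical, and it assumes each field label starts a line.

```python
import re

FIELDS = ("OBSERVATION", "HYPOTHESIS", "CONFIDENCE", "ACTION", "DETAIL")
CONFIDENCE_LEVELS = {"low", "medium", "high"}
ACTIONS = {"inspect_lines", "run_tests", "propose_fix",
           "request_context", "give_up"}

# A field label at the start of a line, followed by a colon.
_FIELD_RE = re.compile(r"^(" + "|".join(FIELDS) + r"):[ \t]*", re.MULTILINE)


def parse_fields(text):
    """Split model output into {label: body}; a body runs to the next label."""
    matches = list(_FIELD_RE.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Keep the first occurrence if a label is repeated.
        fields.setdefault(match.group(1), text[match.end():end].strip())
    return fields


def is_compliant(fields):
    """True if all five fields are non-empty and the enums are valid."""
    return (
        all(fields.get(name) for name in FIELDS)
        and fields["CONFIDENCE"].lower() in CONFIDENCE_LEVELS
        and fields["ACTION"] in ACTIONS
    )
```

Because DETAIL may hold multi-line code, each body extends to the next label rather than to the end of its line.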

Running Training

On Kaggle (T4 — free):

# Cell 1 — install
!pip install -q wandb==0.18.7 datasets==3.0.2 transformers==4.46.3 \
    accelerate==1.0.1 trl==0.14.0 bitsandbytes==0.45.3 peft==0.13.2

# Cell 2 — clone + secrets
from kaggle_secrets import UserSecretsClient
import os
secrets = UserSecretsClient()
os.environ["WANDB_API_KEY"] = secrets.get_secret("WANDB_API_KEY")
os.environ["HF_TOKEN"]      = secrets.get_secret("HF_TOKEN")
!git clone https://github.com/shasshaank/meta_hackthon.git /kaggle/working/repo
%cd /kaggle/working/repo

# Cell 3 — train (streams output live)
import subprocess, sys
proc = subprocess.Popen(
    [sys.executable, "training/train_grpo.py"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    text=True, bufsize=1, cwd="/kaggle/working/repo"
)
for line in proc.stdout:
    print(line, end="", flush=True)
proc.wait()

# Cell 4 — save outputs after training
import shutil
shutil.copytree("/kaggle/working/repo/checkpoints", "/kaggle/working/checkpoints", dirs_exist_ok=True)

Kaggle secrets needed: WANDB_API_KEY, HF_TOKEN
Kaggle GPU: T4 x1 (NOT P100; incompatible with modern PyTorch)
Expected time: ~8–10 hours for 500 steps (default max_steps=500)

On RunPod (A100 — ~$1.09/hr):

git clone https://github.com/shasshaank/meta_hackthon.git && cd meta_hackthon
pip install -q wandb==0.18.7 datasets==3.0.2 transformers==4.46.3 \
    accelerate==1.0.1 trl==0.14.0 bitsandbytes==0.45.3 peft==0.13.2
WANDB_API_KEY=xxx HF_TOKEN=xxx python training/train_grpo.py

Expected time: ~3–4 hours for 1000 steps on A100 40GB

Resume from checkpoint:

python training/train_grpo.py --resume ./checkpoints/checkpoint-400
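
To find which path to pass to --resume, something like the following could pick the newest checkpoint. It is a sketch that assumes checkpoints are saved as `checkpoint-<step>` directories, as in the example above; `latest_checkpoint` is a hypothetical helper, not part of the repo.

```python
import re
from pathlib import Path

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(root):
    """Return the checkpoint-<step> directory with the highest step
    under `root`, or None if there is none."""
    root = Path(root)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare steps numerically so checkpoint-1000 beats checkpoint-400.
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Sorting the directory names as strings would rank checkpoint-400 above checkpoint-1000, which is why the step is parsed as an integer.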

Local sanity check (no GPU):

python training/train_grpo.py --test-local

HF Space Setup (training monitor)

The training Space (AgentDebugger-training-v2) is a Gradio app that:

On startup, spawns training/train_grpo.py in a background thread
Shows a live training log in the UI, auto-refreshing every 30s
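
The spawn-and-tail pattern described above can be sketched with the standard library alone. This is not the Space's app.py: the names `start_training` and `read_log` and the 500-line buffer are assumptions, and the Gradio wiring (for example a timer that calls `read_log` every 30 s) is left out.

```python
import subprocess
import sys
import threading
from collections import deque

LOG_LINES = deque(maxlen=500)   # ring buffer of the most recent output lines
_LOCK = threading.Lock()


def _run(cmd):
    """Run the command and copy its combined stdout/stderr into LOG_LINES."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, bufsize=1,
    )
    for line in proc.stdout:
        with _LOCK:
            LOG_LINES.append(line.rstrip("\n"))
    proc.wait()
    with _LOCK:
        LOG_LINES.append(f"[process exited with code {proc.returncode}]")


def start_training(cmd=(sys.executable, "training/train_grpo.py")):
    """Start the training process in a daemon thread and return the thread."""
    thread = threading.Thread(target=_run, args=(list(cmd),), daemon=True)
    thread.start()
    return thread


def read_log(last_n=50):
    """Return the last `last_n` captured lines as one string for display."""
    with _LOCK:
        return "\n".join(list(LOG_LINES)[-last_n:])
```

A bounded deque keeps memory flat over a multi-hour run, and the lock prevents the UI from reading the buffer while the worker thread is appending to it.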

Required Space secrets:

WANDB_API_KEY
HF_TOKEN

Push to Space:

git remote set-url space https://shashaank0707:YOUR_HF_TOKEN@huggingface.co/spaces/shashaank0707/AgentDebugger-training-v2
git push space main --force

Known Issues Fixed (do not revert)

Issue	Fix
`ImportError: cannot import name 'GRPOTrainer'`	`trl==0.12.2` → `trl==0.14.0`
`TypeError: GRPOConfig got unexpected keyword 'max_new_tokens'`	renamed to `max_completion_length`
`pydantic` conflict with `gradio[mcp]`	`pydantic==2.10.6` → `2.12.5`
`P100 not supported by PyTorch 2.x`	Switch to T4 on Kaggle
`bitsandbytes CUDA binary not found`	`bitsandbytes==0.43.3` → `0.45.3` on Kaggle
`unsloth` CUDA driver crash on HF A100	Replaced with `bitsandbytes + peft`
`gradio every=` deprecation	Replaced with `gr.Timer(value=30)`

W&B Dashboard

https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv

Training runs appear here automatically when WANDB_API_KEY is set.

What's Left To Do

Finish training — 500–1000 steps, model pushes to HF Hub automatically on completion
Verify trained model — run inference.py against the trained model checkpoint
Update HF Space README — change curriculum description to match actual step boundaries (150/350)
Submission — ensure the inference/env Space (AgentDebugger-env) is live and healthy for judging