SouravNath committed on
Commit
53afd2e
·
1 Parent(s): dc71cad

docs: add complete project guide (setup, learning roadmap, deployment, interview prep)

Files changed (1)
  1. GUIDE.md +506 -0
GUIDE.md ADDED
@@ -0,0 +1,506 @@
1
+ # 📚 Complete Project Guide: Autonomous Code Review & Bug-Fix Agent
2
+
3
+ ---
4
+
5
+ ## Table of Contents
6
+
7
+ 1. [Learning Roadmap](#learning-roadmap): what to read, in what order
8
+ 2. [How the System Works](#how-the-system-works): full mental model
9
+ 3. [Local Setup](#local-setup): step-by-step from zero
10
+ 4. [Getting Free API Keys](#getting-free-api-keys)
11
+ 5. [Running the Project](#running-the-project)
12
+ 6. [Running the Benchmark](#running-the-benchmark)
13
+ 7. [Fine-Tuning on Free GPU](#fine-tuning-on-free-gpu-kaggle)
14
+ 8. [Deploying for Free](#deploying-for-free)
15
+ 9. [Troubleshooting](#troubleshooting)
16
+ 10. [Interview Prep](#interview-prep)
17
+
18
+ ---
19
+
20
+ ## Learning Roadmap
21
+
22
+ Study files in this exact order; each builds on the previous one.
23
+
24
+ ### Week 1: Foundation
25
+
26
+ | Step | File | What You'll Learn |
27
+ |------|------|-------------------|
28
+ | 1 | `README.md` | Full architecture, benchmarks, tech stack |
29
+ | 2 | `configs/settings.py` | Every config parameter and why it exists |
30
+ | 3 | `.env.example` | All environment variables explained |
31
+ | 4 | `swe_bench/loader.py` | What a SWE-bench instance looks like |
32
+ | 5 | `sandbox/executor.py` | How the Docker sandbox is secured |
33
+
34
+ After Week 1 you understand what the agent solves, what SWE-bench Lite is (300 real Python issues), and why the sandbox exists.
35
+
36
+ ---
37
+
38
+ ### Week 2: AST & Code Understanding (Phase 2)
39
+
40
+ | Step | File | What You'll Learn |
41
+ |------|------|-------------------|
42
+ | 6 | `ast_parser/python_parser.py` | Tree-sitter parses Python into symbols |
43
+ | 7 | `ast_parser/dependency_graph.py` | Imports/calls → NetworkX graph + PageRank |
44
+ | 8 | `ast_parser/cache.py` | SHA-keyed cache to skip re-parsing |
45
+ | 9 | `tests/test_phase2_ast.py` | Tests show every edge case |
46
+
47
+ Key insight: the agent understands *structure* (who imports whom), not just raw text.
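To make the "who imports whom" idea concrete, here is a minimal sketch of building an import graph. It is an illustration only: it swaps in Python's standard-library `ast` module and a plain dict where the project uses Tree-sitter and NetworkX, and the helper names `extract_imports` and `build_import_graph` are hypothetical, not taken from `ast_parser/`.

```python
import ast
from collections import defaultdict


def extract_imports(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def build_import_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file to the other files in `files` that it imports."""
    by_module = {path.removesuffix(".py"): path for path in files}
    graph = defaultdict(set)
    for path, source in files.items():
        for module in extract_imports(source):
            if module in by_module and by_module[module] != path:
                graph[path].add(by_module[module])
    return dict(graph)


# Hypothetical two-file repo: views.py imports models.py.
repo = {
    "views.py": "import os\nfrom models import User\n",
    "models.py": "class User: pass\n",
}
graph = build_import_graph(repo)
print(graph)
```

Only edges between files inside the repo are kept, so standard-library imports such as `os` drop out of the graph.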
48
+
49
+ ---
50
+
51
+ ### Week 3: File Localisation (Phase 3) ← most ML-heavy
52
+
53
+ | Step | File | What You'll Learn |
54
+ |------|------|-------------------|
55
+ | 10 | `localisation/bm25_retriever.py` | BM25 + CamelCase tokeniser + path boost |
56
+ | 11 | `localisation/embedding_retriever.py` | Dense retrieval with BAAI/bge-base (local, free) |
57
+ | 12 | `localisation/rrf_fusion.py` | Reciprocal Rank Fusion: combine 3 signals |
58
+ | 13 | `localisation/deberta_ranker.py` | DeBERTa cross-encoder re-ranks top-20 → top-5 |
59
+ | 14 | `localisation/pipeline.py` | All 4 pieces connected end-to-end |
60
+ | 15 | `tests/test_phase3_localisation.py` | Validates recall@5 improvement |
61
+
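As a rough picture of the first row in the table, the sketch below pairs a CamelCase-aware tokeniser with a textbook BM25 score. It is a sketch under assumptions: the function names are hypothetical, `k1=1.5` and `b=0.75` are common defaults rather than the project's settings, and the path boost mentioned in the table is left out.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    """Lower-cased word tokens, with CamelCase and snake_case identifiers split."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word))
    return [part.lower() for part in parts]


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score every document against the query with the standard BM25 formula."""
    tokenised = {name: tokenise(text) for name, text in docs.items()}
    avg_len = sum(len(tokens) for tokens in tokenised.values()) / len(tokenised)
    doc_freq = Counter(term for tokens in tokenised.values() for term in set(tokens))
    query_terms = set(tokenise(query))
    scores = {}
    for name, tokens in tokenised.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


# Hypothetical mini-corpus of two files.
docs = {
    "db/query.py": "class QuerySet:\n    def filter(self, **kwargs): ...",
    "utils/text.py": "def slugify(value): ...",
}
scores = bm25_scores("QuerySet.filter() returns duplicates", docs)
```

Splitting `QuerySet` into `query` and `set` is what lets an issue that mentions the class name match code that only uses parts of it.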
62
+ Key insight: Recall@5 goes 41% → 74% because:
63
+ - BM25 catches exact keyword matches
64
+ - Embeddings catch semantic similarity
65
+ - PPR (Personalized PageRank) finds *dependencies* of the buggy file via the import graph
66
+ - DeBERTa uses full cross-attention for precise re-ranking
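The way the three ranked lists are merged before re-ranking can be sketched with Reciprocal Rank Fusion. This is a minimal illustration, not the contents of `localisation/rrf_fusion.py`: the function name is hypothetical and `k=60` is the constant commonly used for RRF, assumed here rather than read from the project config.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings from the three retrievers.
bm25 = ["a.py", "b.py", "c.py"]
dense = ["b.py", "c.py", "a.py"]
ppr = ["b.py", "a.py", "d.py"]
fused = rrf_fuse([bm25, dense, ppr])
print(fused)  # ['b.py', 'a.py', 'c.py', 'd.py']
```

RRF only looks at ranks, so BM25 scores, cosine similarities, and PageRank mass never need to be put on a common scale.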
67
+
68
+ ---
69
+
70
+ ### Week 4: Agentic Reflection Loop (Phase 4)
71
+
72
+ | Step | File | What You'll Learn |
73
+ |------|------|-------------------|
74
+ | 16 | `agent/llm_client.py` | Provider-agnostic client (Groq/Gemini/Ollama) |
75
+ | 17 | `agent/tools.py` | read_file, write_patch, run_tests, git_diff |
76
+ | 18 | `agent/failure_categoriser.py` | pytest output → 9 failure categories |
77
+ | 19 | `agent/trajectory_logger.py` | JSONL logger → fine-tuning dataset |
78
+ | 20 | `agent/reflection_agent.py` | LangGraph state machine (the actual agent) |
79
+ | 21 | `tests/test_phase4_reflection.py` | Agent integration tests with mock tools |
80
+
81
+ Key insight: the state machine is `localise → generate → test → (fail → reflect → generate again)`
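The generate, test, reflect cycle can be sketched as a plain Python loop. This is an assumption-laden illustration: the real agent is a LangGraph state machine, whereas `AgentState`, `run_agent`, and the stub callables below are hypothetical, and the localise step is omitted. The limit of 3 attempts follows the guide.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    issue: str
    attempts: int = 0
    resolved: bool = False
    reflections: list[str] = field(default_factory=list)


def run_agent(issue: str,
              generate: Callable[[str, list[str]], str],
              run_tests: Callable[[str], tuple[bool, str]],
              max_attempts: int = 3) -> AgentState:
    """Generate a patch, test it, and feed failures back until it passes or attempts run out."""
    state = AgentState(issue=issue)
    while state.attempts < max_attempts and not state.resolved:
        state.attempts += 1
        patch = generate(state.issue, state.reflections)
        passed, error = run_tests(patch)
        if passed:
            state.resolved = True
        else:
            # The reflection is carried into the next generate() call.
            state.reflections.append(f"Attempt {state.attempts} failed: {error}")
    return state


# Stub LLM and sandbox: the first patch fails, the second (with reflection) passes.
def fake_generate(issue: str, reflections: list[str]) -> str:
    return "good patch" if reflections else "bad patch"


def fake_run_tests(patch: str) -> tuple[bool, str]:
    return (patch == "good patch", "AssertionError")


final = run_agent("Fix the filter bug", fake_generate, fake_run_tests)
```

Keeping the LLM call and the sandbox behind plain callables is what makes the loop testable with mocks.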
82
+
83
+ ---
84
+
85
+ ### Week 5: Uncertainty & Fine-Tuning (Phases 6 & 7)
86
+
87
+ | Step | File | What You'll Learn |
88
+ |------|------|-------------------|
89
+ | 22 | `uncertainty/conformal_predictor.py` | p-values + quantiles → 90% coverage guarantee |
90
+ | 23 | `uncertainty/temperature_scaling.py` | Calibrate overconfident DeBERTa logits |
91
+ | 24 | `uncertainty/uncertainty_pipeline.py` | 60-80% token savings on confident instances |
92
+ | 25 | `fine_tuning/dataset_builder.py` | Trajectories → 3 types of training pairs |
93
+ | 26 | `fine_tuning/qlora_config.py` | Why r=16, alpha=32, 4-bit NF4 |
94
+ | 27 | `fine_tuning/train.py` | Full QLoRA training loop |
95
+
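Temperature scaling, listed in the table above, divides the logits by a single scalar `T` fitted on held-out data so that confidence matches accuracy. The sketch below is a minimal pure-Python version under assumptions: it fits `T` by grid search instead of gradient descent, and the validation logits are made up for illustration.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(examples: list[tuple[list[float], int]], temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(logits, temperature)[label]) for logits, label in examples]
    return sum(losses) / len(losses)


def fit_temperature(examples: list[tuple[list[float], int]]) -> float:
    """Pick the temperature on a 0.5..5.0 grid that minimises validation NLL."""
    grid = [0.5 + 0.1 * i for i in range(46)]
    return min(grid, key=lambda t: nll(examples, t))


# Hypothetical overconfident ranker: about 98% confident, but right only 3 times out of 4.
validation = [
    ([4.0, 0.0], 0),
    ([4.0, 0.0], 0),
    ([4.0, 0.0], 1),
    ([0.0, 4.0], 1),
]
temperature = fit_temperature(validation)
calibrated = softmax([4.0, 0.0], temperature)[0]
```

A fitted `T` above 1 flattens the distribution; the ranking of files is unchanged because dividing by a positive constant preserves order.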
96
+ ---
97
+
98
+ ### Week 6: Platform & Benchmarking (Phases 5, 8, 9)
99
+
100
+ | Step | File | What You'll Learn |
101
+ |------|------|-------------------|
102
+ | 28 | `api/models.py` | Pydantic types for every API request/response |
103
+ | 29 | `api/websocket_manager.py` | Real-time streaming events |
104
+ | 30 | `api/tasks.py` | Async agent orchestration |
105
+ | 31 | `api/main.py` | FastAPI routes, CORS, lifespan |
106
+ | 32 | `telemetry/metrics.py` | Prometheus metrics + USD cost tracker |
107
+ | 33 | `experiments/benchmark.py` | Full SWE-bench evaluation harness |
108
+
109
+ ---
110
+
111
+ ## How the System Works
112
+
113
+ ```
114
+ User submits GitHub issue (UI)
115
+ └─▶ POST /api/solve → task_id
116
+
117
+ Frontend opens WebSocket: ws://localhost:8000/ws/{task_id}
118
+
119
+ API starts async task:
120
+   Step 1: Clone repo at base_commit
121
+   Step 2: Parse Python files (Tree-sitter) → dependency graph
122
+   Step 3: Localise files
123
+     ├── BM25 top-20
124
+     ├── Embeddings top-20
125
+     ├── PPR propagation
126
+     └── RRF fusion → DeBERTa re-rank → top-5 files
127
+   Step 4: Attempt loop (max 3):
128
+     ├── Build prompt: issue + file contents + (if retry) error context
129
+     ├── Call LLM (Groq/Gemini/Ollama) → unified diff
130
+     ├── git apply → run tests in Docker sandbox
131
+     ├── PASS ✅ → done
132
+     └── FAIL ❌ → categorise → reflect → next attempt
133
+ Step 5: Stream result to UI (patch, attempts, cost)
134
+ ```
135
+
136
+ ---
137
+
138
+ ## Local Setup
139
+
140
+ ### Prerequisites
141
+
142
+ ```bash
143
+ python3 --version # need 3.11+
144
+ node --version # need 18+
145
+ docker --version # need 20+
146
+ ```
147
+
148
+ Install if missing (Ubuntu):
149
+ ```bash
150
+ sudo apt update && sudo apt install python3.11 python3.11-venv
151
+ curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
152
+ sudo apt install nodejs
153
+ curl -fsSL https://get.docker.com | sh && sudo usermod -aG docker $USER
154
+ ```
155
+
156
+ ### Step 1: Clone the repo
157
+
158
+ ```bash
159
+ git clone https://github.com/Sourav-Nath-01/repomind.git
160
+ cd repomind
161
+ ```
162
+
163
+ ### Step 2: Python environment
164
+
165
+ ```bash
166
+ python3 -m venv .venv
167
+ source .venv/bin/activate
168
+
169
+ pip install fastapi "uvicorn[standard]" rank-bm25 numpy scipy \
170
+ sentence-transformers networkx diskcache pydantic-settings \
171
+ langgraph groq google-generativeai requests pytest
172
+ ```
173
+
174
+ ### Step 3: Configure environment
175
+
176
+ ```bash
177
+ cp .env.example .env
178
+ ```
179
+
180
+ Edit `.env` and pick ONE free LLM provider:
181
+
182
+ ```env
183
+ # Option A: Groq (recommended, fastest)
184
+ GROQ_API_KEY=gsk_your_key_here
185
+ LLM_PROVIDER=groq
186
+ LLM_MODEL=deepseek-r1-distill-llama-70b
187
+
188
+ # Option B: Gemini
189
+ # GEMINI_API_KEY=AIza...
190
+ # LLM_PROVIDER=gemini
191
+
192
+ # Option C: Ollama (fully offline, no key needed)
193
+ # LLM_PROVIDER=ollama
194
+ # LLM_MODEL=deepseek-coder-v2:16b
195
+
196
+ # Embeddings (always free, runs locally)
197
+ EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
198
+ ```
199
+
200
+ ### Step 4: Frontend
201
+
202
+ ```bash
203
+ cd frontend && npm install && cd ..
204
+ ```
205
+
206
+ ### Step 5: Verify
207
+
208
+ ```bash
209
+ .venv/bin/python -m pytest tests/ -q
210
+ # Should print: 244 passed, 1 warning
211
+ ```
212
+
213
+ ---
214
+
215
+ ## Getting Free API Keys
216
+
217
+ ### Groq (Recommended, 30 seconds)
218
+ 1. Go to https://console.groq.com
219
+ 2. Sign up with Google/GitHub → no credit card
220
+ 3. API Keys → Create API Key → copy `gsk_...`
221
+ 4. Paste into `.env` as `GROQ_API_KEY`
222
+
223
+ Free limits: 30 req/min · 14,400 req/day
224
+
225
+ ### Google Gemini
226
+ 1. Go to https://aistudio.google.com
227
+ 2. Sign in with Google → Get API Key → Create
228
+ 3. Copy `AIza...` → paste as `GEMINI_API_KEY`
229
+
230
+ Free limits: 15 req/min · 1,000,000 tokens/day
231
+
232
+ ### Ollama (100% Offline, No Key Needed)
233
+ ```bash
234
+ curl -fsSL https://ollama.com/install.sh | sh
235
+ ollama pull deepseek-coder-v2:16b # downloads ~9GB once
236
+ ollama serve # starts at localhost:11434
237
+ ```
238
+ Then set `LLM_PROVIDER=ollama` in `.env`
239
+
240
+ ---
241
+
242
+ ## Running the Project
243
+
244
+ ### Start the API backend
245
+ ```bash
246
+ source .venv/bin/activate
247
+ uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
248
+ # → http://localhost:8000/docs (interactive API docs)
249
+ ```
250
+
251
+ ### Start the frontend
252
+ ```bash
253
+ cd frontend && npm run dev
254
+ # → http://localhost:3000
255
+ ```
256
+
257
+ ### Or run everything with Docker Compose
258
+ ```bash
259
+ docker-compose up --build
260
+ # Frontend: http://localhost:3000
261
+ # API: http://localhost:8000
262
+ ```
263
+
264
+ ### Test the API manually
265
+ ```bash
266
+ curl -X POST http://localhost:8000/api/solve \
267
+ -H "Content-Type: application/json" \
268
+ -d '{"repo":"django/django","problem_statement":"Fix the filter bug"}'
269
+ ```
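The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; `build_solve_request` is a hypothetical helper, and the send step is left commented out because it needs the backend to be running.

```python
import json
import urllib.request


def build_solve_request(repo: str, problem_statement: str,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) the POST /api/solve request shown in the curl example."""
    payload = json.dumps({"repo": repo, "problem_statement": problem_statement})
    return urllib.request.Request(
        f"{base_url}/api/solve",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_solve_request("django/django", "Fix the filter bug")

# To actually send it (requires the backend from the previous section):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```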
270
+
271
+ ### Run tests
272
+ ```bash
273
+ pytest tests/ -v # all 244 tests
274
+ pytest tests/test_phase3_localisation.py # just localisation
275
+ pytest tests/ --cov=. --cov-report=html # with coverage
276
+ ```
277
+
278
+ ### Test the LLM client alone
279
+ ```bash
280
+ python -c "
281
+ from agent.llm_client import get_llm_client
282
+ llm = get_llm_client()
283
+ text, usage = llm.complete('You are helpful.', 'What is BM25?', max_tokens=100)
284
+ print(text)
285
+ print('Tokens:', usage['total_tokens'])
286
+ "
287
+ ```
288
+
289
+ ---
290
+
291
+ ## Running the Benchmark
292
+
293
+ ### Quick test (10 issues, ~5 minutes)
294
+ ```bash
295
+ python -m experiments.benchmark --max-instances 10 --variant with_reflection
296
+ ```
297
+
298
+ ### Full eval (300 issues, 3-8 hours)
299
+ ```bash
300
+ python -m experiments.benchmark \
301
+ --variant with_reflection \
302
+ --max-instances 300 \
303
+ --output-dir results/
304
+ ```
305
+ Results stream to a JSONL file as they complete, so the run is safe to stop and resume.
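One way such stop-and-resume behaviour can work is to read back the instance IDs already in the JSONL file and skip them on the next run. The sketch below is an assumption about the mechanism, not the code in `experiments/benchmark.py`; the `instance_id` and `resolved` field names and both function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def completed_ids(results_path: Path) -> set[str]:
    """Collect the instance IDs already recorded in the JSONL results file."""
    if not results_path.exists():
        return set()
    done = set()
    with results_path.open() as handle:
        for line in handle:
            try:
                done.add(json.loads(line)["instance_id"])
            except (json.JSONDecodeError, KeyError):
                continue  # tolerate a blank or half-written line from an interrupted run
    return done


def run_benchmark(instance_ids: list[str], solve: Callable[[str], bool],
                  results_path: Path) -> int:
    """Append one JSON line per newly solved instance; return how many were processed."""
    done = completed_ids(results_path)
    processed = 0
    with results_path.open("a") as handle:
        for instance_id in instance_ids:
            if instance_id in done:
                continue
            record = {"instance_id": instance_id, "resolved": solve(instance_id)}
            handle.write(json.dumps(record) + "\n")
            handle.flush()  # make each result durable as soon as it completes
            processed += 1
    return processed


results_file = Path(tempfile.mkdtemp()) / "results.jsonl"
first_run = run_benchmark(["a", "b"], lambda _id: True, results_file)
second_run = run_benchmark(["a", "b", "c"], lambda _id: False, results_file)
```

The second call only processes `c`, because `a` and `b` are already in the file.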
306
+
307
+ ### Generate ablation table from results
308
+ ```bash
309
+ python -m experiments.benchmark --report-only
310
+ cat results/ablation_table.md
311
+ ```
312
+
313
+ ---
314
+
315
+ ## Fine-Tuning on Free GPU (Kaggle)
316
+
317
+ ### Step 1: Build the dataset
318
+ ```bash
319
+ python -c "
320
+ from fine_tuning.dataset_builder import FinetuningDatasetBuilder
321
+ builder = FinetuningDatasetBuilder()
322
+ stats = builder.build(format='chatml')
323
+ print(stats)
324
+ "
325
+ # Creates: results/fine_tuning/train.jsonl, val.jsonl
326
+ ```
327
+
328
+ ### Step 2: Validate dataset (no GPU needed)
329
+ ```bash
330
+ python -m fine_tuning.train --dry-run
331
+ ```
332
+
333
+ ### Step 3: Upload to HuggingFace
334
+ ```bash
335
+ pip install huggingface_hub
336
+ huggingface-cli login # paste your HF token
337
+
338
+ python -c "
339
+ from huggingface_hub import HfApi
340
+ api = HfApi()
341
+ api.upload_file(path_or_fileobj='results/fine_tuning/train.jsonl', path_in_repo='train.jsonl',
342
+ repo_id='YOUR_USERNAME/swe-trajectories', repo_type='dataset')
343
+ api.upload_file(path_or_fileobj='results/fine_tuning/val.jsonl', path_in_repo='val.jsonl',
344
+ repo_id='YOUR_USERNAME/swe-trajectories', repo_type='dataset')
345
+ "
346
+ ```
347
+
348
+ ### Step 4: Run on Kaggle (free T4 GPU)
349
+ 1. kaggle.com → New Notebook → Settings → GPU T4 x2
350
+ 2. Paste:
351
+ ```python
352
+ !pip install transformers peft trl bitsandbytes datasets -q
353
+ !git clone https://github.com/Sourav-Nath-01/repomind.git
354
+ %cd repomind
355
+
356
+ from huggingface_hub import snapshot_download
357
+ snapshot_download('YOUR_USERNAME/swe-trajectories',
358
+ repo_type='dataset', local_dir='data/')
359
+
360
+ !python -m fine_tuning.train \
361
+ --train-file data/train.jsonl \
362
+ --val-file data/val.jsonl \
363
+ --output /kaggle/working/checkpoints \
364
+ --epochs 3
365
+ ```
366
+ Takes ~4-6 hours on free Kaggle T4.
367
+
368
+ ---
369
+
370
+ ## Deploying for Free
371
+
372
+ ### Free stack overview
373
+ ```
374
+ User → Vercel (Next.js UI, free)
375
+ ↓
376
+ HF Spaces (FastAPI API, free always-on)
377
+ ↓
378
+ Upstash Redis (task queue, free)
379
+ ↓
380
+ Oracle Cloud Always Free (Docker sandbox: 4 cores, 24GB RAM)
381
+ ```
382
+
383
+ ### Step 1: Deploy API to Hugging Face Spaces
384
+ 1. huggingface.co/spaces → Create Space → SDK: Docker
385
+ 2. Create `Dockerfile` in the space:
386
+ ```dockerfile
387
+ FROM python:3.11-slim
388
+ WORKDIR /app
389
+ COPY requirements.txt .
390
+ RUN pip install -r requirements.txt
391
+ COPY . .
392
+ EXPOSE 7860
393
+ CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "7860"]
394
+ ```
395
+ 3. Space Settings → Secrets:
396
+ - `GROQ_API_KEY` = your key
397
+ - `LLM_PROVIDER` = `groq`
398
+ 4. Push code:
399
+ ```bash
400
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/code-agent-api
401
+ git push hf main
402
+ ```
403
+ Live at: `https://YOUR_USERNAME-code-agent-api.hf.space`
404
+
405
+ ### Step 2: Deploy frontend to Vercel
406
+ ```bash
407
+ npm install -g vercel
408
+ cd frontend
409
+ vercel
410
+ ```
411
+ In Vercel dashboard → Environment Variables:
412
+ ```
413
+ NEXT_PUBLIC_API_URL = https://YOUR_USERNAME-code-agent-api.hf.space
414
+ NEXT_PUBLIC_WS_URL = wss://YOUR_USERNAME-code-agent-api.hf.space
415
+ ```
416
+ Deploy: `vercel --prod`
417
+
418
+ ### Step 3: Oracle Cloud for sandbox (optional)
419
+ 1. cloud.oracle.com → Sign up (free tier, identity check only)
420
+ 2. Create VM: `VM.Standard.A1.Flex` → 4 OCPUs, 24GB RAM (always free)
421
+ 3. SSH in and install Docker, then run the sandbox service
422
+ 4. Add `SANDBOX_HOST=YOUR_ORACLE_IP` to HF Spaces secrets
423
+
424
+ ### Step 4: Upstash Redis (free)
425
+ 1. upstash.com → Sign up → Create database
426
+ 2. Copy Redis URL → add to HF Spaces secrets as `REDIS_URL`
427
+
428
+ ---
429
+
430
+ ## Troubleshooting
431
+
432
+ ### "No LLM provider configured"
433
+ ```bash
434
+ cat .env | grep -E "GROQ|GEMINI|OLLAMA|LLM_PROVIDER"
435
+ # At least one key must be set. Easiest: get free Groq key at console.groq.com
436
+ ```
437
+
438
+ ### Embedding model downloads slowly
439
+ The BAAI/bge-base-en-v1.5 model (~440MB) downloads once automatically.
440
+ Tests do not need it: the code falls back to random vectors when no model is available.
441
+
442
+ ### "Port 8000 already in use"
443
+ ```bash
444
+ lsof -i :8000 | grep LISTEN
445
+ kill -9 <PID>
446
+ ```
447
+
448
+ ### Tests fail on import
449
+ ```bash
450
+ source .venv/bin/activate
451
+ pip install -e ".[dev]"
452
+ ```
453
+
454
+ ### Embedding dimension mismatch after model change
455
+ ```bash
456
+ rm -rf .cache/embeddings/ # delete cache, rebuilds automatically
457
+ ```
458
+
459
+ ### Groq rate limit (30 RPM)
460
+ For 300-issue eval, switch to Gemini (15 RPM but 1M tokens/day):
461
+ ```env
462
+ LLM_PROVIDER=gemini
463
+ LLM_MODEL=gemini-2.0-flash
464
+ ```
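If you would rather stay on one provider, a client-side limiter that spaces calls out evenly is another option. The sketch below is not part of the project: `MinIntervalLimiter` is a hypothetical helper, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay


# Demonstration with a fake clock so nothing actually sleeps.
fake_now = [100.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


limiter = MinIntervalLimiter(30, clock=lambda: fake_now[0], sleep=fake_sleep)
delays = [limiter.wait() for _ in range(3)]
print(delays)  # [0.0, 2.0, 2.0]
```

In real use you would construct `MinIntervalLimiter(30)` and call `limiter.wait()` immediately before each LLM request.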
465
+
466
+ ---
467
+
468
+ ## Interview Prep
469
+
470
+ **Q: Why BM25 + embeddings + PPR instead of just embeddings?**
471
+
472
+ > Each captures a different signal. BM25 catches exact matches: if the issue says `QuerySet.filter()`, BM25 finds that exact string in file names and code. Embeddings catch semantic similarity, such as paraphrases and synonyms. PPR is completely different: it propagates relevance through the import graph. If `views.py` is relevant, PPR also scores `models.py` higher because `views.py` imports it. The bug might be *in* `models.py` even though the issue only mentions `views.py`. That's what takes recall from 41% to 74%.
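The `views.py` and `models.py` example can be reproduced with a small personalised PageRank over an import graph. This is a pure-Python power-iteration sketch under assumptions: the project uses NetworkX, the function name is hypothetical, and `alpha=0.85` is the usual damping default rather than the project's setting.

```python
def personalised_pagerank(graph: dict[str, list[str]], seeds: dict[str, float],
                          alpha: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power iteration where the random walk restarts at the seed files."""
    nodes = set(graph) | {target for targets in graph.values() for target in targets}
    total = sum(seeds.get(node, 0.0) for node in nodes)
    restart = {node: seeds.get(node, 0.0) / total for node in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {node: (1 - alpha) * restart[node] for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = alpha * scores[node] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Dangling file (imports nothing): send its mass back to the seeds.
                for other in nodes:
                    updated[other] += alpha * scores[node] * restart[other]
        scores = updated
    return scores


# Hypothetical repo: views.py imports models.py; utils.py is unrelated.
imports = {"views.py": ["models.py"], "models.py": [], "utils.py": []}
scores = personalised_pagerank(imports, seeds={"views.py": 1.0})
```

`models.py` ends up with a substantial score purely because the seeded `views.py` imports it, while the unrelated `utils.py` stays at zero.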
473
+
474
+ ---
475
+
476
+ **Q: What is conformal prediction and why use it here?**
477
+
478
+ > Conformal prediction gives a distribution-free coverage guarantee: the correct file will be in my prediction set at least 90% of the time. That holds provably rather than just empirically, on average over test issues, as long as the calibration and test issues are exchangeable. Practically it means I send fewer files to the LLM on easy issues (where I'm confident) and more on hard ones. On average it cuts token cost 60-80% while maintaining the recall guarantee. It also surfaces a confidence score in the UI, which makes the system easier to trust.
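A minimal split-conformal sketch shows where the 90% figure comes from. Everything here is assumed for illustration: the nonconformity score is taken to be `1 - p(true file)`, the calibration scores are made up, and the function names are hypothetical rather than those in `uncertainty/conformal_predictor.py`.

```python
import math


def conformal_threshold(calibration_scores: list[float], alpha: float = 0.10) -> float:
    """Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every file must be kept
    return sorted(calibration_scores)[rank - 1]


def prediction_set(file_probs: dict[str, float], threshold: float) -> list[str]:
    """Keep every file whose nonconformity score 1 - p is within the threshold."""
    ranked = sorted(file_probs.items(), key=lambda item: -item[1])
    return [path for path, prob in ranked if 1.0 - prob <= threshold]


# Hypothetical calibration scores: 1 - p(true file) on 20 held-out issues.
calibration = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.70,
               0.02, 0.08, 0.15, 0.18, 0.22, 0.28, 0.33, 0.45, 0.50, 0.60]
threshold = conformal_threshold(calibration)

easy = prediction_set({"a.py": 0.90, "b.py": 0.30, "c.py": 0.10}, threshold)
hard = prediction_set({"a.py": 0.45, "b.py": 0.42, "c.py": 0.13}, threshold)
print(threshold, easy, hard)  # 0.6 ['a.py'] ['a.py', 'b.py']
```

The confident issue yields a one-file set and the ambiguous one yields two files, which is the mechanism behind sending fewer files to the LLM on easy issues.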
479
+
480
+ ---
481
+
482
+ **Q: Why DeepSeek-R1 instead of GPT-4o?**
483
+
484
+ > DeepSeek-R1-distill-llama-70b scores higher than GPT-4o on HumanEval (79% vs 67%), LiveCodeBench, and EvalPlus specifically for code tasks. Groq's inference is 10x faster. And it's completely free. I verified this on the project's test cases before switching. It's a case where the open-source model is genuinely the better technical choice.
485
+
486
+ ---
487
+
488
+ **Q: How does the reflection loop work?**
489
+
490
+ > It's a LangGraph state machine: localise → generate → test. After each failure, the failure categoriser classifies the error into one of 9 categories: syntax error, hallucinated API, wrong file, incomplete patch, etc. Then it builds a structured reflection prompt: "You tried X, it failed with error Y of type Z, try again with this in mind." This gives the LLM actionable signal to self-correct. Going from 1 attempt to 3 improves resolve rate from ~25% to ~33%.
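The categorise-then-reflect step can be sketched with a few regular expressions. This is illustrative only: it covers just the four categories named above (the real categoriser has nine), and the patterns, category identifiers, and function names are guesses, not the contents of `agent/failure_categoriser.py`.

```python
import re

# Hypothetical patterns for the four categories named in the answer above.
CATEGORY_PATTERNS = [
    ("syntax_error", re.compile(r"SyntaxError|IndentationError")),
    ("hallucinated_api", re.compile(r"AttributeError|ImportError|ModuleNotFoundError|NameError")),
    ("wrong_file", re.compile(r"patch does not apply|No such file")),
    ("incomplete_patch", re.compile(r"AssertionError|\d+ failed")),
]


def categorise(test_output: str) -> str:
    """Return the first category whose pattern appears in the test output."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(test_output):
            return category
    return "unknown"


def reflection_prompt(patch_summary: str, test_output: str) -> str:
    """Build the 'you tried X, it failed with Y of type Z' prompt for the next attempt."""
    lines = test_output.strip().splitlines()
    last_line = lines[-1] if lines else ""
    return (
        f"You tried: {patch_summary}\n"
        f"It failed with: {last_line}\n"
        f"Failure type: {categorise(test_output)}\n"
        "Try again with this in mind."
    )


output = "E   AttributeError: 'QuerySet' object has no attribute 'distinct_on'"
prompt = reflection_prompt("call qs.distinct_on('id') in filter()", output)
```

Pattern order matters: the first match wins, so more specific categories should come before broad ones such as `incomplete_patch`.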
491
+
492
+ ---
493
+
494
+ **Q: How would you scale this to production?**
495
+
496
+ > The API is already stateless: all state goes through Redis. Scale horizontally with multiple uvicorn workers behind a load balancer. Scale sandbox execution by spinning up containers on-demand in Kubernetes with resource quotas. The Prometheus metrics already expose active tasks, per-phase latency, and cache hit rates, so wire those into Grafana and use HPA for autoscaling. The trajectory logger is designed for high throughput: it streams to JSONL and can be pointed at S3 or GCS.
497
+
498
+ ---
499
+
500
+ **Q: What's the biggest limitation?**
501
+
502
+ > Context budget. A large repo has 10,000+ files but the LLM sees only 5. If the bug spans multiple files not directly import-related, PPR may miss them. The second limitation is evaluation granularity: tests either pass or fail, with no partial credit. A patch fixing 9 of 10 failing tests looks identical to one fixing 0. The failure categoriser was built specifically to give the reflection loop more signal than just "tests failed", but it's still binary at the task level.
503
+
504
+ ---
505
+
506
+ *Every file reference in this guide maps exactly to the actual codebase.*