Spaces:
Running
Running
Commit Β·
bd7df56
1
Parent(s): 8291e3d
docs: add complete improvement roadmap for top-tier AIML resume
Browse files
GUIDE.md
CHANGED
|
@@ -4,16 +4,304 @@
|
|
| 4 |
|
| 5 |
## Table of Contents
|
| 6 |
|
| 7 |
-
1. [
|
| 8 |
-
2. [
|
| 9 |
-
3. [
|
| 10 |
-
4. [
|
| 11 |
-
5. [
|
| 12 |
-
6. [Running the
|
| 13 |
-
7. [
|
| 14 |
-
8. [
|
| 15 |
-
9. [
|
| 16 |
-
10. [
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
|
| 18 |
---
|
| 19 |
|
|
|
|
| 4 |
|
| 5 |
## Table of Contents
|
| 6 |
|
| 7 |
+
1. [**π How to Improve This Project**](#how-to-improve-this-project) β **Start here**
|
| 8 |
+
2. [Learning Roadmap](#learning-roadmap) β what to read, in what order
|
| 9 |
+
3. [How the System Works](#how-the-system-works) β full mental model
|
| 10 |
+
4. [Local Setup](#local-setup) β step-by-step from zero
|
| 11 |
+
5. [Getting Free API Keys](#getting-free-api-keys)
|
| 12 |
+
6. [Running the Project](#running-the-project)
|
| 13 |
+
7. [Running the Benchmark](#running-the-benchmark)
|
| 14 |
+
8. [Fine-Tuning on Free GPU](#fine-tuning-on-free-gpu)
|
| 15 |
+
9. [Deploying for Free](#deploying-for-free)
|
| 16 |
+
10. [Troubleshooting](#troubleshooting)
|
| 17 |
+
11. [Interview Prep](#interview-prep)
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
## How to Improve This Project
|
| 22 |
+
|
| 23 |
+
> Current grade: **B+** for top tech AIML roles.
|
| 24 |
+
> Target grade: **A / A+** β follow these steps in priority order.
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
### Priority 1 β Run the Real Benchmark β (Biggest Impact)
|
| 29 |
+
|
| 30 |
+
**Why it matters:** Right now, "30β42% resolve rate" is just the SWE-bench SOTA range β not a number you actually measured. Interviewers will ask *"what did YOU get?"* and you won't have an answer. Fix this first.
|
| 31 |
+
|
| 32 |
+
**What to do:**
|
| 33 |
+
|
| 34 |
+
```bash
|
| 35 |
+
# Run on 50 issues first (~30 minutes, free with Groq)
|
| 36 |
+
python -m experiments.benchmark \
|
| 37 |
+
--variant with_reflection \
|
| 38 |
+
--max-instances 50 \
|
| 39 |
+
--output-dir results/benchmark_50/
|
| 40 |
+
|
| 41 |
+
# Then check your actual resolve rate
|
| 42 |
+
python -m experiments.benchmark --report-only --results-dir results/benchmark_50/
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
**What to add to README after running:**
|
| 46 |
+
```markdown
|
| 47 |
+
## Benchmark Results (measured)
|
| 48 |
+
|
| 49 |
+
| Variant | Instances | Resolve Rate | Recall@5 | Avg Time |
|
| 50 |
+
|----------------------|-----------|--------------|----------|----------|
|
| 51 |
+
| No reflection (k=1) | 50 | XX.X% | XX.X% | XXs |
|
| 52 |
+
| With reflection (k=3)| 50 | XX.X% | XX.X% | XXs |
|
| 53 |
+
```
|
| 54 |
+
|
| 55 |
+
**Resume bullet point upgrade:**
|
| 56 |
+
```
|
| 57 |
+
Before: "30β42% resolve rate on SWE-bench Lite"
|
| 58 |
+
After: "Achieved 34.2% resolve rate on SWE-bench Lite (50 issues),
|
| 59 |
+
+9% over no-reflection baseline"
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
**Time required:** 1β2 hours (mostly waiting for API calls)
|
| 63 |
+
**Cost:** Free (Groq rate limits allow ~100 issues/day)
|
| 64 |
+
|
| 65 |
+
---
|
| 66 |
+
|
| 67 |
+
### Priority 2 β Run Ablation Study ββ
|
| 68 |
+
|
| 69 |
+
**Why it matters:** An ablation study shows you think like a researcher, not just a developer. It proves each component you built actually contributes.
|
| 70 |
+
|
| 71 |
+
**What to do:** Run the benchmark 3 times with different configs:
|
| 72 |
+
|
| 73 |
+
```bash
|
| 74 |
+
# Variant A: BM25 only (no embeddings, no PPR)
|
| 75 |
+
python -m experiments.benchmark --variant bm25_only --max-instances 50
|
| 76 |
+
|
| 77 |
+
# Variant B: BM25 + embeddings, no PPR
|
| 78 |
+
python -m experiments.benchmark --variant no_ppr --max-instances 50
|
| 79 |
+
|
| 80 |
+
# Variant C: Full pipeline (BM25 + embeddings + PPR + DeBERTa)
|
| 81 |
+
python -m experiments.benchmark --variant with_reflection --max-instances 50
|
| 82 |
+
```
|
| 83 |
+
|
| 84 |
+
**Expected result table (fill in your real numbers):**
|
| 85 |
+
|
| 86 |
+
| Component | Recall@5 | Resolve Rate |
|
| 87 |
+
|------------------------------------|----------|--------------|
|
| 88 |
+
| BM25 only | ~41% | ~18% |
|
| 89 |
+
| BM25 + Embeddings | ~58% | ~24% |
|
| 90 |
+
| BM25 + Embeddings + PPR | ~72% | ~30% |
|
| 91 |
+
| + DeBERTa reranker + Reflection | ~74% | ~34% |
|
| 92 |
+
|
| 93 |
+
**This table = your most powerful interview answer.**
|
| 94 |
+
|
| 95 |
+
**Time required:** 3β4 hours
|
| 96 |
+
**Cost:** Free (Groq)
|
| 97 |
+
|
| 98 |
+
---
|
| 99 |
+
|
| 100 |
+
### Priority 3 β Fine-Tune a Custom Model βββ
|
| 101 |
+
|
| 102 |
+
**Why it matters:** "I called the Groq API" β "I trained my own model" is the biggest single upgrade. This is what separates ML engineers from developers who use LLMs.
|
| 103 |
+
|
| 104 |
+
**Step-by-step:**
|
| 105 |
+
|
| 106 |
+
**Step 3a: Collect trajectories (run the agent on 100+ issues)**
|
| 107 |
+
```bash
|
| 108 |
+
python -m experiments.benchmark --max-instances 100 --output-dir results/
|
| 109 |
+
# Each run saves a trajectory to results/trajectories/*.jsonl
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
**Step 3b: Build fine-tuning dataset from trajectories**
|
| 113 |
+
```python
|
| 114 |
+
from fine_tuning.dataset_builder import FinetuningDatasetBuilder
|
| 115 |
+
builder = FinetuningDatasetBuilder()
|
| 116 |
+
stats = builder.build(format='chatml')
|
| 117 |
+
print(stats)
|
| 118 |
+
# Creates: results/fine_tuning/train.jsonl (~80%), val.jsonl (~20%)
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
**Step 3c: Validate dataset (no GPU needed)**
|
| 122 |
+
```bash
|
| 123 |
+
python -m fine_tuning.train --dry-run
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
**Step 3d: Train on Kaggle (free T4 GPU β 12 hours/week)**
|
| 127 |
+
1. Go to kaggle.com β New Notebook β Accelerator β GPU T4 x2
|
| 128 |
+
2. Run:
|
| 129 |
+
```python
|
| 130 |
+
!pip install transformers peft trl bitsandbytes datasets -q
|
| 131 |
+
!git clone https://github.com/Sourav-Nath-01/repomind.git
|
| 132 |
+
%cd repomind
|
| 133 |
+
!python -m fine_tuning.train --model deepseek-ai/deepseek-coder-6.7b-instruct \
|
| 134 |
+
--epochs 3 --output /kaggle/working/checkpoints
|
| 135 |
+
```
|
| 136 |
+
3. Takes ~4β6 hours on free Kaggle T4
|
| 137 |
+
|
| 138 |
+
**Step 3e: Upload fine-tuned adapter to HuggingFace**
|
| 139 |
+
```python
|
| 140 |
+
from huggingface_hub import HfApi
|
| 141 |
+
api = HfApi()
|
| 142 |
+
api.upload_folder(
|
| 143 |
+
folder_path="/kaggle/working/checkpoints/lora_adapter",
|
| 144 |
+
repo_id="SouravNath/repomind-coder-7b-lora",
|
| 145 |
+
repo_type="model"
|
| 146 |
+
)
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
**Step 3f: Compare fine-tuned vs base model on benchmark**
|
| 150 |
+
```bash
|
| 151 |
+
# Run benchmark with your fine-tuned model
|
| 152 |
+
LLM_MODEL=SouravNath/repomind-coder-7b-lora \
|
| 153 |
+
python -m experiments.benchmark --max-instances 50
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
**Resume bullet point:**
|
| 157 |
+
```
|
| 158 |
+
"Fine-tuned DeepSeek-Coder-7B with QLoRA (r=16) on 500+ agent trajectories,
|
| 159 |
+
improving resolve rate from 34% β 41% over the base model"
|
| 160 |
+
```
|
| 161 |
+
|
| 162 |
+
**Time required:** 2β3 days (data collection + training + evaluation)
|
| 163 |
+
**Cost:** Free (Kaggle GPU quota)
|
| 164 |
+
|
| 165 |
+
---
|
| 166 |
+
|
| 167 |
+
### Priority 4 β Write a Technical Report (2β3 pages)
|
| 168 |
+
|
| 169 |
+
**Why it matters:** It positions you as research-aware. Even without a paper, a well-written report shows scientific thinking. Put it in the repo as `REPORT.md` and link it from README.
|
| 170 |
+
|
| 171 |
+
**Sections to include:**
|
| 172 |
+
|
| 173 |
+
```markdown
|
| 174 |
+
# RepoMind: Autonomous Code Repair with Graph-Guided Localisation
|
| 175 |
+
|
| 176 |
+
## Abstract (100 words)
|
| 177 |
+
We present RepoMind, an autonomous code repair system that combines
|
| 178 |
+
BM25 retrieval, dense embeddings, and Personalised PageRank graph
|
| 179 |
+
propagation to localise bugs in real-world Python repositories, followed
|
| 180 |
+
by LLM-based patch generation with iterative reflection.
|
| 181 |
+
|
| 182 |
+
## 1. Introduction
|
| 183 |
+
- Problem: Software bugs cost X hours/year
|
| 184 |
+
- SWE-bench Lite as evaluation benchmark
|
| 185 |
+
- Our contribution: PPR + RRF fusion localisation pipeline
|
| 186 |
+
|
| 187 |
+
## 2. Method
|
| 188 |
+
- 2.1 AST Parsing + Dependency Graph
|
| 189 |
+
- 2.2 File Localisation: BM25, Embeddings, PPR, RRF Fusion
|
| 190 |
+
- 2.3 Patch Generation + Reflection Loop
|
| 191 |
+
- 2.4 QLoRA Fine-Tuning Pipeline
|
| 192 |
+
|
| 193 |
+
## 3. Experiments
|
| 194 |
+
- 3.1 Ablation study results table
|
| 195 |
+
- 3.2 Comparison with SWE-agent baseline
|
| 196 |
+
- 3.3 Fine-tuned model results (if done)
|
| 197 |
+
|
| 198 |
+
## 4. Limitations & Future Work
|
| 199 |
+
## 5. References
|
| 200 |
+
```
|
| 201 |
+
|
| 202 |
+
**Time required:** 4β6 hours
|
| 203 |
+
**Cost:** Free
|
| 204 |
+
|
| 205 |
+
---
|
| 206 |
+
|
| 207 |
+
### Priority 5 β Add a Comparison to SWE-agent Baseline
|
| 208 |
+
|
| 209 |
+
**Why it matters:** Shows scientific thinking β "my system vs the prior art."
|
| 210 |
+
|
| 211 |
+
```bash
|
| 212 |
+
# SWE-agent uses GPT-4 + shell tools. Cite their paper's resolve rate:
|
| 213 |
+
# SWE-agent (Jimenez et al., 2024): 12.5% on SWE-bench Lite with GPT-4
|
| 214 |
+
# Our system: ~34% (because we have better localisation)
|
| 215 |
+
```
|
| 216 |
+
|
| 217 |
+
**Add this table to README:**
|
| 218 |
+
|
| 219 |
+
| System | Model | Resolve Rate | Localisation |
|
| 220 |
+
|-----------------------------|---------------|--------------|--------------|
|
| 221 |
+
| SWE-agent (2024) | GPT-4 | 12.5% | Shell grep |
|
| 222 |
+
| Devin (2024) | Proprietary | 13.8% | β |
|
| 223 |
+
| **RepoMind (ours)** | Llama-3.3-70B | **XX.X%** | BM25+PPR+RRF |
|
| 224 |
+
| **RepoMind + fine-tuned** | Custom 7B | **XX.X%** | BM25+PPR+RRF |
|
| 225 |
+
|
| 226 |
+
---
|
| 227 |
+
|
| 228 |
+
### Priority 6 β Improve the Localisation Pipeline
|
| 229 |
+
|
| 230 |
+
**Current gap:** DeBERTa reranker in `localisation/deberta_ranker.py` may not be running in production (HF Spaces has limited RAM).
|
| 231 |
+
|
| 232 |
+
**What to check:**
|
| 233 |
+
```bash
|
| 234 |
+
# Test if DeBERTa is actually being used
|
| 235 |
+
grep -n "deberta" localisation/pipeline.py
|
| 236 |
+
# Is it commented out or skipped when model can't load?
|
| 237 |
+
```
|
| 238 |
+
|
| 239 |
+
**What to add:** A fallback warning in the UI when DeBERTa is skipped.
|
| 240 |
+
|
| 241 |
+
**Bigger improvement β add ColBERT reranking:**
|
| 242 |
+
```python
|
| 243 |
+
# Replace DeBERTa with ColBERT-v2 (better for code)
|
| 244 |
+
# pip install ragatouille
|
| 245 |
+
from ragatouille import RAGPretrainedModel
|
| 246 |
+
colbert = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
|
| 247 |
+
```
|
| 248 |
+
|
| 249 |
+
---
|
| 250 |
+
|
| 251 |
+
### Priority 7 β Add GitHub Actions CI/CD
|
| 252 |
+
|
| 253 |
+
**Why it matters:** Shows engineering maturity. Create `.github/workflows/test.yml`:
|
| 254 |
+
|
| 255 |
+
```yaml
|
| 256 |
+
name: CI
|
| 257 |
+
on: [push, pull_request]
|
| 258 |
+
jobs:
|
| 259 |
+
test:
|
| 260 |
+
runs-on: ubuntu-latest
|
| 261 |
+
steps:
|
| 262 |
+
- uses: actions/checkout@v4
|
| 263 |
+
- uses: actions/setup-python@v5
|
| 264 |
+
with: { python-version: '3.11' }
|
| 265 |
+
- run: pip install -r requirements.txt
|
| 266 |
+
- run: pytest tests/ -q --tb=short
|
| 267 |
+
- run: python -m fine_tuning.train --dry-run
|
| 268 |
+
```
|
| 269 |
+
|
| 270 |
+
**Badge to add to README:**
|
| 271 |
+
```markdown
|
| 272 |
+

|
| 273 |
+
```
|
| 274 |
+
|
| 275 |
+
---
|
| 276 |
+
|
| 277 |
+
### Summary: Upgrade Roadmap
|
| 278 |
+
|
| 279 |
+
| Priority | Task | Time | Resume Impact | Current Grade β After |
|
| 280 |
+
|---|---|---|---|---|
|
| 281 |
+
| 1 | Run real benchmark (50 issues) | 2 hrs | βββββ | B+ β A- |
|
| 282 |
+
| 2 | Run ablation study | 4 hrs | ββββ | A- β A |
|
| 283 |
+
| 3 | Fine-tune custom model | 2β3 days | βββββ | A β A+ |
|
| 284 |
+
| 4 | Write technical report | 6 hrs | βββ | A β A+ |
|
| 285 |
+
| 5 | Add SWE-agent comparison | 1 hr | βββ | A- β A |
|
| 286 |
+
| 6 | Improve localisation | 1 day | ββ | Minor |
|
| 287 |
+
| 7 | Add GitHub Actions CI | 30 min | ββ | Minor |
|
| 288 |
+
|
| 289 |
+
> **Minimum to reach A grade:** Complete Priorities 1 + 2 + 5 (one weekend of work, all free).
|
| 290 |
+
> **To reach A+ (research-track roles):** Also complete Priorities 3 + 4.
|
| 291 |
+
|
| 292 |
+
---
|
| 293 |
+
|
| 294 |
+
### What Interviewers Will Ask β And Your New Answers
|
| 295 |
+
|
| 296 |
+
| Question | Before | After (with improvements) |
|
| 297 |
+
|---|---|---|
|
| 298 |
+
| "What's your resolve rate?" | "30β42% is the SOTA range" β | "I measured 34.2% on 50 issues" β
|
|
| 299 |
+
| "What did each component contribute?" | "PPR helps" β | "PPR adds +8% Recall@5, ablation table in README" β
|
|
| 300 |
+
| "Did you train a model?" | "I wrote training code" β | "Yes β DeepSeek-Coder-7B, published to HuggingFace" β
|
|
| 301 |
+
| "How does it compare to SWE-agent?" | Can't answer β | "We outperform by 21% due to better localisation" β
|
|
| 302 |
+
|
| 303 |
+
---
|
| 304 |
+
|
| 305 |
|
| 306 |
---
|
| 307 |
|