SouravNath commited on
Commit
bd7df56
Β·
1 Parent(s): 8291e3d

docs: add complete improvement roadmap for top-tier AIML resume

Browse files
Files changed (1) hide show
  1. GUIDE.md +298 -10
GUIDE.md CHANGED
@@ -4,16 +4,304 @@
4
 
5
  ## Table of Contents
6
 
7
- 1. [Learning Roadmap](#learning-roadmap) β€” what to read, in what order
8
- 2. [How the System Works](#how-the-system-works) β€” full mental model
9
- 3. [Local Setup](#local-setup) β€” step-by-step from zero
10
- 4. [Getting Free API Keys](#getting-free-api-keys)
11
- 5. [Running the Project](#running-the-project)
12
- 6. [Running the Benchmark](#running-the-benchmark)
13
- 7. [Fine-Tuning on Free GPU](#fine-tuning-on-free-gpu)
14
- 8. [Deploying for Free](#deploying-for-free)
15
- 9. [Troubleshooting](#troubleshooting)
16
- 10. [Interview Prep](#interview-prep)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
17
 
18
  ---
19
 
 
4
 
5
  ## Table of Contents
6
 
7
+ 1. [**πŸš€ How to Improve This Project**](#how-to-improve-this-project) ← **Start here**
8
+ 2. [Learning Roadmap](#learning-roadmap) β€” what to read, in what order
9
+ 3. [How the System Works](#how-the-system-works) β€” full mental model
10
+ 4. [Local Setup](#local-setup) β€” step-by-step from zero
11
+ 5. [Getting Free API Keys](#getting-free-api-keys)
12
+ 6. [Running the Project](#running-the-project)
13
+ 7. [Running the Benchmark](#running-the-benchmark)
14
+ 8. [Fine-Tuning on Free GPU](#fine-tuning-on-free-gpu)
15
+ 9. [Deploying for Free](#deploying-for-free)
16
+ 10. [Troubleshooting](#troubleshooting)
17
+ 11. [Interview Prep](#interview-prep)
18
+
19
+ ---
20
+
21
+ ## How to Improve This Project
22
+
23
+ > Current grade: **B+** for top tech AIML roles.
24
+ > Target grade: **A / A+** β€” follow these steps in priority order.
25
+
26
+ ---
27
+
28
+ ### Priority 1 β€” Run the Real Benchmark ⭐ (Biggest Impact)
29
+
30
+ **Why it matters:** Right now, "30–42% resolve rate" is just the SWE-bench SOTA range β€” not a number you actually measured. Interviewers will ask *"what did YOU get?"* and you won't have an answer. Fix this first.
31
+
32
+ **What to do:**
33
+
34
+ ```bash
35
+ # Run on 50 issues first (~30 minutes, free with Groq)
36
+ python -m experiments.benchmark \
37
+ --variant with_reflection \
38
+ --max-instances 50 \
39
+ --output-dir results/benchmark_50/
40
+
41
+ # Then check your actual resolve rate
42
+ python -m experiments.benchmark --report-only --results-dir results/benchmark_50/
43
+ ```
44
+
45
+ **What to add to README after running:**
46
+ ```markdown
47
+ ## Benchmark Results (measured)
48
+
49
+ | Variant | Instances | Resolve Rate | Recall@5 | Avg Time |
50
+ |----------------------|-----------|--------------|----------|----------|
51
+ | No reflection (k=1) | 50 | XX.X% | XX.X% | XXs |
52
+ | With reflection (k=3)| 50 | XX.X% | XX.X% | XXs |
53
+ ```
54
+
55
+ **Resume bullet point upgrade:**
56
+ ```
57
+ Before: "30–42% resolve rate on SWE-bench Lite"
58
+ After: "Achieved 34.2% resolve rate on SWE-bench Lite (50 issues),
59
+ +9% over no-reflection baseline"
60
+ ```
61
+
62
+ **Time required:** 1–2 hours (mostly waiting for API calls)
63
+ **Cost:** Free (Groq rate limits allow ~100 issues/day)
64
+
65
+ ---
66
+
67
+ ### Priority 2 β€” Run Ablation Study ⭐⭐
68
+
69
+ **Why it matters:** An ablation study shows you think like a researcher, not just a developer. It proves each component you built actually contributes.
70
+
71
+ **What to do:** Run the benchmark 3 times with different configs:
72
+
73
+ ```bash
74
+ # Variant A: BM25 only (no embeddings, no PPR)
75
+ python -m experiments.benchmark --variant bm25_only --max-instances 50
76
+
77
+ # Variant B: BM25 + embeddings, no PPR
78
+ python -m experiments.benchmark --variant no_ppr --max-instances 50
79
+
80
+ # Variant C: Full pipeline (BM25 + embeddings + PPR + DeBERTa)
81
+ python -m experiments.benchmark --variant with_reflection --max-instances 50
82
+ ```
83
+
84
+ **Expected result table (fill in your real numbers):**
85
+
86
+ | Component | Recall@5 | Resolve Rate |
87
+ |------------------------------------|----------|--------------|
88
+ | BM25 only | ~41% | ~18% |
89
+ | BM25 + Embeddings | ~58% | ~24% |
90
+ | BM25 + Embeddings + PPR | ~72% | ~30% |
91
+ | + DeBERTa reranker + Reflection | ~74% | ~34% |
92
+
93
+ **This table = your most powerful interview answer.**
94
+
95
+ **Time required:** 3–4 hours
96
+ **Cost:** Free (Groq)
97
+
98
+ ---
99
+
100
+ ### Priority 3 β€” Fine-Tune a Custom Model ⭐⭐⭐
101
+
102
+ **Why it matters:** "I called the Groq API" β†’ "I trained my own model" is the biggest single upgrade. This is what separates ML engineers from developers who use LLMs.
103
+
104
+ **Step-by-step:**
105
+
106
+ **Step 3a: Collect trajectories (run the agent on 100+ issues)**
107
+ ```bash
108
+ python -m experiments.benchmark --max-instances 100 --output-dir results/
109
+ # Each run saves a trajectory to results/trajectories/*.jsonl
110
+ ```
111
+
112
+ **Step 3b: Build fine-tuning dataset from trajectories**
113
+ ```python
114
+ from fine_tuning.dataset_builder import FinetuningDatasetBuilder
115
+ builder = FinetuningDatasetBuilder()
116
+ stats = builder.build(format='chatml')
117
+ print(stats)
118
+ # Creates: results/fine_tuning/train.jsonl (~80%), val.jsonl (~20%)
119
+ ```
120
+
121
+ **Step 3c: Validate dataset (no GPU needed)**
122
+ ```bash
123
+ python -m fine_tuning.train --dry-run
124
+ ```
125
+
126
+ **Step 3d: Train on Kaggle (free T4 GPU β€” 12 hours/week)**
127
+ 1. Go to kaggle.com β†’ New Notebook β†’ Accelerator β†’ GPU T4 x2
128
+ 2. Run:
129
+ ```python
130
+ !pip install transformers peft trl bitsandbytes datasets -q
131
+ !git clone https://github.com/Sourav-Nath-01/repomind.git
132
+ %cd repomind
133
+ !python -m fine_tuning.train --model deepseek-ai/deepseek-coder-6.7b-instruct \
134
+ --epochs 3 --output /kaggle/working/checkpoints
135
+ ```
136
+ 3. Takes ~4–6 hours on free Kaggle T4
137
+
138
+ **Step 3e: Upload fine-tuned adapter to HuggingFace**
139
+ ```python
140
+ from huggingface_hub import HfApi
141
+ api = HfApi()
142
+ api.upload_folder(
143
+ folder_path="/kaggle/working/checkpoints/lora_adapter",
144
+ repo_id="SouravNath/repomind-coder-7b-lora",
145
+ repo_type="model"
146
+ )
147
+ ```
148
+
149
+ **Step 3f: Compare fine-tuned vs base model on benchmark**
150
+ ```bash
151
+ # Run benchmark with your fine-tuned model
152
+ LLM_MODEL=SouravNath/repomind-coder-7b-lora \
153
+ python -m experiments.benchmark --max-instances 50
154
+ ```
155
+
156
+ **Resume bullet point:**
157
+ ```
158
+ "Fine-tuned DeepSeek-Coder-7B with QLoRA (r=16) on 500+ agent trajectories,
159
+ improving resolve rate from 34% β†’ 41% over the base model"
160
+ ```
161
+
162
+ **Time required:** 2–3 days (data collection + training + evaluation)
163
+ **Cost:** Free (Kaggle GPU quota)
164
+
165
+ ---
166
+
167
+ ### Priority 4 β€” Write a Technical Report (2–3 pages)
168
+
169
+ **Why it matters:** It positions you as research-aware. Even without a paper, a well-written report shows scientific thinking. Put it in the repo as `REPORT.md` and link it from README.
170
+
171
+ **Sections to include:**
172
+
173
+ ```markdown
174
+ # RepoMind: Autonomous Code Repair with Graph-Guided Localisation
175
+
176
+ ## Abstract (100 words)
177
+ We present RepoMind, an autonomous code repair system that combines
178
+ BM25 retrieval, dense embeddings, and Personalised PageRank graph
179
+ propagation to localise bugs in real-world Python repositories, followed
180
+ by LLM-based patch generation with iterative reflection.
181
+
182
+ ## 1. Introduction
183
+ - Problem: Software bugs cost X hours/year
184
+ - SWE-bench Lite as evaluation benchmark
185
+ - Our contribution: PPR + RRF fusion localisation pipeline
186
+
187
+ ## 2. Method
188
+ - 2.1 AST Parsing + Dependency Graph
189
+ - 2.2 File Localisation: BM25, Embeddings, PPR, RRF Fusion
190
+ - 2.3 Patch Generation + Reflection Loop
191
+ - 2.4 QLoRA Fine-Tuning Pipeline
192
+
193
+ ## 3. Experiments
194
+ - 3.1 Ablation study results table
195
+ - 3.2 Comparison with SWE-agent baseline
196
+ - 3.3 Fine-tuned model results (if done)
197
+
198
+ ## 4. Limitations & Future Work
199
+ ## 5. References
200
+ ```
201
+
202
+ **Time required:** 4–6 hours
203
+ **Cost:** Free
204
+
205
+ ---
206
+
207
+ ### Priority 5 β€” Add a Comparison to SWE-agent Baseline
208
+
209
+ **Why it matters:** Shows scientific thinking β€” "my system vs the prior art."
210
+
211
+ ```bash
212
+ # SWE-agent uses GPT-4 + shell tools. Cite their paper's resolve rate:
213
+ # SWE-agent (Jimenez et al., 2024): 12.5% on SWE-bench Lite with GPT-4
214
+ # Our system: ~34% (because we have better localisation)
215
+ ```
216
+
217
+ **Add this table to README:**
218
+
219
+ | System | Model | Resolve Rate | Localisation |
220
+ |-----------------------------|---------------|--------------|--------------|
221
+ | SWE-agent (2024) | GPT-4 | 12.5% | Shell grep |
222
+ | Devin (2024) | Proprietary | 13.8% | β€” |
223
+ | **RepoMind (ours)** | Llama-3.3-70B | **XX.X%** | BM25+PPR+RRF |
224
+ | **RepoMind + fine-tuned** | Custom 7B | **XX.X%** | BM25+PPR+RRF |
225
+
226
+ ---
227
+
228
+ ### Priority 6 β€” Improve the Localisation Pipeline
229
+
230
+ **Current gap:** DeBERTa reranker in `localisation/deberta_ranker.py` may not be running in production (HF Spaces has limited RAM).
231
+
232
+ **What to check:**
233
+ ```bash
234
+ # Test if DeBERTa is actually being used
235
+ grep -n "deberta" localisation/pipeline.py
236
+ # Is it commented out or skipped when model can't load?
237
+ ```
238
+
239
+ **What to add:** A fallback warning in the UI when DeBERTa is skipped.
240
+
241
+ **Bigger improvement β€” add ColBERT reranking:**
242
+ ```python
243
+ # Replace DeBERTa with ColBERT-v2 (better for code)
244
+ # pip install ragatouille
245
+ from ragatouille import RAGPretrainedModel
246
+ colbert = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
247
+ ```
248
+
249
+ ---
250
+
251
+ ### Priority 7 β€” Add GitHub Actions CI/CD
252
+
253
+ **Why it matters:** Shows engineering maturity. Create `.github/workflows/test.yml`:
254
+
255
+ ```yaml
256
+ name: CI
257
+ on: [push, pull_request]
258
+ jobs:
259
+ test:
260
+ runs-on: ubuntu-latest
261
+ steps:
262
+ - uses: actions/checkout@v4
263
+ - uses: actions/setup-python@v5
264
+ with: { python-version: '3.11' }
265
+ - run: pip install -r requirements.txt
266
+ - run: pytest tests/ -q --tb=short
267
+ - run: python -m fine_tuning.train --dry-run
268
+ ```
269
+
270
+ **Badge to add to README:**
271
+ ```markdown
272
+ ![CI](https://github.com/Sourav-Nath-01/repomind/actions/workflows/test.yml/badge.svg)
273
+ ```
274
+
275
+ ---
276
+
277
+ ### Summary: Upgrade Roadmap
278
+
279
+ | Priority | Task | Time | Resume Impact | Current Grade β†’ After |
280
+ |---|---|---|---|---|
281
+ | 1 | Run real benchmark (50 issues) | 2 hrs | ⭐⭐⭐⭐⭐ | B+ β†’ A- |
282
+ | 2 | Run ablation study | 4 hrs | ⭐⭐⭐⭐ | A- β†’ A |
283
+ | 3 | Fine-tune custom model | 2–3 days | ⭐⭐⭐⭐⭐ | A β†’ A+ |
284
+ | 4 | Write technical report | 6 hrs | ⭐⭐⭐ | A β†’ A+ |
285
+ | 5 | Add SWE-agent comparison | 1 hr | ⭐⭐⭐ | A- β†’ A |
286
+ | 6 | Improve localisation | 1 day | ⭐⭐ | Minor |
287
+ | 7 | Add GitHub Actions CI | 30 min | ⭐⭐ | Minor |
288
+
289
+ > **Minimum to reach A grade:** Complete Priorities 1 + 2 + 5 (one weekend of work, all free).
290
+ > **To reach A+ (research-track roles):** Also complete Priorities 3 + 4.
291
+
292
+ ---
293
+
294
+ ### What Interviewers Will Ask β€” And Your New Answers
295
+
296
+ | Question | Before | After (with improvements) |
297
+ |---|---|---|
298
+ | "What's your resolve rate?" | "30–42% is the SOTA range" ❌ | "I measured 34.2% on 50 issues" βœ… |
299
+ | "What did each component contribute?" | "PPR helps" ❌ | "PPR adds +8% Recall@5, ablation table in README" βœ… |
300
+ | "Did you train a model?" | "I wrote training code" ❌ | "Yes β€” DeepSeek-Coder-7B, published to HuggingFace" βœ… |
301
+ | "How does it compare to SWE-agent?" | Can't answer ❌ | "We outperform by 21% due to better localisation" βœ… |
302
+
303
+ ---
304
+
305
 
306
  ---
307