---
title: REPOMIND
emoji: 🧠
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: Repo-scale coding agent, 256K context on one MI300X
tags:
- amd-hackathon-2026
- amd-developer-hackathon
- agents
- coding-agent
- long-context
- rocm
- mi300x
- qwen3-coder
- vllm
---

# REPOMIND
|
|
> Open-source, self-hosted repo-scale coding agent. It ingests an entire git repo (256K tokens, FP8) and reasons across it on a single AMD MI300X, a configuration an NVIDIA H100 80 GB cannot accommodate by VRAM accounting (~143 GB total > 80 GB).
|
|
**Built for the [AMD Developer Hackathon 2026](https://lablab.ai/ai-hackathons/amd-developer)** · MIT License · [GitHub source](https://github.com/SRKRZ23/repomind)
|
|
## Why MI300X?
|
|
- Qwen3-Coder-Next-FP8 weights ≈ 80 GB
- 256K KV cache @ FP8 ≈ 38 GB
- activations ≈ 25 GB → **~143 GB total on a single GPU**
- An NVIDIA H100 80 GB cannot hold this configuration on one card by VRAM accounting (~143 GB > 80 GB); the AMD MI300X's 192 GB has the headroom.
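The total is a straight sum of the three estimates above; as a quick sanity check (the component numbers are the estimates from the bullets, not measurements):

```python
# Back-of-envelope VRAM budget (GB), using the estimates in the bullets above.
weights_gb = 80       # Qwen3-Coder-Next-FP8 weights
kv_cache_gb = 38      # 256K-token KV cache at FP8
activations_gb = 25   # activation working set

total_gb = weights_gb + kv_cache_gb + activations_gb
print(total_gb)          # 143: over an H100's 80 GB
print(total_gb <= 192)   # True: inside the MI300X's 192 GB
```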
|
|
This is a memory-architecture story, not a CUDA-vs-ROCm one.
|
|
## Stack
|
|
- **Model**: `Qwen/Qwen3-Coder-Next-FP8` (80B params, 3B active, MoE)
- **Inference**: vLLM ROCm 7 with the `qwen3_coder` tool-call parser
- **Agent loop**: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
- **Tools**: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`
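The agent loop can be sketched as a simple dispatch cycle. The tool names below match the list above, but the handler bodies, message format, and stop condition are illustrative stand-ins, not REPOMIND's actual implementation:

```python
# Minimal SC-TIR-style loop: PLAN -> CALL TOOL -> OBSERVE -> THINK -> ANSWER.
# Tool names mirror the list above; the bodies are placeholder stubs.
TOOLS = {
    "read_file":     lambda path: f"<contents of {path}>",
    "grep_codebase": lambda pattern: f"<matches for {pattern}>",
}

def agent_loop(llm, question, max_steps=8):
    transcript = [f"QUESTION: {question}"]
    for _ in range(max_steps):
        step = llm("\n".join(transcript))   # PLAN / THINK happen inside the model
        if step["action"] == "answer":      # ANSWER ends the loop
            return step["content"]
        tool = TOOLS[step["action"]]        # CALL TOOL
        observation = tool(step["argument"])
        transcript.append(f"OBSERVATION: {observation}")  # OBSERVE, then loop
    return "max steps exceeded"

# Scripted stand-in for the model: one tool call, then an answer.
script = iter([
    {"action": "read_file", "argument": "app.py"},
    {"action": "answer", "content": "app.py defines the Gradio UI"},
])
print(agent_loop(lambda prompt: next(script), "What does app.py do?"))
```

Keeping the transcript as plain strings keeps the sketch model-agnostic; the real loop feeds it through the `qwen3_coder` tool-call parser instead.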
|
|
## Status: verified on a real MI300X (2026-05-05 / 2026-05-06)
|
|
Full stress test on a single AMD MI300X (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image). **2 sessions, 124 min total, ~$4.12.**
|
|
**Memory budget (Qwen3-Coder-Next-FP8 + 256K context, FP8 KV cache):**
- ✅ Model weights in VRAM: **77.29 GiB**
- ✅ Available KV cache: **94.58 GiB** (2,065,744 tokens)
- ✅ VRAM peak: **176 GiB / 191.7 GiB** (92% utilization)
- ✅ `--max-model-len 262144` started, `Application startup complete`
- ✅ `/v1/models` returns `max_model_len: 262144`
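Checking the served window from a client is a single field lookup. The snippet below parses the response shape of vLLM's OpenAI-compatible `/v1/models` endpoint; the values mirror the run above, and the endpoint URL is deployment-specific, so the HTTP fetch itself is elided:

```python
import json

# Shape of a vLLM OpenAI-compatible /v1/models response (fetch it with any
# HTTP client: GET http://<host>:8000/v1/models). Values mirror the run above.
body = json.loads("""
{"object": "list",
 "data": [{"id": "Qwen/Qwen3-Coder-Next-FP8",
           "object": "model",
           "max_model_len": 262144}]}
""")

max_len = body["data"][0]["max_model_len"]
print(max_len)  # 262144: the full 256K window is being served
```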
|
|
**Concurrency stress (24 cells, default Triton attention, all 144 outputs clean):**
- ✅ **31/31 success at 8K, 16K, 32K, AND 64K**: every realistic developer context
- ✅ **25/31 at 128K**, **6-8 at 256K** within a 15-minute window (compute-bound, an honest ceiling)
- ✅ Aggregate throughput at N=31: 78.5 tok/s @ 8K · 31.4 @ 16K · 12.1 @ 32K · 3.6 @ 64K
|
|
**Long-context coherence (needle-in-haystack at 200K):**
- ✅ **3/3 positions passed** (early, middle, late): the model recovers the embedded sentinel function and constant
- ✅ This shows the 256K window is *usable*, not just *allocated*
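A minimal harness for that test plants a sentinel at a fractional depth in filler code and asks the model to recover it. The sentinel, filler, and depth logic below are illustrative, and the model call itself is stubbed out:

```python
# Needle-in-haystack harness sketch: plant a sentinel function at a chosen
# fractional depth in filler code, then (in the real test) ask the model
# to recover its constant. Names and filler here are illustrative.
SENTINEL = "def _repomind_sentinel():\n    return 0xC0FFEE\n"

def build_haystack(n_chunks, depth):
    """depth in [0, 1]: 0 plants the needle early, 1 plants it late."""
    chunks = [f"def helper_{i}():\n    return {i}\n" for i in range(n_chunks)]
    chunks.insert(int(depth * n_chunks), SENTINEL)
    return "\n".join(chunks)

haystack = build_haystack(n_chunks=1000, depth=0.5)
# The real test sends `haystack` plus "What does _repomind_sentinel return?"
# to the model; here we only confirm the needle landed mid-document.
pos = haystack.find("_repomind_sentinel") / len(haystack)
print(0.4 < pos < 0.6)  # True
```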
|
|
**End-to-end repo ingestion: 9/9 questions answered correctly:**
- ✅ REPOMIND itself (68K tokens, 68 files): 3/3
- ✅ pallets/flask (408K tokens total → fitted to 180K): 3/3
- ✅ **pytorch/vision (1.3M tokens, 581 files, 6,799 chunks → fitted to 180K): 3/3** with correct file-path citations
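Fitting a 1.3M-token repo into a 180K window means selecting chunks under a token budget. The greedy selector below sketches the idea; REPOMIND's actual ranking and fitting logic may differ:

```python
# Greedy budget-fitting sketch: keep the highest-scoring chunks that fit.
def fit_to_budget(chunks, budget_tokens):
    """chunks: (relevance_score, n_tokens, text) tuples, higher score first."""
    selected, used = [], 0
    for score, n_tokens, text in sorted(chunks, key=lambda c: -c[0]):
        if used + n_tokens <= budget_tokens:
            selected.append(text)
            used += n_tokens
    return selected, used

chunks = [
    (0.9, 100_000, "core module"),
    (0.8, 70_000,  "tests"),
    (0.3, 50_000,  "docs"),
    (0.7, 10_000,  "config"),
]
picked, used = fit_to_budget(chunks, budget_tokens=180_000)
print(picked, used)  # docs dropped; 180000 tokens used
```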
|
|
**Tuning attempt: a measured regression worth reporting:**
- ⚠️ Tried `--attention-backend ROCM_AITER_FA` (AMD's hand-tuned MI300X kernels)
- Throughput was **2-4× higher** under AITER, with TTFT 2.8× faster at 64K
- BUT output **degenerates into repeating-punctuation gibberish** in 137/144 cells under the FP8 KV cache
- Default Triton remains the production-safe choice; filed for AMD upstream investigation
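Degenerate outputs of that kind can be flagged automatically. The heuristic below is illustrative (not the exact check used in the benchmark): it flags text dominated by punctuation or by long single-character runs:

```python
import string

# Heuristic degeneration check: flag outputs that collapse into
# repeated punctuation or long single-character runs.
def looks_degenerate(text, punct_ratio=0.5, repeat_run=20):
    if not text:
        return True
    punct = sum(ch in string.punctuation for ch in text)
    if punct / len(text) > punct_ratio:   # mostly punctuation
        return True
    for ch in set(text):                  # any char repeated 20+ times in a row
        if ch * repeat_run in text:
            return True
    return False

print(looks_degenerate("def add(a, b):\n    return a + b"))  # False
print(looks_degenerate("!!!!" * 50))                          # True
```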
|
|
**Cost (at AMD Cloud's $1.99/hr):**
- ✅ ~$45.75 / 1M completion tokens (aggregate at 32K, N=31)
- ✅ 14.5 continuously active queriers per MI300X, or 70-140 dev seats for typical bursty engineering teams
- ✅ An owned MI300X (~$18K) breaks even vs Cursor in 3-6 months at team-of-100 usage
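The $/1M-token figure follows from the hourly rate and the 32K aggregate throughput reported above; reproducing the arithmetic:

```python
# Cost per 1M completion tokens from the hourly rate and the measured
# aggregate throughput at 32K context, N=31 (figures from this README).
rate_per_hr = 1.99        # USD per MI300X hour (AMD Developer Cloud)
throughput_tok_s = 12.1   # aggregate completion tok/s at 32K, N=31

tokens_per_hr = throughput_tok_s * 3600
cost_per_1m = rate_per_hr / tokens_per_hr * 1_000_000
print(round(cost_per_1m, 2))  # ~45.68, consistent with the ~$45.75 above
```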
|
|
This Space currently runs on CPU-basic with the **mock LLM backend**, because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. **The final demo wires to a live MI300X endpoint** during the judging window.
|
|
The full evidence pack (7 JSON results + 5 PNG plots + e2e prompts/answers + 2× rocm-smi snapshots + run logs) is in the repo:
[github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test)
Extended PHASE 1+2 narrative (24-cell matrix + AITER A/B): [extended/SUMMARY.md](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended).
|
|
If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** 🤗
|
|
## Author
|
|
**Sardor Razikov**, Independent ML Engineer · Tashkent 🇺🇿
- Kaggle SPR 2026 #7/371 (Top 1.9%) · S6E3 #23/4,142 · AIMO3 39/50 (XTX $2.2M)
- [Epistemic Curie Benchmark on Zenodo](https://doi.org/10.5281/zenodo.19791329)
- [GitHub](https://github.com/SRKRZ23/repomind) · [LinkedIn](https://www.linkedin.com/in/sardor-razikov-569a5327b) · [X / Twitter](https://x.com/SardorRazi99093)
- Email: razikovsardor1@gmail.com · razikovs777@gmail.com
|
|