---
title: REPOMIND
emoji: 🧠
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: 'Repo-scale coding agent: 256K context on a single MI300X'
tags:
- amd-hackathon-2026
- amd-developer-hackathon
- agents
- coding-agent
- long-context
- rocm
- mi300x
- qwen3-coder
- vllm
---
# REPOMIND
Open-source, self-hosted repo-scale coding agent. Designed to ingest an entire git repository (256K tokens, FP8) and reason across it on a single AMD MI300X, a configuration that an NVIDIA H100 80GB cannot accommodate by VRAM accounting (~143 GB total > 80 GB).
Built for the AMD Developer Hackathon 2026 · MIT License · GitHub source
## Why MI300X?
- Qwen3-Coder-Next-FP8 weights ≈ 80 GB
- 256K KV cache @ FP8 ≈ 38 GB
- Activations ≈ 25 GB → ~143 GB total on a single GPU
- An NVIDIA H100 80GB cannot accommodate this configuration on a single card by VRAM accounting (~143 GB > 80 GB); the AMD MI300X's 192 GB has the headroom.
This is a memory-architecture story, not a CUDA-vs-ROCm one.
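The accounting above is simple enough to sanity-check in a few lines. All figures are the approximate estimates from this section, not measured values:

```python
# Rough VRAM budget for Qwen3-Coder-Next-FP8 at 256K context.
# All component sizes are the approximate estimates quoted above, in GB.
WEIGHTS_FP8_GB = 80      # ~80B params at ~1 byte/param (FP8)
KV_CACHE_FP8_GB = 38     # 256K-token KV cache at FP8
ACTIVATIONS_GB = 25      # working activations / scratch

total_gb = WEIGHTS_FP8_GB + KV_CACHE_FP8_GB + ACTIVATIONS_GB  # 143 GB

H100_VRAM_GB = 80
MI300X_VRAM_GB = 192

fits_h100 = total_gb <= H100_VRAM_GB      # False: 143 > 80
fits_mi300x = total_gb <= MI300X_VRAM_GB  # True: 143 < 192, ~49 GB headroom
print(total_gb, fits_h100, fits_mi300x)   # 143 False True
```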
## Stack
- Model: `Qwen/Qwen3-Coder-Next-FP8` (80B params, 3B active, MoE)
- Inference: vLLM on ROCm 7 with the `qwen3_coder` tool-call parser
- Agent loop: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
- Tools: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`
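The loop can be sketched in a few lines. Only the tool names and the PLAN → CALL TOOL → OBSERVE → THINK → ANSWER shape come from this README; `call_llm`, the reply protocol, and the stub tool bodies are illustrative assumptions, not the actual agent code:

```python
# Minimal sketch of an SC-TIR-style agent loop. The tool names match the
# README; the tool bodies and the "CALL <tool>: <arg>" reply convention
# are placeholders for illustration only.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"<contents of {arg}>",
    "grep_codebase": lambda arg: f"<matches for {arg}>",
    "execute_code": lambda arg: "<sandboxed stdout>",
    "run_tests": lambda arg: "<test summary>",
    "git_log": lambda arg: "<recent commits>",
}

def agent_loop(task: str, call_llm: Callable[[list[dict]], str],
               max_steps: int = 8) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)        # PLAN / THINK happen in the model
        if reply.startswith("CALL "):     # e.g. "CALL grep_codebase: def main"
            name, _, arg = reply[5:].partition(": ")
            observation = TOOLS.get(name, lambda a: "unknown tool")(arg)
            messages += [{"role": "assistant", "content": reply},
                         {"role": "tool", "content": observation}]  # OBSERVE
        else:
            return reply                  # ANSWER: no tool call, we are done
    return "max steps exceeded"
```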
## Status: verified on a real MI300X (2026-05-05 / 2026-05-06)
Full stress test on a single AMD MI300X (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image). Two sessions, 124 min total, ~$4.12.
Memory budget (Qwen3-Coder-Next-FP8 + 256K context, FP8 KV cache):
- ✅ Model weights in VRAM: 77.29 GiB
- ✅ Available KV cache: 94.58 GiB (2,065,744 tokens)
- ✅ VRAM peak: 176 GiB / 191.7 GiB (92% utilization)
- ✅ Started with `--max-model-len 262144` ("Application startup complete")
- ✅ `/v1/models` returns `max_model_len: 262144`
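A launch command consistent with these numbers might look like the following. Only `--max-model-len 262144` is confirmed above; the other flags are standard vLLM options I am assuming were used, and the exact invocation lives in the repo's run logs:

```shell
# Sketch of a vLLM OpenAI-compatible server launch (assumed vLLM 0.17.x flags):
# --max-model-len 262144         -> the 256K window confirmed via /v1/models
# --kv-cache-dtype fp8           -> FP8 KV cache, matching the ~94.58 GiB pool
# --gpu-memory-utilization 0.92  -> matches the observed 92% VRAM peak
# --tool-call-parser qwen3_coder -> the parser named in the Stack section
vllm serve Qwen/Qwen3-Coder-Next-FP8 \
  --max-model-len 262144 \
  --kv-cache-dtype fp8 \
  --gpu-memory-utilization 0.92 \
  --tool-call-parser qwen3_coder --enable-auto-tool-choice
```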
Concurrency stress (24 cells, default Triton attention, all 144 outputs clean):
- ✅ 31/31 success at 8K, 16K, 32K, and 64K, covering every realistic developer context
- ✅ 25/31 at 128K; 6–8 at 256K within a 15-minute window (compute-bound, an honest ceiling)
- ✅ Aggregate throughput at N=31: 78.5 tok/s @ 8K · 31.4 @ 16K · 12.1 @ 32K · 3.6 @ 64K
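The fan-out behind these numbers can be sketched with asyncio: N concurrent workers hit the endpoint, and successes plus aggregate completion tokens per second are counted at the end. The request function below is a stand-in for a real OpenAI-compatible client call, not the actual benchmark script:

```python
# Sketch of an N-way concurrency stress harness. `one_request` should return
# the number of completion tokens for one request; exceptions count as failures.
import asyncio
import time

async def fan_out(n: int, one_request) -> tuple[int, float]:
    """Run n concurrent requests; return (successes, aggregate tok/s)."""
    start = time.monotonic()
    results = await asyncio.gather(*(one_request(i) for i in range(n)),
                                   return_exceptions=True)
    elapsed = time.monotonic() - start
    tokens = [r for r in results if isinstance(r, int)]  # completion tokens
    return len(tokens), sum(tokens) / elapsed

# Stand-in request; replace with a real call to the /v1 endpoint.
async def fake_request(i: int) -> int:
    await asyncio.sleep(0.01)
    return 100  # pretend this request produced 100 completion tokens

successes, tok_s = asyncio.run(fan_out(31, fake_request))
```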
Long-context coherence (needle-in-haystack at 200K):
- ✅ 3/3 positions passed (early, middle, late): the model recovers the embedded sentinel function and constant
- ✅ This proves the 256K window is usable, not just allocated
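A needle-in-haystack case can be constructed as follows: embed a sentinel function at an early/middle/late fraction of ~200K tokens of filler, then ask the model what the sentinel returns. The sentinel, filler text, and chars/4 token estimate here are illustrative, not the exact test harness:

```python
# Build a ~200K-token haystack with a sentinel planted at a given fraction.
# Token counts are approximated as chars/4; filler and sentinel are made up.
SENTINEL = "def _repomind_sentinel():\n    return 0xC0FFEE\n"
FILLER_LINE = "# routine repository content, nothing special here\n"

def build_haystack(target_tokens: int, position: float) -> str:
    n_lines = (target_tokens * 4) // len(FILLER_LINE)  # ~4 chars per token
    lines = [FILLER_LINE] * n_lines
    lines.insert(int(n_lines * position), SENTINEL)    # 0.0=early, 0.5=middle, 1.0=late
    return "".join(lines)

for pos in (0.05, 0.5, 0.95):                          # early, middle, late
    haystack = build_haystack(200_000, pos)
    assert SENTINEL in haystack
    # prompt = haystack + "\nWhat does _repomind_sentinel() return?"
    # passed = "0xC0FFEE" in llm(prompt)   # the real run scored 3/3
```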
End-to-end repo ingestion (9/9 questions answered correctly):
- ✅ REPOMIND itself (68K tokens, 68 files): 3/3
- ✅ pallets/flask (408K tokens total → fitted to 180K): 3/3
- ✅ pytorch/vision (1.3M tokens, 581 files, 6,799 chunks → fitted to 180K): 3/3 with correct file-path citations
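Fitting a 1.3M-token repo into a 180K budget can be sketched as a greedy walk: estimate each file's token cost (chars/4 here), and keep what fits. This is a simplification under stated assumptions; the real pipeline also chunks within files (6,799 chunks for pytorch/vision), and its selection order is not shown here:

```python
# Greedy sketch of fitting a repo into a fixed token budget.
# Token cost is approximated as len(text) // 4; extensions are illustrative.
from pathlib import Path

def fit_repo(root: str, budget_tokens: int = 180_000,
             exts: tuple[str, ...] = (".py", ".md", ".rst")) -> list[tuple[str, str]]:
    kept, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in exts or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        cost = len(text) // 4                # rough chars-per-token estimate
        if used + cost > budget_tokens:
            continue                         # skip files that would overflow
        kept.append((str(path), text))
        used += cost
    return kept
```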
Tuning attempt (a measured regression worth reporting):
- ⚠️ Tried `--attention-backend ROCM_AITER_FA` (AMD's hand-tuned MI300X kernels)
- Throughput 2–4× higher under AITER, TTFT 2.8× faster at 64K
- BUT output degenerates into repeating-punctuation gibberish in 137/144 cells under FP8 KV cache
- Default Triton remains the production-safe choice; filed for AMD upstream investigation
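A degeneration check of the kind that would flag this regression can be sketched simply: an output is suspect when repeating punctuation dominates it. The threshold and heuristic below are illustrative assumptions, not the classifier used in the benchmark:

```python
# Heuristic flag for repeating-punctuation gibberish (illustrative only):
# an output is "degenerate" when punctuation makes up most of its characters.
import string

def looks_degenerate(text: str, threshold: float = 0.6) -> bool:
    if not text.strip():
        return True
    punct = sum(ch in string.punctuation for ch in text)
    return punct / len(text) >= threshold

assert looks_degenerate("!!!???!!!???!!!")                    # gibberish
assert not looks_degenerate("def add(a, b):\n    return a + b")  # normal code
```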
Cost (at AMD Cloud $1.99/hr):
- ✅ ~$45.75 / 1M completion tokens (aggregate at 32K, N=31)
- ✅ 14.5 continuously active queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
- ✅ An owned MI300X (~$18K) breaks even vs Cursor in 3–6 months at team-of-100 usage
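The $/1M-token figure follows directly from the hourly rate and the measured aggregate throughput at 32K:

```python
# Sanity check: cost per 1M completion tokens = hourly rate / tokens per hour * 1e6,
# using the measured aggregate 12.1 tok/s at 32K (N=31) from above.
RATE_PER_HR = 1.99
AGG_TOK_S = 12.1                       # aggregate tok/s at 32K, N=31

tokens_per_hr = AGG_TOK_S * 3600       # 43,560 tokens/hr
usd_per_1m = RATE_PER_HR / tokens_per_hr * 1_000_000
print(round(usd_per_1m, 2))            # ~45.68, matching the ~$45.75 quoted
```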
This Space currently runs on CPU-basic with the mock LLM backend, because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. The final demo wires to a live MI300X endpoint during the judging window.
The full evidence pack (7 JSON results + 5 PNG plots + e2e prompts/answers + 2× rocm-smi snapshots + run logs) is in the repo: github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test. The extended PHASE 1+2 narrative (24-cell matrix + AITER A/B) is in extended/SUMMARY.md.
If the MI300X memory-architecture pitch resonates, a like on this Space helps us with the Hugging Face Special Prize judging 🤗
## Author
Sardor Razikov · Independent ML Engineer · Tashkent 🇺🇿
- Kaggle SPR 2026 #7/371 (Top 1.9%) · S6E3 #23/4,142 · AIMO3 39/50 (XTX $2.2M)
- Epistemic Curie Benchmark on Zenodo
- GitHub · LinkedIn · X / Twitter
- Email: razikovsardor1@gmail.com · razikovs777@gmail.com