---
title: REPOMIND
emoji: 🧠
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: 'Repo-scale coding agent: 256K context on a single MI300X'
tags:
- amd-hackathon-2026
- amd-developer-hackathon
- agents
- coding-agent
- long-context
- rocm
- mi300x
- qwen3-coder
- vllm
---
# REPOMIND
Open-source, self-hosted repo-scale coding agent. Designed to ingest an entire git repository (256K tokens, FP8) and reason across it on a single AMD MI300X, a configuration that an NVIDIA H100 80GB cannot accommodate by VRAM accounting (~143 GB total > 80 GB).
Built for the AMD Developer Hackathon 2026 · MIT License · GitHub source
## Why MI300X?
- Qwen3-Coder-Next-FP8 weights ≈ 80 GB
- 256K KV cache @ FP8 ≈ 38 GB
- Activations ≈ 25 GB → ~143 GB total on a single GPU
- An NVIDIA H100 80GB cannot accommodate this configuration on a single card by VRAM accounting (~143 GB > 80 GB); the AMD MI300X's 192 GB has the headroom.
This is a memory-architecture story, not a CUDA-vs-ROCm one.
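The accounting above is simple enough to sanity-check in a few lines. All figures are the approximate estimates from this section, not measured values:

```python
# Rough VRAM budget for Qwen3-Coder-Next-FP8 at 256K context.
# All component sizes are the approximate estimates quoted above, in GB.
WEIGHTS_FP8_GB = 80      # ~80B params at ~1 byte/param (FP8)
KV_CACHE_FP8_GB = 38     # 256K-token KV cache at FP8
ACTIVATIONS_GB = 25      # working activations / scratch

total_gb = WEIGHTS_FP8_GB + KV_CACHE_FP8_GB + ACTIVATIONS_GB  # 143 GB

H100_VRAM_GB = 80
MI300X_VRAM_GB = 192

fits_h100 = total_gb <= H100_VRAM_GB      # False: 143 > 80
fits_mi300x = total_gb <= MI300X_VRAM_GB  # True: 143 < 192, ~49 GB headroom
print(total_gb, fits_h100, fits_mi300x)   # 143 False True
```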
## Stack
- Model: `Qwen/Qwen3-Coder-Next-FP8` (80B params, 3B active, MoE)
- Inference: vLLM on ROCm 7 with the `qwen3_coder` tool-call parser
- Agent loop: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
- Tools: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`
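The loop can be sketched in a few lines. Only the tool names and the PLAN → CALL TOOL → OBSERVE → THINK → ANSWER shape come from this README; `call_llm`, the reply protocol, and the stub tool bodies are illustrative assumptions, not the actual agent code:

```python
# Minimal sketch of an SC-TIR-style agent loop. The tool names match the
# README; the tool bodies and the "CALL <tool>: <arg>" reply convention
# are placeholders for illustration only.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"<contents of {arg}>",
    "grep_codebase": lambda arg: f"<matches for {arg}>",
    "execute_code": lambda arg: "<sandboxed stdout>",
    "run_tests": lambda arg: "<test summary>",
    "git_log": lambda arg: "<recent commits>",
}

def agent_loop(task: str, call_llm: Callable[[list[dict]], str],
               max_steps: int = 8) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)        # PLAN / THINK happen in the model
        if reply.startswith("CALL "):     # e.g. "CALL grep_codebase: def main"
            name, _, arg = reply[5:].partition(": ")
            observation = TOOLS.get(name, lambda a: "unknown tool")(arg)
            messages += [{"role": "assistant", "content": reply},
                         {"role": "tool", "content": observation}]  # OBSERVE
        else:
            return reply                  # ANSWER: no tool call, we are done
    return "max steps exceeded"
```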
## Status: verified on a real MI300X (2026-05-05 / 2026-05-06)
Full stress test on a single AMD MI300X (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image). Two sessions, 124 min total, ~$4.12.
Memory budget (Qwen3-Coder-Next-FP8 + 256K context, FP8 KV cache):
- ✅ Model weights in VRAM: 77.29 GiB
- ✅ Available KV cache: 94.58 GiB (2,065,744 tokens)
- ✅ VRAM peak: 176 GiB / 191.7 GiB (92% utilization)
- ✅ Started with `--max-model-len 262144` ("Application startup complete")
- ✅ `/v1/models` returns `max_model_len: 262144`
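A launch command consistent with these numbers might look like the following. Only `--max-model-len 262144` is confirmed above; the other flags are standard vLLM options I am assuming were used, and the exact invocation lives in the repo's run logs:

```shell
# Sketch of a vLLM OpenAI-compatible server launch (assumed vLLM 0.17.x flags):
# --max-model-len 262144         -> the 256K window confirmed via /v1/models
# --kv-cache-dtype fp8           -> FP8 KV cache, matching the ~94.58 GiB pool
# --gpu-memory-utilization 0.92  -> matches the observed 92% VRAM peak
# --tool-call-parser qwen3_coder -> the parser named in the Stack section
vllm serve Qwen/Qwen3-Coder-Next-FP8 \
  --max-model-len 262144 \
  --kv-cache-dtype fp8 \
  --gpu-memory-utilization 0.92 \
  --tool-call-parser qwen3_coder --enable-auto-tool-choice
```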
Concurrency stress (24 cells, default Triton attention, all 144 outputs clean):
- ✅ 31/31 success at 8K, 16K, 32K, and 64K, covering every realistic developer context
- ✅ 25/31 at 128K; 6–8 at 256K within a 15-minute window (compute-bound, an honest ceiling)
- ✅ Aggregate throughput at N=31: 78.5 tok/s @ 8K · 31.4 @ 16K · 12.1 @ 32K · 3.6 @ 64K
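The fan-out behind these numbers can be sketched with asyncio: N concurrent workers hit the endpoint, and successes plus aggregate completion tokens per second are counted at the end. The request function below is a stand-in for a real OpenAI-compatible client call, not the actual benchmark script:

```python
# Sketch of an N-way concurrency stress harness. `one_request` should return
# the number of completion tokens for one request; exceptions count as failures.
import asyncio
import time

async def fan_out(n: int, one_request) -> tuple[int, float]:
    """Run n concurrent requests; return (successes, aggregate tok/s)."""
    start = time.monotonic()
    results = await asyncio.gather(*(one_request(i) for i in range(n)),
                                   return_exceptions=True)
    elapsed = time.monotonic() - start
    tokens = [r for r in results if isinstance(r, int)]  # completion tokens
    return len(tokens), sum(tokens) / elapsed

# Stand-in request; replace with a real call to the /v1 endpoint.
async def fake_request(i: int) -> int:
    await asyncio.sleep(0.01)
    return 100  # pretend this request produced 100 completion tokens

successes, tok_s = asyncio.run(fan_out(31, fake_request))
```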
Long-context coherence (needle-in-haystack at 200K):
- ✅ 3/3 positions passed (early, middle, late): the model recovers the embedded sentinel function and constant
- ✅ This proves the 256K window is usable, not just allocated
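A needle-in-haystack case can be constructed as follows: embed a sentinel function at an early/middle/late fraction of ~200K tokens of filler, then ask the model what the sentinel returns. The sentinel, filler text, and chars/4 token estimate here are illustrative, not the exact test harness:

```python
# Build a ~200K-token haystack with a sentinel planted at a given fraction.
# Token counts are approximated as chars/4; filler and sentinel are made up.
SENTINEL = "def _repomind_sentinel():\n    return 0xC0FFEE\n"
FILLER_LINE = "# routine repository content, nothing special here\n"

def build_haystack(target_tokens: int, position: float) -> str:
    n_lines = (target_tokens * 4) // len(FILLER_LINE)  # ~4 chars per token
    lines = [FILLER_LINE] * n_lines
    lines.insert(int(n_lines * position), SENTINEL)    # 0.0=early, 0.5=middle, 1.0=late
    return "".join(lines)

for pos in (0.05, 0.5, 0.95):                          # early, middle, late
    haystack = build_haystack(200_000, pos)
    assert SENTINEL in haystack
    # prompt = haystack + "\nWhat does _repomind_sentinel() return?"
    # passed = "0xC0FFEE" in llm(prompt)   # the real run scored 3/3
```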
End-to-end repo ingestion (9/9 questions answered correctly):
- ✅ REPOMIND itself (68K tokens, 68 files): 3/3
- ✅ pallets/flask (408K tokens total → fitted to 180K): 3/3
- ✅ pytorch/vision (1.3M tokens, 581 files, 6,799 chunks → fitted to 180K): 3/3 with correct file-path citations
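Fitting a 1.3M-token repo into a 180K budget can be sketched as a greedy walk: estimate each file's token cost (chars/4 here), and keep what fits. This is a simplification under stated assumptions; the real pipeline also chunks within files (6,799 chunks for pytorch/vision), and its selection order is not shown here:

```python
# Greedy sketch of fitting a repo into a fixed token budget.
# Token cost is approximated as len(text) // 4; extensions are illustrative.
from pathlib import Path

def fit_repo(root: str, budget_tokens: int = 180_000,
             exts: tuple[str, ...] = (".py", ".md", ".rst")) -> list[tuple[str, str]]:
    kept, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in exts or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        cost = len(text) // 4                # rough chars-per-token estimate
        if used + cost > budget_tokens:
            continue                         # skip files that would overflow
        kept.append((str(path), text))
        used += cost
    return kept
```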
Tuning attempt (a measured regression worth reporting):
- ⚠️ Tried `--attention-backend ROCM_AITER_FA` (AMD's hand-tuned MI300X kernels)
- Throughput 2–4× higher under AITER, TTFT 2.8× faster at 64K
- BUT output degenerates into repeating-punctuation gibberish in 137/144 cells under FP8 KV cache
- Default Triton remains the production-safe choice; filed for AMD upstream investigation
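A degeneration check of the kind that would flag this regression can be sketched simply: an output is suspect when repeating punctuation dominates it. The threshold and heuristic below are illustrative assumptions, not the classifier used in the benchmark:

```python
# Heuristic flag for repeating-punctuation gibberish (illustrative only):
# an output is "degenerate" when punctuation makes up most of its characters.
import string

def looks_degenerate(text: str, threshold: float = 0.6) -> bool:
    if not text.strip():
        return True
    punct = sum(ch in string.punctuation for ch in text)
    return punct / len(text) >= threshold

assert looks_degenerate("!!!???!!!???!!!")                    # gibberish
assert not looks_degenerate("def add(a, b):\n    return a + b")  # normal code
```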
Cost (at AMD Cloud $1.99/hr):
- ✅ ~$45.75 / 1M completion tokens (aggregate at 32K, N=31)
- ✅ 14.5 continuously active queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
- ✅ An owned MI300X (~$18K) breaks even vs Cursor in 3–6 months at team-of-100 usage
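The $/1M-token figure follows directly from the hourly rate and the measured aggregate throughput at 32K:

```python
# Sanity check: cost per 1M completion tokens = hourly rate / tokens per hour * 1e6,
# using the measured aggregate 12.1 tok/s at 32K (N=31) from above.
RATE_PER_HR = 1.99
AGG_TOK_S = 12.1                       # aggregate tok/s at 32K, N=31

tokens_per_hr = AGG_TOK_S * 3600       # 43,560 tokens/hr
usd_per_1m = RATE_PER_HR / tokens_per_hr * 1_000_000
print(round(usd_per_1m, 2))            # ~45.68, matching the ~$45.75 quoted
```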
This Space currently runs on CPU-basic with the mock LLM backend, because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. The final demo wires to a live MI300X endpoint during the judging window.
The full evidence pack (7 JSON results + 5 PNG plots + e2e prompts/answers + 2× rocm-smi snapshots + run logs) is in the repo: github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test. The extended PHASE 1+2 narrative (24-cell matrix + AITER A/B) is in extended/SUMMARY.md.
If the MI300X memory-architecture pitch resonates, a like on this Space helps us with the Hugging Face Special Prize judging 🤗
## Author
Sardor Razikov · Independent ML Engineer · Tashkent 🇺🇿
- Kaggle SPR 2026 #7/371 (Top 1.9%) · S6E3 #23/4,142 · AIMO3 39/50 (XTX $2.2M)
- Epistemic Curie Benchmark on Zenodo
- GitHub · LinkedIn · X / Twitter
- Email: razikovsardor1@gmail.com · razikovs777@gmail.com