---
title: REPOMIND
emoji: 🧠
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: Repo-scale coding agent — 256K context on a single MI300X
tags:
  - amd-hackathon-2026
  - amd-developer-hackathon
  - agents
  - coding-agent
  - long-context
  - rocm
  - mi300x
  - qwen3-coder
  - vllm
---

# REPOMIND

Open-source, repo-scale coding agent for self-hosted use. It ingests an entire git repo (up to 256K tokens at FP8) and reasons across it on a single AMD MI300X — a configuration that an NVIDIA H100 80GB cannot hold on one card by VRAM accounting (~143 GB total > 80 GB).

Built for the AMD Developer Hackathon 2026 · MIT License · GitHub source

## Why MI300X?

- Qwen3-Coder-Next-FP8 weights ≈ 80 GB
- 256K KV cache @ FP8 ≈ 38 GB
- activations ≈ 25 GB → ~143 GB total on a single GPU
- An NVIDIA H100 80GB cannot accommodate this configuration on a single card by VRAM accounting (~143 GB > 80 GB); the AMD MI300X's 192 GB has the headroom.
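The KV-cache line in this budget can be sanity-checked with the standard transformer formula. The layer/head dimensions below are illustrative assumptions chosen to land near the quoted figure, not the published Qwen3-Coder-Next config:

```python
def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int) -> float:
    """KV cache size: K and V each store n_kv_heads * head_dim values
    per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

# Hypothetical dims; FP8 = 1 byte/element. With these assumptions the
# 262,144-token cache comes out at 36.0 GiB — the same ballpark as the
# ~38 GB quoted above (the exact number depends on the real model config).
size = kv_cache_gib(seq_len=262_144, n_layers=48, n_kv_heads=8,
                    head_dim=192, bytes_per_elem=1)
print(f"{size:.1f} GiB")
```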

This is a memory-architecture story, not a CUDA-vs-ROCm one.

## Stack

- Model: Qwen/Qwen3-Coder-Next-FP8 — 80B params, 3B active (MoE)
- Inference: vLLM ROCm 7 with the qwen3_coder tool-call parser
- Agent loop: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
- Tools: read_file · grep_codebase · execute_code (sandboxed) · run_tests · git_log
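The agent loop named above has roughly this shape. This is a minimal sketch with stubbed tools and a stubbed LLM — the real Space calls the vLLM endpoint and real tool implementations, which are not reproduced here:

```python
# Minimal PLAN -> CALL TOOL -> OBSERVE -> THINK -> ANSWER loop (all stubs).
# Tool names mirror the README's tool list; bodies are placeholders.
def grep_codebase(pattern): return f"matches for {pattern!r}: app.py:12"
def read_file(path): return f"<contents of {path}>"

TOOLS = {"grep_codebase": grep_codebase, "read_file": read_file}

def llm(transcript):
    # Stub: a real call would hit the vLLM server with the tool-call parser
    # and return either a tool invocation or a final answer.
    if "OBSERVE" not in transcript:
        return {"tool": "grep_codebase", "args": {"pattern": "max_model_len"}}
    return {"answer": "max_model_len is set in app.py:12"}

def agent(question, max_steps=5):
    transcript = f"PLAN: answer {question!r}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        if "answer" in step:                       # ANSWER: terminate
            return step["answer"]
        obs = TOOLS[step["tool"]](**step["args"])  # CALL TOOL
        transcript += f"OBSERVE: {obs}\nTHINK: ...\n"
    return "step budget exhausted"

print(agent("where is the context length configured?"))
```

The step budget bounds runaway tool loops; the real loop adds sandboxing around `execute_code` and `run_tests`.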

## Status — verified on real MI300X (2026-05-05 / 2026-05-06)

Full stress test on a single AMD MI300X (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image). Two sessions, 124 min total, ~$4.12.

Memory budget — Qwen3-Coder-Next-FP8 + 256K context, FP8 KV cache:

- ✅ Model weights in VRAM: 77.29 GiB
- ✅ Available KV cache: 94.58 GiB (2,065,744 tokens)
- ✅ VRAM peak: 176 GiB / 191.7 GiB (92% utilization)
- ✅ `--max-model-len 262144` starts; vLLM logs `Application startup complete`
- ✅ `/v1/models` returns `max_model_len: 262144`
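The `/v1/models` check can be scripted. vLLM's OpenAI-compatible server lists served models under `data`, each entry carrying a `max_model_len` field; the host URL below is a placeholder:

```python
def max_model_len(models_payload: dict) -> int:
    # Parse vLLM's /v1/models response: first served model's context window.
    return models_payload["data"][0]["max_model_len"]

# Live check (placeholder URL — point at your own vLLM server):
#   import json; from urllib.request import urlopen
#   payload = json.load(urlopen("http://<mi300x-host>:8000/v1/models"))
sample = {"data": [{"id": "Qwen/Qwen3-Coder-Next-FP8", "max_model_len": 262144}]}
print(max_model_len(sample))  # -> 262144
```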

Concurrency stress (24 cells, default Triton attention, all 144 outputs clean):

- ✅ 31/31 success at 8K, 16K, 32K, and 64K — every realistic developer context
- ✅ 25/31 at 128K; 6–8 at 256K within a 15-minute window (compute-bound — an honest ceiling)
- ✅ Aggregate throughput at N=31: 78.5 tok/s @ 8K · 31.4 @ 16K · 12.1 @ 32K · 3.6 @ 64K

Long-context coherence — needle-in-haystack at 200K:

- ✅ 3/3 positions passed (early, middle, late) — the model recovers the embedded sentinel function and constant
- ✅ This demonstrates the 256K window is usable, not just allocated
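A needle-in-haystack probe of this kind takes only a few lines to construct. The filler text and sentinel below are illustrative, not the actual test corpus:

```python
import random

def build_haystack(needle: str, n_lines: int, position: float, seed: int = 0) -> str:
    """Insert `needle` at a relative `position` (0.0 = early, 1.0 = late)
    inside n_lines of synthetic filler code."""
    rng = random.Random(seed)
    filler = [f"def helper_{i}(x): return x + {rng.randint(1, 9)}"
              for i in range(n_lines)]
    filler.insert(int(position * n_lines), needle)
    return "\n".join(filler)

needle = "SENTINEL_CONSTANT = 0xC0FFEE  # the value the model must recover"
for pos in (0.05, 0.5, 0.95):  # early, middle, late placements
    haystack = build_haystack(needle, n_lines=1000, position=pos)
    assert needle in haystack
```

The prompt then asks the model for the sentinel's name and value; a pass means it quotes them from the correct position.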

End-to-end repo ingestion — 9/9 questions answered correctly:

- ✅ REPOMIND itself (68K tokens, 68 files) — 3/3
- ✅ pallets/flask (408K tokens total → fitted to 180K) — 3/3
- ✅ pytorch/vision (1.3M tokens, 581 files, 6,799 chunks → fitted to 180K) — 3/3, with correct file-path citations
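Fitting a 1.3M-token repo into a 180K budget amounts to ranking chunks and trimming. A minimal greedy version — the whitespace token count is a naive stand-in for the real tokenizer, and the ranking is assumed done upstream:

```python
def fit_to_budget(chunks, budget_tokens, n_tokens=lambda s: len(s.split())):
    """Greedily keep chunks (assumed pre-ranked by relevance) until the
    token budget is exhausted. Real code would use the model tokenizer."""
    kept, used = [], 0
    for chunk in chunks:
        cost = n_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # skip chunks that would blow the budget
        kept.append(chunk)
        used += cost
    return kept, used

chunks = ["a b c", "d e f g", "h i"]
kept, used = fit_to_budget(chunks, budget_tokens=5)
print(kept, used)  # -> ['a b c', 'h i'] 5
```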

Tuning attempt — a measured regression worth reporting:

- ⚠️ Tried `--attention-backend ROCM_AITER_FA` (AMD's hand-tuned MI300X kernels)
- Throughput was 2–4× higher under AITER, with 2.8× faster TTFT at 64K
- BUT output degenerates into repeating-punctuation gibberish in 137/144 cells under the FP8 KV cache
- Default Triton remains the production-safe choice; filed for AMD upstream investigation
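Degeneration of the kind described above can be flagged automatically. A crude heuristic, shown only for illustration — not the actual check used in the benchmark harness:

```python
def looks_degenerate(text: str, punct_ratio: float = 0.5, min_len: int = 20) -> bool:
    """Flag outputs dominated by punctuation — a rough proxy for the
    repeating-gibberish failure mode, not a rigorous quality metric."""
    if len(text) < min_len:
        return False  # too short to judge
    non_alnum = sum(1 for c in text if not c.isalnum() and not c.isspace())
    return non_alnum / len(text) > punct_ratio

print(looks_degenerate("!!!;;;///!!!;;;///!!!;;;///"))                    # True
print(looks_degenerate("def add(a, b):\n    return a + b  # normal code"))  # False
```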

Cost — at AMD Cloud's $1.99/hr:

- ✅ ~$45.75 / 1M completion tokens (aggregate at 32K, N=31)
- ✅ 14.5 continuously active queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
- ✅ An owned MI300X (~$18K) breaks even vs. Cursor in 3–6 months at team-of-100 usage
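The $/1M-token figure follows directly from the hourly rate and the measured aggregate throughput (the small gap from the quoted ~$45.75 comes from rounding in the throughput number):

```python
hourly_rate = 1.99   # AMD Developer Cloud, $/hr
tok_per_s   = 12.1   # aggregate completion throughput at 32K context, N=31

usd_per_million = hourly_rate / (tok_per_s * 3600) * 1_000_000
print(f"${usd_per_million:.2f} / 1M completion tokens")  # -> $45.68 / 1M completion tokens
```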

This Space currently runs on the free CPU-basic tier with a mock LLM backend, because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. The final demo wires up a live MI300X endpoint during the judging window.

The full evidence pack (7 JSON results, 5 PNG plots, e2e prompts/answers, 2× rocm-smi snapshots, and run logs) is in the repo at github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test. The extended PHASE 1+2 narrative (24-cell matrix + AITER A/B) is in extended/SUMMARY.md.

If the MI300X memory-architecture pitch resonates, a like on this Space helps us in the Hugging Face Special Prize judging 🤗

## Author

Sardor Razikov — Independent ML Engineer · Tashkent 🇺🇿