feat: HF Space updated for final submission
- README.md: full stress test results (62 verified data points across 124 min)
- README.md: 6.49x answer to Hakob, AITER regression honesty, 9/9 repo Q&A
- app.py: env-var backend toggle (VLLM_BASE_URL + MODEL_NAME), Steve Kimoi tutorial pattern
- app.py: live MI300X mode when VLLM_BASE_URL is set, mock fallback otherwise
README.md
CHANGED
Before:
@@ -44,19 +44,47 @@ This is a memory-architecture story, not a CUDA-vs-ROCm one.
**Agent loop**: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
**Tools**: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`

## Status – verified on real MI300X (2026-05-05)

- ✅ Model weights in VRAM: **77.29 GiB**
- ✅ Available KV cache: **…**
- ✅ …
- ✅ `/v1/models` returns `max_model_len: 262144`
- ✅ **31.31× max concurrency at 256K context** – single MI300X serves ~31 simultaneous users at full 256K context
- ✅ Real Python code generation through `/v1/chat/completions` (merge sort / LCS / hello world)
- ✅ Cost of smoke test: ~$1.00 of $100 credits

If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** 🤗
|

After:

**Agent loop**: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
**Tools**: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`

## Status – verified on real MI300X (2026-05-05 / 2026-05-06)

Full stress test on a single AMD MI300X x1 (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image). **2 sessions, 124 min total, ~$4.12.**

**Memory budget – Qwen3-Coder-Next-FP8 + 256K context, FP8 KV cache:**

- ✅ Model weights in VRAM: **77.29 GiB**
- ✅ Available KV cache: **94.58 GiB** (2,065,744 tokens)
- ✅ VRAM peak: **176 GiB / 191.7 GiB** (92% utilization)
- ✅ `--max-model-len 262144` started, `Application startup complete`
- ✅ `/v1/models` returns `max_model_len: 262144`
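A quick cross-check that the two cache figures above agree with each other (plain arithmetic, no repo code involved):

```python
# Reported: 94.58 GiB of free KV cache holds 2,065,744 tokens at FP8.
kv_bytes = 94.58 * 2**30
tokens = 2_065_744
print(f"{kv_bytes / tokens:,.0f} bytes/token")  # ≈ 49,160 bytes ≈ 48 KiB per token
```

Both reported numbers imply the same ~48 KiB of KV per token, so the pair is internally consistent.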

**Concurrency stress (24 cells, default Triton attention, all 144 outputs clean):**

- ✅ **31/31 success at 8K, 16K, 32K, AND 64K** – every realistic developer context
- ✅ **25/31 at 128K**, **6-8 at 256K** within a 15-minute window (compute-bound, an honest ceiling)
- ✅ Aggregate throughput at N=31: 78.5 tok/s @ 8K · 31.4 @ 16K · 12.1 @ 32K · 3.6 @ 64K

**Long-context coherence – needle-in-haystack at 200K:**

- ✅ **3/3 positions passed** (early, middle, late): the model recovers the embedded sentinel function and constant
- ✅ This proves the 256K window is *usable*, not just *allocated*

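The real harness and its prompts are in the evidence pack linked below; a minimal sketch of the idea, with a made-up sentinel and rough padding, looks like this:

```python
import os
import requests

BASE = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")
MODEL = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8")
NEEDLE = "def _sentinel():\n    return 0xC0FFEE\n"  # illustrative needle only
PAD = "# filler line carrying no information\n"

def probe(position: float) -> bool:
    """Bury the needle at a fractional position in ~200K tokens of filler."""
    lines = [PAD] * 50_000  # very roughly 200K tokens of padding
    lines.insert(int(len(lines) * position), NEEDLE)
    prompt = "".join(lines) + "\nWhat constant does _sentinel() return?"
    resp = requests.post(
        f"{BASE}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 32,
        },
        timeout=900,
    ).json()
    return "0xC0FFEE" in resp["choices"][0]["message"]["content"]

for pos in (0.05, 0.5, 0.95):  # early / middle / late
    print(pos, probe(pos))
```
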
**End-to-end repo ingestion – 9/9 questions answered correctly:**

- ✅ REPOMIND self (68K tokens, 68 files) – 3/3
- ✅ pallets/flask (408K total → fitted to 180K) – 3/3
- ✅ **pytorch/vision (1.3M tokens, 581 files, 6,799 chunks → fitted to 180K) – 3/3** with correct file-path citations

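Fitting 1.3M tokens into a 180K budget comes down to ranking chunks and packing greedily; a toy sketch of that idea (the `Chunk` shape and priority scores here are illustrative stand-ins, not the real `ingestion.chunker` API):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    path: str
    tokens: int
    priority: int  # 0 = README, 1 = top-level symbols, 2 = nested code, 3 = tests

def fit(chunks: list[Chunk], budget: int = 180_000) -> list[Chunk]:
    """Greedy fill: take the highest-priority chunks until the budget is spent."""
    picked, used = [], 0
    for c in sorted(chunks, key=lambda c: c.priority):
        if used + c.tokens <= budget:
            picked.append(c)
            used += c.tokens
    return picked
```
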
**Tuning attempt – measured regression worth reporting:**

- ⚠️ Tried `--attention-backend ROCM_AITER_FA` (AMD's hand-tuned MI300X kernels)
- Throughput **2-4× higher** under AITER, TTFT 2.8× faster at 64K
- BUT output **degenerates to repeating-punctuation gibberish** in 137/144 cells under FP8 KV cache
- Default Triton stays the production-safe choice; filed for AMD upstream investigation

**Cost – at AMD Cloud $1.99/hr:**

- ✅ ~$45.75 / 1M completion tokens (aggregate at 32K, N=31)
- ✅ 14.5 active continuous queriers per MI300X, or 70–140 dev seats for typical bursty engineering teams
- ✅ An owned MI300X ($18K) breaks even vs Cursor in 3–6 months at team-of-100 usage

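Both cost bullets follow from the measured numbers by simple division; a sketch of the arithmetic (the per-seat price is an assumption for illustration, not a figure from this repo):

```python
# $/1M completion tokens from the measured 32K aggregate throughput at N=31.
price_per_hr, agg_tok_per_s = 1.99, 12.1
print(price_per_hr / (agg_tok_per_s * 3600) * 1e6)  # ≈ 45.7, i.e. the ~$45.75 figure

# Break-even months for an owned ~$18K MI300X vs per-seat tooling, 100 devs.
for seat_usd_per_month in (30, 60):  # assumed seat prices, illustrative only
    print(18_000 / (100 * seat_usd_per_month))  # 6.0 and 3.0 → the 3–6 month range
```
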
This Space currently runs CPU-basic with the **mock LLM backend** because keeping a paid MI300X droplet up 24/7 for sporadic visitors is uneconomical. **The final demo wires up to a live MI300X endpoint** during the judging window.

Full evidence pack (7 JSON results + 5 PNG plots + e2e prompts/answers + 2× rocm-smi snapshots + run logs) is in the repo:
[github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test)

Extended PHASE 1+2 narrative (24-cell matrix + AITER A/B): [extended/SUMMARY.md](https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended).

If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** 🤗
app.py
CHANGED
Before:
@@ -1,8 +1,17 @@
"""REPOMIND – HuggingFace Space entry point.

Public demo.

Local repo: https://github.com/SRKRZ23/repomind
Hackathon: https://lablab.ai/ai-hackathons/amd-developer
@@ -23,25 +32,42 @@ from ingestion.chunker import ingest_to_json
from ingestion.cloner import clone


…
# REPOMIND
**Open-source repo-scale coding agent on AMD MI300X.**

Ingest a git repository (up to 256K tokens, FP8) on a single GPU and

> 📦 GitHub: [SRKRZ23/repomind](https://github.com/SRKRZ23/repomind)
> 🏆 Built for the [AMD Developer Hackathon 2026](https://lablab.ai/ai-hackathons/amd-developer)

### …
- Qwen3-Coder-Next-FP8 weights ≈ 80 GB
- 256K KV cache @ FP8 ≈ 38 GB
- + activations ≈ 25 GB → **~143 GB total on a single GPU**
- NVIDIA H100 80GB physically OOMs. AMD MI300X 192GB just runs it.

… base URL once the MI300X endpoint is live.
"""
@@ -75,7 +101,16 @@ def ingest(url_or_path: str, chunk_tokens: int) -> str:
        return f"❌ {type(e).__name__}: {e}"


def ask(question: str, backend: str, base_url: str, model: str):
    summary_path = SCRATCH_DIR / "active.json"
    if not summary_path.exists():
        return "Ingest a repo first.", ""
@@ -85,91 +120,121 @@ def ask(question: str, backend: str, base_url: str, model: str):
    summary = json.loads(summary_path.read_text())
    repo_root = Path(summary.get("root", "."))

    …
        try:
            from serving.vllm_client import VLLMClient
            llm = VLLMClient(base_url=base_url.strip(), model=model.strip() or "Qwen/Qwen3-Coder-Next-FP8")
        except Exception as e:
            return f"❌ failed to init vLLM client: {e}", ""
    else:
        from serving.mock_client import MockClient
        llm = MockClient(max_tool_turns=2)

    from agent.loop import Agent
    from tools.registry import default_registry

    try:
        agent = Agent(…
        result = agent.run(question, summary)
    except Exception as e:
        return f"❌ agent failed: {type(e).__name__}: {e}", ""

    trace_lines = […
    trace = "\n".join(trace_lines) or "(no tool calls)"
    return result.answer, trace


with gr.Blocks(…
    gr.Markdown(HEADER_MD)

    with gr.Tab("1. Ingest"):
        with gr.Row():
            url = gr.Textbox(
                label="GitHub URL or owner/repo",
                placeholder="https://github.com/…
                scale=4,
            )
            chunk_tokens = gr.Slider(…
        ingest_btn = gr.Button("Ingest", variant="primary")
        ingest_out = gr.Code(label="Ingestion summary", language="json")
        ingest_btn.click(ingest, [url, chunk_tokens], ingest_out)

    with gr.Tab("2. Ask"):
        …
        base_url = gr.Textbox(
            label="vLLM base URL (only used in `vllm` mode)",
            value="",
            placeholder="http://your-mi300x-host:8000/v1",
            scale=2,
        )
        model = gr.Textbox(
            label="Model id",
            value="Qwen/Qwen3-Coder-Next-FP8",
            scale=2,
        )
        question = gr.Textbox(
            label="Question",
            lines=3,
            placeholder=…
        )
        ask_btn = gr.Button("Ask", variant="primary")
        answer = gr.Markdown(label="Answer")
        tool_trace = gr.Code(label="Tool trace", language="markdown")
        …

    gr.Markdown(
        "---\n"
        "**Author:** [Sardor Razikov](https://huggingface.co/ZeroR3) · "
        "[GitHub](https://github.com/SRKRZ23) · "
        "[lablab.ai](https://lablab.ai/u/@Sardor_R) · "
        "[Zenodo (ECB)](https://doi.org/10.5281/zenodo.19791329)"
    )


if __name__ == "__main__":
    demo.launch()
| 1 |
"""REPOMIND β HuggingFace Space entry point.
|
| 2 |
|
| 3 |
+
Public demo. Auto-detects backend from environment variables (Steve Kimoi's
|
| 4 |
+
canonical lablab/AMD tutorial pattern):
|
| 5 |
+
|
| 6 |
+
VLLM_BASE_URL β set in Space β Settings β Variables and secrets
|
| 7 |
+
to point at a live MI300X vLLM endpoint, e.g.
|
| 8 |
+
http://<your-droplet-ip>:8000/v1
|
| 9 |
+
MODEL_NAME β model id served by vLLM, defaults to
|
| 10 |
+
Qwen/Qwen3-Coder-Next-FP8
|
| 11 |
+
|
| 12 |
+
When VLLM_BASE_URL is unset (default), the Space runs the offline mock
|
| 13 |
+
backend on CPU-basic so it stays free 24/7. When set, the Space wires
|
| 14 |
+
through to the live AMD MI300X for real inference.
|
| 15 |
|
| 16 |
Local repo: https://github.com/SRKRZ23/repomind
|
| 17 |
Hackathon: https://lablab.ai/ai-hackathons/amd-developer
|
|
|
|
from ingestion.cloner import clone


# ─── Configuration via env vars (Steve Kimoi tutorial pattern) ─────────────
VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "").strip()
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8").strip()
LIVE_BACKEND = bool(VLLM_BASE_URL)
BACKEND_LABEL = "🟢 Live AMD MI300X" if LIVE_BACKEND else "🟡 Mock backend (CPU-basic, demo mode)"
BACKEND_HINT = (
    f"Connected to vLLM endpoint: `{VLLM_BASE_URL}` · model `{MODEL_NAME}`"
    if LIVE_BACKEND else
    "Set the Space secrets `VLLM_BASE_URL` + `MODEL_NAME` to wire a real MI300X backend."
)


HEADER_MD = f"""
# REPOMIND
**Open-source repo-scale coding agent on AMD MI300X.**

Ingest a git repository (up to 256K tokens, FP8) on a single GPU and
reason across the whole codebase with multi-step tool use.

> 📦 GitHub: [SRKRZ23/repomind](https://github.com/SRKRZ23/repomind) · MIT
> 🏆 Built for the [AMD Developer Hackathon 2026](https://lablab.ai/ai-hackathons/amd-developer)
> 🤗 HF Special Prize candidate · 🛡 Conservative claim discipline applied

### Why AMD MI300X (verified 2026-05-05 on real hardware)

- Qwen3-Coder-Next-FP8 weights = **77.29 GiB** in VRAM (verified)
- 256K KV cache @ FP8 = **94.58 GiB** available (2,065,744 tokens, verified)
- Activations + framework overhead → peak 176/191.7 GiB → **92% utilization**
- NVIDIA H100 80 GB cannot accommodate this on a single card by VRAM
  accounting (~143 GB > 80 GB); MI300X 192 GB has the headroom

### Status

**Backend right now**: {BACKEND_LABEL}

{BACKEND_HINT}
"""

        return f"❌ {type(e).__name__}: {e}"


def _build_llm():
    """Return an LLM client based on env-var configuration."""
    if LIVE_BACKEND:
        from serving.vllm_client import VLLMClient
        return VLLMClient(base_url=VLLM_BASE_URL, model=MODEL_NAME)
    from serving.mock_client import MockClient
    return MockClient(max_tool_turns=2)


def ask(question: str):
    summary_path = SCRATCH_DIR / "active.json"
    if not summary_path.exists():
        return "Ingest a repo first.", ""
    …
    summary = json.loads(summary_path.read_text())
    repo_root = Path(summary.get("root", "."))

    try:
        llm = _build_llm()
    except Exception as e:
        return f"❌ failed to init LLM client: {type(e).__name__}: {e}", ""

    from agent.loop import Agent
    from tools.registry import default_registry

    try:
        agent = Agent(
            llm=llm,
            tools=default_registry(repo_root, scratch_dir=SCRATCH_DIR / "scratch"),
            max_steps=4,
        )
        result = agent.run(question, summary)
    except Exception as e:
        return f"❌ agent failed: {type(e).__name__}: {e}", ""

    trace_lines = [
        f"- {tc['name']} {json.dumps(tc['arguments'], ensure_ascii=False)}"
        for tc in result.tool_calls
    ]
    trace = "\n".join(trace_lines) or "(no tool calls)"
    return result.answer, trace


with gr.Blocks(
    title="REPOMIND – repo-scale coding agent on AMD MI300X",
    theme=gr.themes.Soft(primary_hue="red", secondary_hue="gray"),
) as demo:
    gr.Markdown(HEADER_MD)

    with gr.Tab("1. Ingest"):
        gr.Markdown(
            "Paste any **GitHub URL** or `owner/repo` shorthand. "
            "REPOMIND clones it, parses the source files, and chunks them "
            "into priority-ranked sections (README first, then top-level "
            "symbols, then nested code, then tests)."
        )
        with gr.Row():
            url = gr.Textbox(
                label="GitHub URL or owner/repo",
                placeholder="https://github.com/pallets/flask OR pallets/flask",
                scale=4,
            )
            chunk_tokens = gr.Slider(
                256, 4096, value=1024, step=128, label="Tokens / chunk", scale=1
            )
        ingest_btn = gr.Button("Ingest", variant="primary")
        ingest_out = gr.Code(label="Ingestion summary", language="json")
        ingest_btn.click(ingest, [url, chunk_tokens], ingest_out)

        gr.Markdown(
            "**Examples that work on a single MI300X**: "
            "`pallets/flask` (~408K tokens, fits in 256K window with priority chunking) · "
            "`pytorch/vision` (~1.3M tokens, trimmed to 180K of highest-priority "
            "content via the chunker) · this repo `SRKRZ23/repomind` (~68K tokens, fits whole)."
        )

    with gr.Tab("2. Ask"):
        gr.Markdown(
            f"Ask any question about the ingested repo. The agent runs an "
            f"SC-TIR loop (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER) with "
            f"five tools: `read_file`, `grep_codebase`, `execute_code` "
            f"(sandboxed), `run_tests`, `git_log`.\n\n"
            f"**Backend**: {BACKEND_LABEL}"
        )
        question = gr.Textbox(
            label="Question",
            lines=3,
            placeholder=(
                "Where is the WSGI entry point? · "
                "What does the chunker prioritize? · "
                "Trace one slab allocation through the call graph."
            ),
        )
        ask_btn = gr.Button("Ask", variant="primary")
        answer = gr.Markdown(label="Answer")
        tool_trace = gr.Code(label="Tool trace (agent steps)", language="markdown")

        ask_btn.click(ask, [question], [answer, tool_trace])

    with gr.Tab("3. Verified evidence"):
        gr.Markdown(
            "REPOMIND was stress-tested on a real AMD MI300X x1 droplet across "
            "two sessions (**2026-05-05 / 2026-05-06**, 124 min total, $4.12). "
            "Highlights:\n\n"
            "| Test | Result |\n"
            "|---|---|\n"
            "| Memory peak | 176/191.7 GiB (92%) |\n"
            "| `--max-model-len 262144` | started clean |\n"
            "| Concurrency 8K / 16K / 32K / 64K @ N=31 | **31/31 success at every context** ✅ |\n"
            "| Concurrency 128K @ N=31 | 25/31 (6 timeouts past 15 min) |\n"
            "| Long-context needle at 200K | **3/3** pass (early/middle/late) |\n"
            "| End-to-end repo Q&A | **9/9** correct across 3 repos |\n"
            "| Largest repo tested | **pytorch/vision (1.3M tokens)** |\n"
            "| Tuning attempt: AITER backend | regression – 137/144 cells broken under FP8 KV cache; default Triton stays production-safe |\n"
            "| Cost | $1.99/hr cloud, $45.75/1M completion tokens |\n\n"
            "Full evidence pack – JSON results, plots, raw model outputs – "
            "is at [github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test]"
            "(https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test). "
            "Extended PHASE 1+2 narrative + AITER A/B in the [extended/SUMMARY.md]"
            "(https://github.com/SRKRZ23/repomind/tree/main/benchmarks/2026-05-05-mi300x-stress-test/extended)."
        )

    gr.Markdown(
        "---\n"
        "**Author:** [Sardor Razikov](https://huggingface.co/ZeroR3) · "
        "[GitHub](https://github.com/SRKRZ23) · "
        "[lablab.ai](https://lablab.ai/u/@Sardor_R) · "
        "[Zenodo (ECB)](https://doi.org/10.5281/zenodo.19791329) · "
        "Tashkent 🇺🇿\n\n"
        "*If the MI300X memory-architecture story resonates, "
        "**a like on this Space helps with the Hugging Face Special Prize judging.** 🤗*"
    )


if __name__ == "__main__":
    demo.launch()
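
Two quick checks of the env-var toggle described in the docstring, a sketch (assumes it runs from the repo root so `app` and its imports resolve; the droplet URL is a placeholder):

```python
import os
import requests

# Mock mode: with VLLM_BASE_URL unset, _build_llm() must hand back the
# offline MockClient (the CPU-basic behavior described in the docstring).
os.environ.pop("VLLM_BASE_URL", None)
import app  # builds the Blocks UI; demo.launch() only runs under __main__

print(type(app._build_llm()).__name__)  # expect: MockClient

# Live-mode preflight: before setting the Space variable, confirm the vLLM
# endpoint answers and advertises the 256K window the README verifies.
base = "http://<your-droplet-ip>:8000/v1"  # placeholder address
card = requests.get(f"{base}/models", timeout=10).json()["data"][0]
print(card["id"])                 # expect: Qwen/Qwen3-Coder-Next-FP8
print(card.get("max_model_len"))  # expect: 262144
```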