docs: status section -> verified on real MI300X (256K, 31.31x concurrency, 95.26 GiB KV cache, 77.29 GiB weights)
README.md CHANGED
@@ -43,9 +43,19 @@ This is a memory-architecture story, not a CUDA-vs-ROCm one.
 - **Agent loop**: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
 - **Tools**: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`
 
-## Status
-
-
+## Status — verified on real MI300X (2026-05-05)
+
+Smoke test on a single AMD MI300X (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image):
+
+- ✅ Model weights in VRAM: **77.29 GiB**
+- ✅ Available KV cache: **95.26 GiB**
+- ✅ `--max-model-len 262144` (256K) — `Application startup complete`
+- ✅ `/v1/models` returns `max_model_len: 262144`
+- ✅ **31.31× max concurrency at 256K context** — a single MI300X serves ~31 simultaneous users at the full 256K context
+- ✅ Real Python code generation through `/v1/chat/completions` (merge sort / LCS / hello world)
+- ✅ Cost of smoke test: ~$1.00 of $100 credits
+
+This Space currently still runs on CPU-basic with the **mock LLM backend**, because exposing a public API requires keeping a paid MI300X droplet up; the final demo will be wired to a live MI300X endpoint during the submission window.
 
 If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** 🤗
 
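As a sanity check on the numbers above: a rough sketch of how the 31.31× concurrency figure and the 95.26 GiB KV cache relate, under the assumption that vLLM reports max concurrency as KV-cache token capacity divided by `--max-model-len` (the per-token KV footprint derived here is an inference from the reported figures, not a number from the logs):

```python
# Back-of-the-envelope check of the reported smoke-test numbers.
# Assumption: max concurrency ≈ (KV-cache token capacity) / (max model len).
GIB = 1024 ** 3

kv_cache_bytes = 95.26 * GIB      # reported available KV cache
max_model_len = 262_144           # --max-model-len (256K)
reported_concurrency = 31.31      # reported max concurrency at 256K

# Token capacity implied by the concurrency figure.
kv_token_capacity = reported_concurrency * max_model_len
# Implied per-token KV-cache footprint.
bytes_per_token = kv_cache_bytes / kv_token_capacity

print(f"Implied KV token capacity: {kv_token_capacity:,.0f} tokens")
print(f"Implied KV footprint:      {bytes_per_token / 1024:.1f} KiB/token")
```

This works out to roughly 12 KiB of KV cache per token, and the weights (77.29 GiB) plus KV cache (95.26 GiB) together come to ~172.6 GiB, comfortably inside the MI300X's 192 GB of HBM3.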