docs: status section -> verified on real MI300X (256K, 31.31x concurrency, 95.26 GiB KV cache, 77.29 GiB weights)
README.md CHANGED
@@ -43,9 +43,19 @@ This is a memory-architecture story, not a CUDA-vs-ROCm one.
 - **Agent loop**: SC-TIR style (PLAN → CALL TOOL → OBSERVE → THINK → ANSWER)
 - **Tools**: `read_file` · `grep_codebase` · `execute_code` (sandboxed) · `run_tests` · `git_log`
 
-## Status
-
-
+## Status — verified on real MI300X (2026-05-05)
+
+Smoke test on a single AMD MI300X (AMD Developer Cloud, $1.99/hr, vLLM 0.17.1 + ROCm 7.2 Quick Start image):
+
+- ✅ Model weights in VRAM: **77.29 GiB**
+- ✅ Available KV cache: **95.26 GiB**
+- ✅ `--max-model-len 262144` (256K) — `Application startup complete`
+- ✅ `/v1/models` returns `max_model_len: 262144`
+- ✅ **31.31× max concurrency at 256K context** — a single MI300X serves ~31 simultaneous users at the full 256K context
+- ✅ Real Python code generation through `/v1/chat/completions` (merge sort / LCS / hello world)
+- ✅ Cost of smoke test: ~$1.00 of $100 credits
+
+This Space currently still runs on CPU-basic with the **mock LLM backend**, because exposing a public API requires keeping a paid MI300X droplet up; the final demo will be wired to a live MI300X endpoint during the submission window.
 
 If the MI300X memory-architecture pitch resonates, **a like on this Space helps us with the Hugging Face Special Prize judging** 🤗
 
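As a sanity check on the numbers above: a rough sketch of how the 31.31× concurrency figure and the 95.26 GiB KV cache relate, under the assumption that vLLM reports max concurrency as KV-cache token capacity divided by `--max-model-len` (the per-token KV footprint derived here is an inference from the reported figures, not a number from the logs):

```python
# Back-of-the-envelope check of the reported smoke-test numbers.
# Assumption: max concurrency ≈ (KV-cache token capacity) / (max model len).
GIB = 1024 ** 3

kv_cache_bytes = 95.26 * GIB      # reported available KV cache
max_model_len = 262_144           # --max-model-len (256K)
reported_concurrency = 31.31      # reported max concurrency at 256K

# Token capacity implied by the concurrency figure.
kv_token_capacity = reported_concurrency * max_model_len
# Implied per-token KV-cache footprint.
bytes_per_token = kv_cache_bytes / kv_token_capacity

print(f"Implied KV token capacity: {kv_token_capacity:,.0f} tokens")
print(f"Implied KV footprint:      {bytes_per_token / 1024:.1f} KiB/token")
```

This works out to roughly 12 KiB of KV cache per token, and the weights (77.29 GiB) plus KV cache (95.26 GiB) together come to ~172.6 GiB, comfortably inside the MI300X's 192 GB of HBM3.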