Spaces:
Sleeping
Sleeping
Pablo Suarez commited on
Commit ·
b4f1a79
1
Parent(s): d9c2197
fix: S04 status corrected to DONE, screenshots section updated, Engineering Principles count corrected to 10
Browse files
README.md
CHANGED
|
@@ -150,13 +150,9 @@ ContextForge eliminates this through 10 silicon-native mechanisms running at the
|
|
| 150 |
|
| 151 |
## 🖥️ Live Dashboard
|
| 152 |
|
| 153 |
-
**Gradio Dashboard** running on AMD DevCloud MI300X:
|
| 154 |
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-

|
| 158 |
-
|
| 159 |
-

|
| 160 |
|
| 161 |
```bash
|
| 162 |
# Launch Gradio dashboard
|
|
@@ -175,17 +171,20 @@ python demo/app.py
|
|
| 175 |
| S01 | AnchorPool | `kv_offset/anchor_pool.py` | ✅ DONE | KVCOMM simhash anchors, CONNECTED to ContextRegistry |
|
| 176 |
| S02 | CLAMetadataLayer | `kv_offset/cla_metadata.py` | ✅ DONE | CLA upper-layer sharing, NAACL 2025 strategy |
|
| 177 |
| S03 | AgentStepGraph | `scheduling/step_graph.py` | ✅ DONE | KVFlow eviction ordering |
|
| 178 |
-
| S04 | RotateKVQuantizer | `quantization/rotate_kv.py` |
|
| 179 |
| S05 | LSHEngine | `dedup/lsh_engine.py` | ✅ DONE | SimHash block_size=16 |
|
| 180 |
| S06 | FAISSContextIndex | `dedup/faiss_index.py` | ✅ DONE | dim=512, IndexIVFFlat |
|
| 181 |
| S07 | KVAwareRouter | `routing/kv_aware_router.py` | ✅ DONE | anchor locality + CLA affinity |
|
| 182 |
| S08 | LMCacheBridge | `serving/lmcache_bridge.py` | ✅ DONE | build_prefix_hint, on_save_kv_layer |
|
| 183 |
| S09 | vLLMAtomPlugin | `serving/atom_plugin.py` | ✅ DONE | entry_point=vllm.general_plugins |
|
| 184 |
| S10 | PBKVPredictor | `scheduling/pbkv_predictor.py` | ✅ DONE | 2nd-order Markov, blend_alpha=0.6 |
|
| 185 |
-
| S11 | SpeculativeCoordinator | `decoding/speculative_coordinator.py` | ✅ DONE |
|
| 186 |
| S12 | VisualKVCache | `multimodal/visual_kv_cache.py` | ✅ DONE | **5.0× encoder reduction — VALIDATED** |
|
| 187 |
| S13 | **QueueingController** | `scheduling/queueing_controller.py` | ✅ **DONE** | **λ_critical deviation 0.00% — VALIDATED** |
|
| 188 |
-
| S14 | Gradio Dashboard | `demo/app.py` | ✅ DONE | Running live on MI300X |
|
|
|
|
|
|
|
|
|
|
| 189 |
|
| 190 |
---
|
| 191 |
|
|
@@ -224,6 +223,7 @@ apohara_context_forge/
|
|
| 224 |
├── serving/
|
| 225 |
│ ├── lmcache_bridge.py # LMCacheConnectorV1
|
| 226 |
│ ├── atom_plugin.py # vLLM ATOM plugin
|
|
|
|
| 227 |
│ └── vllm_client.py
|
| 228 |
│
|
| 229 |
├── routing/
|
|
@@ -239,6 +239,12 @@ apohara_context_forge/
|
|
| 239 |
│ ├── context_registry.py
|
| 240 |
│ └── vram_aware_cache.py
|
| 241 |
│
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 242 |
├── compression/
|
| 243 |
│ ├── coordinator.py
|
| 244 |
│ ├── compressor.py
|
|
@@ -269,6 +275,8 @@ apohara_context_forge/
|
|
| 269 |
| 6 | **CLA** — Cross-Layer Attention | NeurIPS 2024 | — | `CLAMetadataLayer.compute_layer_groups()` |
|
| 270 |
| 7 | **Queuing Theory KV Cache** | ICML 2026 | [2605.04595](https://arxiv.org/abs/2605.04595) | `QueueingController` — **0.00% deviation validated** |
|
| 271 |
| 8 | **vLLM-Omni + AMD Batch-Level DP** | Feb 2026 | [2602.02204](https://arxiv.org/abs/2602.02204) | `VisualKVCache` — **5.0× reduction validated** |
|
|
|
|
|
|
|
| 272 |
|
| 273 |
---
|
| 274 |
|
|
@@ -281,7 +289,7 @@ git clone https://github.com/SuarezPM/Apohara_Context_Forge
|
|
| 281 |
cd Apohara_Context_Forge
|
| 282 |
pip install -e ".[rocm]"
|
| 283 |
|
| 284 |
-
# Run
|
| 285 |
python demo/benchmark_v5.py
|
| 286 |
|
| 287 |
# Launch Gradio dashboard
|
|
@@ -308,16 +316,18 @@ docker compose up apohara
|
|
| 308 |
| # | Principle | Description |
|
| 309 |
|---|-----------|-------------|
|
| 310 |
| **1** | **Silicon-Native First** | Every hot-path operation uses ROCm-native libraries (PyRSMI, HIP, Triton-ROCm). No subprocess calls in hot paths. |
|
| 311 |
-
| **2** | **
|
| 312 |
| **3** | **Stability Over Utilization** | QueueingController chooses VRAM safety over peak utilization. INVARIANT-11 is not a suggestion. |
|
| 313 |
| **4** | **Async-First I/O** | All file, network, and cross-process operations use `asyncio.run_in_executor`. |
|
| 314 |
| **5** | **Graceful Degradation** | Any optional dependency missing → WARNING + functional fallback. |
|
| 315 |
| **6** | **Zero Model Changes** | ContextForge operates entirely at the infrastructure layer. ATOM plugin is the only integration point. |
|
| 316 |
-
| **7** | **Invariant Compliance** | All
|
| 317 |
-
| **8** | **Honest Reporting** | V5.0 reported S-3 / S-13 failures openly; V5.x landed surgical fixes
|
|
|
|
|
|
|
| 318 |
|
| 319 |
<details>
|
| 320 |
-
<summary>🔒 System Invariants (
|
| 321 |
|
| 322 |
| # | Invariant | Description | Enforced In |
|
| 323 |
|---|-----------|-------------|-------------|
|
|
@@ -375,6 +385,6 @@ Apache 2.0 — chosen for its patent protection and corporate adoption.
|
|
| 375 |
|
| 376 |
- **AMD Developer Cloud** — MI300X GPU access via [devcloud.amd.com/gpus](https://devcloud.amd.com/gpus)
|
| 377 |
- **vLLM team** — ATOM plugin system and LMCache integration
|
| 378 |
-
- **Paper authors:** KVCOMM · KVFlow · PBKV · RotateKV · CLA · QueueingTheory (ICML 2026) · vLLM-Omni
|
| 379 |
- **Qwen team** — Qwen3-Embedding-0.6B ONNX
|
| 380 |
- **LabLab.ai** — Hackathon platform
|
|
|
|
| 150 |
|
| 151 |
## 🖥️ Live Dashboard
|
| 152 |
|
| 153 |
+
**Gradio Dashboard** running on AMD DevCloud MI300X — `http://129.212.188.18:7860`
|
| 154 |
|
| 155 |
+
> 📸 Screenshots coming — dashboard is live at the URL above. Run `python demo/app.py` to launch locally.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 156 |
|
| 157 |
```bash
|
| 158 |
# Launch Gradio dashboard
|
|
|
|
| 171 |
| S01 | AnchorPool | `kv_offset/anchor_pool.py` | ✅ DONE | KVCOMM simhash anchors, CONNECTED to ContextRegistry |
|
| 172 |
| S02 | CLAMetadataLayer | `kv_offset/cla_metadata.py` | ✅ DONE | CLA upper-layer sharing, NAACL 2025 strategy |
|
| 173 |
| S03 | AgentStepGraph | `scheduling/step_graph.py` | ✅ DONE | KVFlow eviction ordering |
|
| 174 |
+
| S04 | RotateKVQuantizer | `quantization/rotate_kv.py` | ✅ DONE | 4D-indexing fix landed in V5.x — S-3 PASS validated |
|
| 175 |
| S05 | LSHEngine | `dedup/lsh_engine.py` | ✅ DONE | SimHash block_size=16 |
|
| 176 |
| S06 | FAISSContextIndex | `dedup/faiss_index.py` | ✅ DONE | dim=512, IndexIVFFlat |
|
| 177 |
| S07 | KVAwareRouter | `routing/kv_aware_router.py` | ✅ DONE | anchor locality + CLA affinity |
|
| 178 |
| S08 | LMCacheBridge | `serving/lmcache_bridge.py` | ✅ DONE | build_prefix_hint, on_save_kv_layer |
|
| 179 |
| S09 | vLLMAtomPlugin | `serving/atom_plugin.py` | ✅ DONE | entry_point=vllm.general_plugins |
|
| 180 |
| S10 | PBKVPredictor | `scheduling/pbkv_predictor.py` | ✅ DONE | 2nd-order Markov, blend_alpha=0.6 |
|
| 181 |
+
| S11 | SpeculativeCoordinator | `decoding/speculative_coordinator.py` | ✅ DONE | acceptance ≥ 0.875, speedup 5.59–8.00× — VALIDATED |
|
| 182 |
| S12 | VisualKVCache | `multimodal/visual_kv_cache.py` | ✅ DONE | **5.0× encoder reduction — VALIDATED** |
|
| 183 |
| S13 | **QueueingController** | `scheduling/queueing_controller.py` | ✅ **DONE** | **λ_critical deviation 0.00% — VALIDATED** |
|
| 184 |
+
| S14 | Gradio Dashboard | `demo/app.py` | ✅ DONE | Running live on MI300X — http://129.212.188.18:7860 |
|
| 185 |
+
| S15 | TokenDanceStorage | `storage/token_dance.py` | ✅ DONE | **12× compression — VALIDATED** (V6.0) |
|
| 186 |
+
| S16 | JCRSafetyGate | `safety/jcr_gate.py` | ✅ DONE | **INV-15 violations: 0 — VALIDATED** (V6.0) |
|
| 187 |
+
| S17 | AITERConfig | `serving/aiter_config.py` | ✅ DONE | MI300X fused MoE/MHA/RMSNorm env vars (V6.0) |
|
| 188 |
|
| 189 |
---
|
| 190 |
|
|
|
|
| 223 |
├── serving/
|
| 224 |
│ ├── lmcache_bridge.py # LMCacheConnectorV1
|
| 225 |
│ ├── atom_plugin.py # vLLM ATOM plugin
|
| 226 |
+
│ ├── aiter_config.py # AMD AITER ROCm env vars (V6.0)
|
| 227 |
│ └── vllm_client.py
|
| 228 |
│
|
| 229 |
├── routing/
|
|
|
|
| 239 |
│ ├── context_registry.py
|
| 240 |
│ └── vram_aware_cache.py
|
| 241 |
│
|
| 242 |
+
├── storage/
|
| 243 |
+
│ └── token_dance.py # TokenDance Master-Mirror diff (V6.0)
|
| 244 |
+
│
|
| 245 |
+
├── safety/
|
| 246 |
+
│ └── jcr_gate.py # JCR Safety Gate INV-15 (V6.0)
|
| 247 |
+
│
|
| 248 |
├── compression/
|
| 249 |
│ ├── coordinator.py
|
| 250 |
│ ├── compressor.py
|
|
|
|
| 275 |
| 6 | **CLA** — Cross-Layer Attention | NeurIPS 2024 | — | `CLAMetadataLayer.compute_layer_groups()` |
|
| 276 |
| 7 | **Queuing Theory KV Cache** | ICML 2026 | [2605.04595](https://arxiv.org/abs/2605.04595) | `QueueingController` — **0.00% deviation validated** |
|
| 277 |
| 8 | **vLLM-Omni + AMD Batch-Level DP** | Feb 2026 | [2602.02204](https://arxiv.org/abs/2602.02204) | `VisualKVCache` — **5.0× reduction validated** |
|
| 278 |
+
| 9 | **TokenDance** — Collective KV Cache Sharing | Apr 2026 | [2604.03143](https://arxiv.org/abs/2604.03143) | `TokenDanceStorage` — **12× compression validated** |
|
| 279 |
+
| 10 | **KV Cache Reuse Failure in Multi-Agent** | Jan 2026 | [2601.08343](https://arxiv.org/abs/2601.08343) | `JCRSafetyGate` — **INV-15: 0 violations validated** |
|
| 280 |
|
| 281 |
---
|
| 282 |
|
|
|
|
| 289 |
cd Apohara_Context_Forge
|
| 290 |
pip install -e ".[rocm]"
|
| 291 |
|
| 292 |
+
# Run V6 benchmark (15/15 PASS)
|
| 293 |
python demo/benchmark_v5.py
|
| 294 |
|
| 295 |
# Launch Gradio dashboard
|
|
|
|
| 316 |
| # | Principle | Description |
|
| 317 |
|---|-----------|-------------|
|
| 318 |
| **1** | **Silicon-Native First** | Every hot-path operation uses ROCm-native libraries (PyRSMI, HIP, Triton-ROCm). No subprocess calls in hot paths. |
|
| 319 |
+
| **2** | **10 Papers, 0 Hacks** | Every optimization backed by peer-reviewed paper. No magic constants. |
|
| 320 |
| **3** | **Stability Over Utilization** | QueueingController chooses VRAM safety over peak utilization. INVARIANT-11 is not a suggestion. |
|
| 321 |
| **4** | **Async-First I/O** | All file, network, and cross-process operations use `asyncio.run_in_executor`. |
|
| 322 |
| **5** | **Graceful Degradation** | Any optional dependency missing → WARNING + functional fallback. |
|
| 323 |
| **6** | **Zero Model Changes** | ContextForge operates entirely at the infrastructure layer. ATOM plugin is the only integration point. |
|
| 324 |
+
| **7** | **Invariant Compliance** | All 15 system invariants enforced in code. Violations raise `InvariantViolationError`. |
|
| 325 |
+
| **8** | **Honest Reporting** | V5.0 reported S-3 / S-13 failures openly; V5.x landed surgical fixes and the run is now 15/15 PASS. No cherry-picking. |
|
| 326 |
+
| **9** | **Safety-First Reuse** | JCR Safety Gate (INV-15) detects when KV reuse would corrupt judge-type agents and falls back to dense prefill automatically. |
|
| 327 |
+
| **10** | **AITER Native** | AMD AI Tensor Engine for ROCm configured for fused MoE/MHA/RMSNorm/Linear kernels on MI300X. |
|
| 328 |
|
| 329 |
<details>
|
| 330 |
+
<summary>🔒 System Invariants (15)</summary>
|
| 331 |
|
| 332 |
| # | Invariant | Description | Enforced In |
|
| 333 |
|---|-----------|-------------|-------------|
|
|
|
|
| 385 |
|
| 386 |
- **AMD Developer Cloud** — MI300X GPU access via [devcloud.amd.com/gpus](https://devcloud.amd.com/gpus)
|
| 387 |
- **vLLM team** — ATOM plugin system and LMCache integration
|
| 388 |
+
- **Paper authors:** KVCOMM · KVFlow · PBKV · RotateKV · CLA · QueueingTheory (ICML 2026) · vLLM-Omni · TokenDance · JCR Safety
|
| 389 |
- **Qwen team** — Qwen3-Embedding-0.6B ONNX
|
| 390 |
- **LabLab.ai** — Hackathon platform
|