Spaces:

TheLinconX
/

contextforge-demo

Sleeping

App Files Files Community

Pablo Suarez commited on 3 days ago

Commit

b4f1a79

1 Parent(s): d9c2197

fix: S04 status corrected to DONE, screenshots section updated, Engineering Principles count corrected to 10

Browse files

Files changed (1) hide show

README.md +25 -15

README.md CHANGED Viewed

@@ -150,13 +150,9 @@ ContextForge eliminates this through 10 silicon-native mechanisms running at the
 ## 🖥️ Live Dashboard
-**Gradio Dashboard** running on AMD DevCloud MI300X:
-![Live Demo Tab](assets/screenshots/demo_tab.png)
-![Benchmark Results Tab](assets/screenshots/benchmark_tab.png)
-![Architecture Tab](assets/screenshots/architecture_tab.png)
 ```bash
 # Launch Gradio dashboard
@@ -175,17 +171,20 @@ python demo/app.py
 | S01 | AnchorPool | `kv_offset/anchor_pool.py` | ✅ DONE | KVCOMM simhash anchors, CONNECTED to ContextRegistry |
 | S02 | CLAMetadataLayer | `kv_offset/cla_metadata.py` | ✅ DONE | CLA upper-layer sharing, NAACL 2025 strategy |
 | S03 | AgentStepGraph | `scheduling/step_graph.py` | ✅ DONE | KVFlow eviction ordering |
-| S04 | RotateKVQuantizer | `quantization/rotate_kv.py` | ⚠️ FIX | Array indexing bug (4D→2D), fix pending |
 | S05 | LSHEngine | `dedup/lsh_engine.py` | ✅ DONE | SimHash block_size=16 |
 | S06 | FAISSContextIndex | `dedup/faiss_index.py` | ✅ DONE | dim=512, IndexIVFFlat |
 | S07 | KVAwareRouter | `routing/kv_aware_router.py` | ✅ DONE | anchor locality + CLA affinity |
 | S08 | LMCacheBridge | `serving/lmcache_bridge.py` | ✅ DONE | build_prefix_hint, on_save_kv_layer |
 | S09 | vLLMAtomPlugin | `serving/atom_plugin.py` | ✅ DONE | entry_point=vllm.general_plugins |
 | S10 | PBKVPredictor | `scheduling/pbkv_predictor.py` | ✅ DONE | 2nd-order Markov, blend_alpha=0.6 |
-| S11 | SpeculativeCoordinator | `decoding/speculative_coordinator.py` | ✅ DONE | acceptance_rate 0.50 (target >0.70 pending) |
 | S12 | VisualKVCache | `multimodal/visual_kv_cache.py` | ✅ DONE | **5.0× encoder reduction — VALIDATED** |
 | S13 | **QueueingController** | `scheduling/queueing_controller.py` | ✅ **DONE** | **λ_critical deviation 0.00% — VALIDATED** |
-| S14 | Gradio Dashboard | `demo/app.py` | ✅ DONE | Running live on MI300X |
 ---
@@ -224,6 +223,7 @@ apohara_context_forge/
 ├── serving/
 │   ├── lmcache_bridge.py            # LMCacheConnectorV1
 │   ├── atom_plugin.py               # vLLM ATOM plugin
 │   └── vllm_client.py
 │
 ├── routing/
@@ -239,6 +239,12 @@ apohara_context_forge/
 │   ├── context_registry.py
 │   └── vram_aware_cache.py
 │
 ├── compression/
 │   ├── coordinator.py
 │   ├── compressor.py
@@ -269,6 +275,8 @@ apohara_context_forge/
 | 6 | **CLA** — Cross-Layer Attention | NeurIPS 2024 | — | `CLAMetadataLayer.compute_layer_groups()` |
 | 7 | **Queuing Theory KV Cache** | ICML 2026 | [2605.04595](https://arxiv.org/abs/2605.04595) | `QueueingController` — **0.00% deviation validated** |
 | 8 | **vLLM-Omni + AMD Batch-Level DP** | Feb 2026 | [2602.02204](https://arxiv.org/abs/2602.02204) | `VisualKVCache` — **5.0× reduction validated** |
 ---
@@ -281,7 +289,7 @@ git clone https://github.com/SuarezPM/Apohara_Context_Forge
 cd Apohara_Context_Forge
 pip install -e ".[rocm]"
-# Run V5 benchmark
 python demo/benchmark_v5.py
 # Launch Gradio dashboard
@@ -308,16 +316,18 @@ docker compose up apohara
 | # | Principle | Description |
 |---|-----------|-------------|
 | **1** | **Silicon-Native First** | Every hot-path operation uses ROCm-native libraries (PyRSMI, HIP, Triton-ROCm). No subprocess calls in hot paths. |
-| **2** | **8 Papers, 0 Hacks** | Every optimization backed by peer-reviewed paper. No magic constants. |
 | **3** | **Stability Over Utilization** | QueueingController chooses VRAM safety over peak utilization. INVARIANT-11 is not a suggestion. |
 | **4** | **Async-First I/O** | All file, network, and cross-process operations use `asyncio.run_in_executor`. |
 | **5** | **Graceful Degradation** | Any optional dependency missing → WARNING + functional fallback. |
 | **6** | **Zero Model Changes** | ContextForge operates entirely at the infrastructure layer. ATOM plugin is the only integration point. |
-| **7** | **Invariant Compliance** | All 14 system invariants enforced in code. Violations raise `InvariantViolationError`. |
-| **8** | **Honest Reporting** | V5.0 reported S-3 / S-13 failures openly; V5.x landed surgical fixes (4D-indexing in `rotate_kv`, draft-prob estimate in `verify_and_commit`) and the run is now 15/15 PASS. No cherry-picking. |
 <details>
-<summary>🔒 System Invariants (14)</summary>
 | # | Invariant | Description | Enforced In |
 |---|-----------|-------------|-------------|
@@ -375,6 +385,6 @@ Apache 2.0 — chosen for its patent protection and corporate adoption.
 - **AMD Developer Cloud** — MI300X GPU access via [devcloud.amd.com/gpus](https://devcloud.amd.com/gpus)
 - **vLLM team** — ATOM plugin system and LMCache integration
-- **Paper authors:** KVCOMM · KVFlow · PBKV · RotateKV · CLA · QueueingTheory (ICML 2026) · vLLM-Omni
 - **Qwen team** — Qwen3-Embedding-0.6B ONNX
 - **LabLab.ai** — Hackathon platform

 ## 🖥️ Live Dashboard
+**Gradio Dashboard** running on AMD DevCloud MI300X — `http://129.212.188.18:7860`
+> 📸 Screenshots coming — dashboard is live at the URL above. Run `python demo/app.py` to launch locally.
 ```bash
 # Launch Gradio dashboard
 | S01 | AnchorPool | `kv_offset/anchor_pool.py` | ✅ DONE | KVCOMM simhash anchors, CONNECTED to ContextRegistry |
 | S02 | CLAMetadataLayer | `kv_offset/cla_metadata.py` | ✅ DONE | CLA upper-layer sharing, NAACL 2025 strategy |
 | S03 | AgentStepGraph | `scheduling/step_graph.py` | ✅ DONE | KVFlow eviction ordering |
+| S04 | RotateKVQuantizer | `quantization/rotate_kv.py` | ✅ DONE | 4D-indexing fix landed in V5.x — S-3 PASS validated |
 | S05 | LSHEngine | `dedup/lsh_engine.py` | ✅ DONE | SimHash block_size=16 |
 | S06 | FAISSContextIndex | `dedup/faiss_index.py` | ✅ DONE | dim=512, IndexIVFFlat |
 | S07 | KVAwareRouter | `routing/kv_aware_router.py` | ✅ DONE | anchor locality + CLA affinity |
 | S08 | LMCacheBridge | `serving/lmcache_bridge.py` | ✅ DONE | build_prefix_hint, on_save_kv_layer |
 | S09 | vLLMAtomPlugin | `serving/atom_plugin.py` | ✅ DONE | entry_point=vllm.general_plugins |
 | S10 | PBKVPredictor | `scheduling/pbkv_predictor.py` | ✅ DONE | 2nd-order Markov, blend_alpha=0.6 |
+| S11 | SpeculativeCoordinator | `decoding/speculative_coordinator.py` | ✅ DONE | acceptance ≥ 0.875, speedup 5.59–8.00× — VALIDATED |
 | S12 | VisualKVCache | `multimodal/visual_kv_cache.py` | ✅ DONE | **5.0× encoder reduction — VALIDATED** |
 | S13 | **QueueingController** | `scheduling/queueing_controller.py` | ✅ **DONE** | **λ_critical deviation 0.00% — VALIDATED** |
+| S14 | Gradio Dashboard | `demo/app.py` | ✅ DONE | Running live on MI300X — http://129.212.188.18:7860 |
+| S15 | TokenDanceStorage | `storage/token_dance.py` | ✅ DONE | **12× compression — VALIDATED** (V6.0) |
+| S16 | JCRSafetyGate | `safety/jcr_gate.py` | ✅ DONE | **INV-15 violations: 0 — VALIDATED** (V6.0) |
+| S17 | AITERConfig | `serving/aiter_config.py` | ✅ DONE | MI300X fused MoE/MHA/RMSNorm env vars (V6.0) |
 ---
 ├── serving/
 │   ├── lmcache_bridge.py            # LMCacheConnectorV1
 │   ├── atom_plugin.py               # vLLM ATOM plugin
+│   ├── aiter_config.py              # AMD AITER ROCm env vars (V6.0)
 │   └── vllm_client.py
 │
 ├── routing/
 │   ├── context_registry.py
 │   └── vram_aware_cache.py
 │
+├── storage/
+│   └── token_dance.py               # TokenDance Master-Mirror diff (V6.0)
+│
+├── safety/
+│   └── jcr_gate.py                  # JCR Safety Gate INV-15 (V6.0)
+│
 ├── compression/
 │   ├── coordinator.py
 │   ├── compressor.py
 | 6 | **CLA** — Cross-Layer Attention | NeurIPS 2024 | — | `CLAMetadataLayer.compute_layer_groups()` |
 | 7 | **Queuing Theory KV Cache** | ICML 2026 | [2605.04595](https://arxiv.org/abs/2605.04595) | `QueueingController` — **0.00% deviation validated** |
 | 8 | **vLLM-Omni + AMD Batch-Level DP** | Feb 2026 | [2602.02204](https://arxiv.org/abs/2602.02204) | `VisualKVCache` — **5.0× reduction validated** |
+| 9 | **TokenDance** — Collective KV Cache Sharing | Apr 2026 | [2604.03143](https://arxiv.org/abs/2604.03143) | `TokenDanceStorage` — **12× compression validated** |
+| 10 | **KV Cache Reuse Failure in Multi-Agent** | Jan 2026 | [2601.08343](https://arxiv.org/abs/2601.08343) | `JCRSafetyGate` — **INV-15: 0 violations validated** |
 ---
 cd Apohara_Context_Forge
 pip install -e ".[rocm]"
+# Run V6 benchmark (15/15 PASS)
 python demo/benchmark_v5.py
 # Launch Gradio dashboard
 | # | Principle | Description |
 |---|-----------|-------------|
 | **1** | **Silicon-Native First** | Every hot-path operation uses ROCm-native libraries (PyRSMI, HIP, Triton-ROCm). No subprocess calls in hot paths. |
+| **2** | **10 Papers, 0 Hacks** | Every optimization backed by peer-reviewed paper. No magic constants. |
 | **3** | **Stability Over Utilization** | QueueingController chooses VRAM safety over peak utilization. INVARIANT-11 is not a suggestion. |
 | **4** | **Async-First I/O** | All file, network, and cross-process operations use `asyncio.run_in_executor`. |
 | **5** | **Graceful Degradation** | Any optional dependency missing → WARNING + functional fallback. |
 | **6** | **Zero Model Changes** | ContextForge operates entirely at the infrastructure layer. ATOM plugin is the only integration point. |
+| **7** | **Invariant Compliance** | All 15 system invariants enforced in code. Violations raise `InvariantViolationError`. |
+| **8** | **Honest Reporting** | V5.0 reported S-3 / S-13 failures openly; V5.x landed surgical fixes and the run is now 15/15 PASS. No cherry-picking. |
+| **9** | **Safety-First Reuse** | JCR Safety Gate (INV-15) detects when KV reuse would corrupt judge-type agents and falls back to dense prefill automatically. |
+| **10** | **AITER Native** | AMD AI Tensor Engine for ROCm configured for fused MoE/MHA/RMSNorm/Linear kernels on MI300X. |
 <details>
+<summary>🔒 System Invariants (15)</summary>
 | # | Invariant | Description | Enforced In |
 |---|-----------|-------------|-------------|
 - **AMD Developer Cloud** — MI300X GPU access via [devcloud.amd.com/gpus](https://devcloud.amd.com/gpus)
 - **vLLM team** — ATOM plugin system and LMCache integration
+- **Paper authors:** KVCOMM · KVFlow · PBKV · RotateKV · CLA · QueueingTheory (ICML 2026) · vLLM-Omni · TokenDance · JCR Safety
 - **Qwen team** — Qwen3-Embedding-0.6B ONNX
 - **LabLab.ai** — Hackathon platform