TheLinconX committed on
Commit d8ddf05 · verified · 1 parent: 3440b8c

Update readme

Files changed (1)
  1. README.md +4 -427
README.md CHANGED
@@ -1,435 +1,12 @@
- <p align="center">
- <img src="assets/apohara-contextforge-logo.png" alt="Apohara · ContextForge" width="460">
- </p>
-
- <h1 align="center">APOHARA · ContextForge</h1>
-
- <p align="center">
- <strong>The shared-context compiler for multi-agent LLM pipelines.</strong><br>
- Silicon-native KV cache coordination for AMD Instinct MI300X.
- </p>
-
- <p align="center">
- <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11%2B-2B5DF2.svg" alt="Python 3.11+"></a>
- <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-2ECC71.svg" alt="License Apache 2.0"></a>
- <a href="https://rocm.docs.amd.com/"><img src="https://img.shields.io/badge/ROCm-7.x-FF6B00.svg" alt="ROCm 7.x"></a>
- <a href="https://lablab.ai/event/amd-hackathon"><img src="https://img.shields.io/badge/AMD-Hackathon-ED1C24.svg" alt="AMD Hackathon"></a>
- <a href="#-research-foundation"><img src="https://img.shields.io/badge/papers-10%20implemented-9B59B6.svg" alt="10 Papers"></a>
- <a href="#-benchmark-results"><img src="https://img.shields.io/badge/benchmark-15%2F15%20PASS-27AE60.svg" alt="V6.0 15/15 PASS"></a>
- <a href="#-verification"><img src="https://img.shields.io/badge/tests-310%20passed%20%C2%B7%200%20failed-27AE60.svg" alt="310 tests passing"></a>
- <a href="https://youtu.be/swEcn-6pAmA"><img src="https://img.shields.io/badge/%E2%96%B6%EF%B8%8F-watch%20demo%20on%20YouTube-FF0000.svg" alt="Watch the demo on YouTube"></a>
- </p>
-
- <p align="center">
- <a href="#-the-problem">Problem</a> ·
- <a href="#-the-solution">Solution</a> ·
- <a href="#%EF%B8%8F-demo-video"><b>▶️ Demo video</b></a> ·
- <a href="#-benchmark-results">Benchmarks</a> ·
- <a href="#-architecture">Architecture</a> ·
- <a href="#-quick-start">Quick Start</a> ·
- <a href="#-research-foundation">Research</a> ·
- <a href="#-business-value">Business Value</a>
- </p>
-
- ---
-
- ## ⚡ The Problem
-
- In a 5-agent pipeline — **Retriever → Reranker → Summarizer → Critic → Responder** — every agent independently materializes identical KV-cache entries for the shared context (system prompt, user query, retrieved documents). On a 35B MoE model with 192 GB HBM3, this redundancy wastes **40–60 % of VRAM** before a single output token is generated.
-
- ```text
- WITHOUT ContextForge (VRAM duplication per agent):
- Agent 1 (Retriever) → [KV: system + query + docs] 12 GB
- Agent 2 (Reranker) → [KV: system + query + docs] 12 GB ← DUPLICATE
- Agent 3 (Summarizer) → [KV: system + query + docs] 12 GB ← DUPLICATE
- Agent 4 (Critic) → [KV: system + query + docs] 12 GB ← DUPLICATE
- Agent 5 (Responder) → [KV: system + query + docs] 12 GB ← DUPLICATE
- ──────────────────────────────────────────────────────────────────
- Total KV VRAM: 60 GB for context that should need 12 GB
- ```
-
- ContextForge intercepts at the vLLM ATOM plugin level — zero model changes, zero latency overhead, shared PagedAttention blocks before materialization.
-
- ---
-
- ## 🧠 The Solution
-
- ContextForge coordinates KV-block sharing across all agents through **10 peer-reviewed mechanisms**, intercepting KV-cache operations at the vLLM V1 ATOM plugin interface. Before any agent materializes a KV block, ContextForge checks whether an identical or semantically equivalent block already exists in the shared registry — and a JCR Safety Gate (V6.0) decides when reuse would corrupt judge-type agents, falling back to dense prefill.
-
- Every optimization traces back to a peer-reviewed paper published at **NeurIPS, ICML, ACL, IJCAI, or arXiv 2026**.
-
- <p align="center">
- <img src="assets/systems-diagram.jpeg" alt="ContextForge — shared KV via ATOM plugin" width="760">
- </p>
-
- ### The 10 Mechanisms
-
- | # | Mechanism | Source | What it does |
- |---|-----------|--------|-------------|
- | 1 | **KVCOMM** | NeurIPS 2025 · [arXiv:2510.12872](https://arxiv.org/abs/2510.12872) | SimHash anchor matching for cross-context offset hints — zero RoPE drift |
- | 2 | **KVFlow** | NeurIPS 2025 | Workflow-step graph eviction — evict agents farthest from execution first |
- | 3 | **PBKV** | May 2026 | 2nd-order Markov predictor — 1.26× faster than KVFlow |
- | 4 | **SemShareKV** | ACL Findings 2025 | LSH + FAISS semantic dedup on Qwen3-Embed-0.6B ONNX |
- | 5 | **RotateKV** | IJCAI 2025 · [arXiv:2501.16383](https://arxiv.org/abs/2501.16383) | Pre-RoPE INT4 quantization — 3.97× VRAM reduction, attention-sink protected |
- | 6 | **CLA + LCKV** | NeurIPS 2024 + NAACL 2025 | Cross-layer upper-KV sharing — 50 % savings on upper layers |
- | 7 | **Queueing Theory** | ICML 2026 | λ_critical stability model — replaces 5 empirical thresholds with rigorous math |
- | 8 | **VisualKVCache** | Feb 2026 | SHA-256 content-hash for images — +44.9 % throughput at 1024 px |
- | 9 | **TokenDance** *(V6)* | Apr 2026 · [arXiv:2604.03143](https://arxiv.org/abs/2604.03143) | Master-Mirror diff storage — **10–17× KV compression** for committee inference |
- | 10 | **JCR Safety Gate** *(V6)* | Jan 2026 · [arXiv:2601.08343](https://arxiv.org/abs/2601.08343) | INV-15: Critic agent dense prefill when JCR risk > 0.7 |
-
- **Built on AMD-native stack:** ROCm 7.x · AITER · PyRSMI · ATOM plugin · HIP · vLLM V1 · LMCache · AMD DevCloud MI300X.
-
- ---
-
- ## 🎬 Live Demo
-
- Real metrics from `demo/app.py` running against the full ContextForge stack — five agents, real Qwen3 tokenizer, real LSH+FAISS dedup, INV-15 enforced live. Side-by-side comparison: **263 → 53 tokens, 79.85 % savings** with ContextForge; passthrough on the right.
-
- ### ▶️ Demo video
-
- <p align="center">
- <a href="https://youtu.be/swEcn-6pAmA" title="Watch the ContextForge demo on YouTube">
- <img src="https://img.youtube.com/vi/swEcn-6pAmA/maxresdefault.jpg"
- alt="▶️ Watch the ContextForge demo on YouTube — 79.85% token savings, INV-15 firing on the Critic"
- width="780">
- </a>
- </p>
-
- <p align="center">
- <a href="https://youtu.be/swEcn-6pAmA"><b>▶️ Watch on YouTube</b></a>
- &nbsp;·&nbsp;
- <a href="https://github.com/SuarezPM/Apohara_Context_Forge/raw/main/assets/video_live.mp4">Download raw mp4 (5.2 MB)</a>
- <br>
- <em>End-to-end run: query → 5-agent pipeline → 79.85 % token savings → JCR Safety Gate fires INV-15 on the Critic.</em>
- </p>
-
- ### Static frame
-
- <p align="center">
- <img src="assets/screenshots/dashboard_live_demo.png" alt="Live Demo — With vs Without ContextForge, 79.85% savings, INV-15 firing on the Critic" width="960"><br>
- <em>Live Demo tab — left: query input. Right: <b>With ContextForge</b> (79.85 % savings, INV-15 fires on the Critic) vs. <b>Without ContextForge</b> (passthrough, 0 % savings).</em>
- </p>
-
- ```
- [ContextForge Enabled] Processed: What is machine learning and how does it work?
-
- agents: 5
- tokens_before: 263
- tokens_after: 53
- avg_ttft_ms: 23.78
- token_savings_pct: 79.85%
- dedup_rate_pct: 79.85%
- registry_size: 4
- vram_mode: relaxed
- strategy: register+lsh+faiss
-
- [JCR Safety Gate / INV-15]
- critic risk: 1.000
- critic dense_prefill: True
- reason: INV-15: judge role='critic' risk=1.00 > threshold=0.70 → dense prefill mandated
- ```
-
- ### V6 Live Snapshot — TokenDance + JCR Safety Gate
-
- <p align="center">
- <img src="assets/screenshots/dashboard_v6_snapshot.png" alt="Architecture tab — TokenDance Master-Mirror + JCR Safety Gate live snapshots" width="640"><br>
- <em>Architecture tab — <b>TokenDance Master-Mirror Storage</b> (5-agent demo, 4.71× compression) and <b>JCR Safety Gate</b> firing INV-15 (risk = 1.000, dense_prefill = True).</em>
- </p>
-
- <p align="center">
- <img src="assets/screenshots/dashboard_aiter_config.png" alt="AITER ROCm Config — MI300X" width="520"><br>
- <em>AITER ROCm Config (MI300X) — <code>rocm_available: True</code>, 7 documented env vars, AMD-published speedups: 3× fused MoE, 2× block-scale GEMM, 2-4× FP8 memory.</em>
- </p>
-
- ---
-
- ## 🏗️ Architecture
-
- ```mermaid
- flowchart TB
- subgraph Agents["5-Agent Pipeline"]
- A1[Retriever]
- A2[Reranker]
- A3[Summarizer]
- A4[Critic]
- A5[Responder]
- end
-
- subgraph CF["ContextForge MCP Server · FastAPI + asyncio"]
- direction TB
- REG["Context Registry<br/>register · clear · get_shared_context"]
- LSH["LSH Token Matcher<br/>SimHash · block-aligned"]
- FAISS["FAISS ANN Index<br/>O(log n) cosine search"]
- VRAM["VRAM-Aware Cache<br/>5-mode pressure eviction"]
- TD["TokenDance Storage<br/>Master + N-1 sparse diffs"]
- JCR{"JCR Safety Gate<br/>INV-15"}
- COORD["Compression Coordinator<br/>LLMLingua-2 + APC"]
- end
-
- subgraph Serving["AMD MI300X · ROCm 7.x"]
- VLLM["vLLM V1 + ATOM plugin<br/>--enable-prefix-caching"]
- AITER["AITER kernels<br/>fused MoE · MHA · GEMM"]
- HBM[("192 GB HBM3<br/>Qwen3.6-35B-A3B MoE")]
- end
-
- A1 & A2 & A3 & A4 & A5 -->|register context| REG
- REG --> LSH --> FAISS --> VRAM
- REG --> TD
- A4 --> JCR
- JCR -->|risk > 0.7| VLLM
- JCR -->|risk ≤ 0.7| COORD
- REG --> COORD
- COORD --> VLLM
- VLLM --> AITER --> HBM
-
- style JCR fill:#FF6B00,stroke:#fff,color:#fff
- style TD fill:#FF6B00,stroke:#fff,color:#fff
- style AITER fill:#ED1C24,stroke:#fff,color:#fff
- style HBM fill:#ED1C24,stroke:#fff,color:#fff
- ```
-
- ---
-
- ## 📊 Benchmark Results
-
- > ✅ **Validated on AMD Instinct MI300X (192 GB HBM3) — AMD DevCloud ATL1 · 2026-05-10**
-
- ### V6.0 Benchmark — 15 / 15 PASS
-
- | # | Scenario | Time (ms) | Throughput (tok/s) | VRAM (GB) | Result |
- |----|----------|-----------|--------------------|-----------|--------|
- | 1 | anchor_pool_resolution | 2.87 | 173,986 | 0.10 | ✅ PASS |
- | 2 | cla_metadata_layer | 0.28 | 5,620,918 | 0.05 | ✅ PASS |
- | 3 | rotate_kv_quantization | 21.70 | 1,510,156 | 0.20 | ✅ PASS |
- | 4 | step_graph_execution | 0.37 | 268,906 | 0.30 | ✅ PASS |
- | 5 | kv_aware_routing | 0.04 | 269,251 | 0.10 | ✅ PASS |
- | 6 | lmcache_bridge_save_load | 0.03 | 3,752,204 | 0.05 | ✅ PASS |
- | 7 | atom_plugin_hooks | 0.11 | 6,961,486 | 0.10 | ✅ PASS |
- | 8 | pbkv_prediction | 0.12 | 581,207 | 0.05 | ✅ PASS |
- | 9 | workflow_aware_eviction | 0.02 | 6,127,076 | 0.10 | ✅ PASS |
- | 10 | embedding_engine_encoding | 268.86 | 20,457 | 0.10 | ✅ PASS |
- | 11 | **queueing_controller_stability** | 250.00 | 4,000 | 0.15 | ✅ **PASS** |
- | 12 | **visual_kvcache_cross_agent** | 150.00 | 177,633 | 0.01 | ✅ **PASS** |
- | 13 | **speculative_coordinator_speedup** | 100.00 | 80 | 0.05 | ✅ **PASS** |
- | 14 | **token_dance_compression** *(V6)* | 120.00 | 20,000 | 0.00 | ✅ **PASS** |
- | 15 | **jcr_gate_critic_safety** *(V6)* | 5.00 | 1,800 | 0.00 | ✅ **PASS** |
-
- ### V6.0 Key Targets — 8 / 8 PASS
-
- | Metric | Result | Target | Status |
- |--------|--------|--------|--------|
- | QueueingController λ_critical deviation | **0.00 %** | < 10 % | ✅ |
- | VisualKVCache encoder-call reduction | **5.0 ×** | ≥ 4 × | ✅ |
- | Speculative acceptance rate | **≥ 0.875** | > 0.70 | ✅ |
- | Speculative speedup | **5.59–8.00 ×** | > 2 × | ✅ |
- | TokenDance compression ratio | **10.81 ×** | ≥ 10 × | ✅ |
- | TokenDance reconstruction error | **1.19 × 10⁻⁷** | ≤ 1 × 10⁻⁴ | ✅ |
- | JCR INV-15 violations | **0** | 0 | ✅ |
- | JCR Critic dense rate (high-risk sweep) | **1.000** | ≥ 0.5 | ✅ |
-
- <p align="center">
- <img src="assets/screenshots/benchmark_v6_terminal.png" alt="V6.0 benchmark terminal output — S-14 token_dance 10.81x, S-15 jcr_gate 0 violations" width="780"><br>
- <em>Live terminal output of <code>python demo/benchmark_v5.py</code> — S-14 TokenDance <b>10.81×</b> compression with reconstruction error <b>1.19e-07</b>, S-15 JCR Safety Gate <b>0 INV-15 violations</b>.</em>
- </p>
-
- ---
-
- ## 📈 Key Stats
-
- | Metric | Value |
- |--------|-------|
- | Live token savings (5-agent demo) | **79.85 %** |
- | Multi-agent VRAM reduction | **68 %** |
- | TTFT improvement | **7.8 ×** |
- | TokenDance compression (12-agent committee) | **10.81 ×** |
- | JCR Safety Gate INV-15 violations | **0** |
- | Tests passing | **310 / 310** *(0 failed · 23 skipped)* |
- | Benchmark scenarios | **15 / 15 PASS** |
- | Peer-reviewed papers implemented | **10** |
- | System invariants enforced | **15** |
-
- <p align="center">
- <img src="assets/screenshots/dashboard_key_stats.png" alt="Architecture tab — Key Statistics panel" width="520"><br>
- <em>Key Statistics panel rendered live in the dashboard's Architecture tab.</em>
- </p>
-
- ---
-
- ## 🚀 Quick Start
-
- ### Prerequisites
-
- - Python 3.11 +
- - AMD GPU with ROCm 7.x **or** any CPU box for hermetic dev
- - 16 GB RAM minimum (192 GB HBM3 recommended for full vLLM run)
-
- ### Install
-
- ```bash
- git clone https://github.com/SuarezPM/Apohara_Context_Forge.git
- cd Apohara_Context_Forge
- pip install -e .
- ```
-
- ### Run the benchmark
-
- ```bash
- python demo/benchmark_v5.py
- # → 15/15 PASS · all 8 V5+V6 targets PASS
- ```
-
- ### Launch the dashboard
-
- ```bash
- python demo/app.py
- # Open http://localhost:7860
- ```
-
- Four tabs: **Live Demo** · **Real-time Metrics** · **Benchmark Results** · **Architecture**
-
- ### Run the test suite
-
- ```bash
- PYTHONPATH=. pytest tests/ -q
- # → 310 passed · 23 skipped · 0 failed
- ```
-
- ---
-
- ## 🔬 Research Foundation
-
- ContextForge implements **six 2025–2026 papers** as production code, plus four established baselines. Every numeric claim in this README is backed by a peer-reviewed result.
-
- | Paper | Venue · Year | Module | Validated metric |
- |-------|--------------|--------|------------------|
- | KVCOMM · [arXiv:2510.12872](https://arxiv.org/abs/2510.12872) | NeurIPS 2025 | `kv_offset/anchor_pool.py` | 7.8× TTFT improvement |
- | RotateKV · [arXiv:2501.16383](https://arxiv.org/abs/2501.16383) | IJCAI 2025 | `quantization/rotate_kv.py` | 3.97× VRAM reduction at INT4 |
- | Cross-Attention Speculative · [arXiv:2505.24544](https://arxiv.org/abs/2505.24544) | May 2026 | `decoding/speculative_coordinator.py` | 5.59–8 × decode speedup |
- | Queueing-aware vLLM · ICML 2026 | ICML 2026 | `scheduling/queueing_controller.py` | 0.00 % λ_critical deviation |
- | **TokenDance** · [arXiv:2604.03143](https://arxiv.org/abs/2604.03143) | Apr 2026 | `storage/token_dance.py` | 10.81× compression, 1.19e-7 error |
- | **JCR Failure Mode** · [arXiv:2601.08343](https://arxiv.org/abs/2601.08343) | Jan 2026 | `safety/jcr_gate.py` | INV-15 — 0 violations across sweep |
- | LLMLingua-2 | ACL 2024 | `compression/compressor.py` | 8× memory reduction |
- | CLA + LCKV | NeurIPS 2024 + NAACL 2025 | `kv_offset/cla_metadata.py` | 50 % upper-layer KV savings |
- | VisualKVCache | Feb 2026 | `multimodal/visual_kv_cache.py` | 5.0× encoder-call reduction |
- | vLLM ATOM plugin (production) | vLLM 0.9.x | `serving/atom_plugin.py` | Native V1 KV interception |
-
- ---
-
- ## 🟥 Why AMD Instinct MI300X
-
- ContextForge is **silicon-native** for the MI300X — not a port of CUDA code, not a generic "ROCm-compatible" wrapper.
-
- | Layer | What we use | Why MI300X |
- |-------|-------------|------------|
- | **HBM** | 192 GB HBM3 (single-GPU 35B MoE) | Fits Qwen3.6-235B-A22B without tensor-parallelism overhead |
- | **Compute** | AITER fused MoE + MHA kernels | **3× faster MoE**, **2× block-scale GEMM**, FP8 2-4× memory |
- | **Telemetry** | PyRSMI / `/sys/class/drm` | Real-time VRAM pressure for the 5-mode eviction policy |
- | **Networking** | RCCL · `NCCL_MIN_NCHANNELS=112` | Multi-GPU collective KV sharing (TokenDance All-Gather) |
- | **Plugin surface** | vLLM V1 ATOM (`vllm.general_plugins`) | Zero model code change — intercept BEFORE block materialization |
- | **Stability flag** | `AITER_ENABLE_VSKIP=0` | Hard-coded by [`AITERConfig`](apohara_context_forge/serving/aiter_config.py) — prevents documented kernel crashes |
-
- > **Validated on AMD DevCloud ATL1.** All 15 benchmark scenarios run on real MI300X hardware with ROCm 7.x — see `logs/benchmark_v6_final.txt`.
-
- ---
-
- ## 💼 Business Value
-
- ### TAM / SAM / SOM
-
- | Tier | Definition | 2027 estimate |
- |------|------------|---------------|
- | **TAM** | Global LLM-inference market (all hardware, all workloads) | **$50 B** |
- | **SAM** | Multi-agent + RAG inference on AMD-class accelerators | **$8 B** |
- | **SOM** *(3-yr)* | Enterprise agentic platforms self-hosting on MI300X / MI325X | **$420 M** |
-
- ### Where the value lands
-
- - **40–60 % VRAM saved** per multi-agent workload → **fewer GPUs needed** for the same throughput. On a 192 GB MI300X box, that's $15-25 K of capex unlocked per node.
- - **7.8× TTFT improvement** + 5.59–8 × speculative speedup → response-time SLOs that were previously unreachable on commodity hardware become trivial.
- - **JCR Safety Gate (INV-15)** → the first engineered answer to "when does KV reuse silently break my judge agent?" — a known failure mode that has, until now, blocked KV reuse from production agentic pipelines.
-
- ### Revenue streams
-
- 1. **Enterprise SaaS** — managed ContextForge MCP servers per tenant, priced per-GPU-hour saved (verifiable via `metrics/snapshot`).
- 2. **Self-hosted license** — Apache-2.0 core, paid enterprise tier with SLAs, AITER tuning packs, and audit-grade INV-15 telemetry export.
- 3. **AMD partnership / co-marketing** — reference design for MI300X agentic deployments; flagship customer logo for the AMD AI Stack.
- 4. **Plugin marketplace** — third-party mechanisms (custom safety gates, vertical-specific routers) that ride the ContextForge MCP interface.
-
- ### Who buys it
-
- - **Foundation-model labs** running 5-agent reasoning stacks (debate, critic, planner architectures).
- - **Enterprise RAG vendors** with multi-tenant constraints — every shared system prompt is wasted VRAM today.
- - **Sovereign / on-prem GPU clusters** with AMD MI300X hardware that need a CUDA-free alternative to vLLM-only deployments.
-
- ---
-
- ## ✅ Verification
-
- | Check | Result |
- |-------|--------|
- | `pytest tests/` | **310 passed · 23 skipped · 0 failed** |
- | `python demo/benchmark_v5.py` | **15 / 15 PASS** · all 8 V5+V6 targets PASS |
- | `python demo/app.py` | Gradio 6.x · HTTP 200 on `/` · live 79.85 % savings |
- | Hermetic CI mode | No GPU, no TCP, no model downloads — all deps gated by `try / import` |
-
- System invariants enforced:
-
- | ID | Invariant | Module |
- |----|-----------|--------|
- | INV-10 | RotateKV pre-RoPE only — never quantize post-RoPE tensors | `rotate_kv.py` |
- | INV-11 | QueueingController never evicts below `ceil(λ × E[S] × E[blocks] × 1.15)` | `queueing_controller.py` |
- | INV-12 | SpeculativeCoordinator: target always generates final authoritative token | `speculative_coordinator.py` |
- | INV-13 | VisualKVCache content hash is SHA-256 of raw bytes — never of embeddings | `visual_kv_cache.py` |
- | INV-14 | Dashboard "SIMULATION MODE" banner shown for synthetic data | `app.py`, `dashboard.py` |
- | **INV-15** | **JCR Safety Gate: Critic uses dense prefill when risk > 0.7** | **`safety/jcr_gate.py`** |
-
- ---
-
- ## 🗺️ Roadmap
-
- | Version | Status | Highlights |
- |---------|--------|-----------|
- | V4.0 | ✅ Complete | AnchorPool · EmbeddingEngine ONNX · CLA metadata · RotateKV INT4 · StepGraph · KVAwareRouter · LMCacheBridge · ATOM plugin |
- | V5.0 | ✅ Complete | QueueingController (ICML 2026) · VisualKVCache · SpeculativeCoordinator · Gradio Dashboard |
- | V5.x | ✅ Complete | S-3 4D-indexing fix · S-13 acceptance criterion → 13 / 13 PASS |
- | **V6.0** | ✅ **Complete** | **TokenDance Master-Mirror · JCR Safety Gate (INV-15) · AITER ROCm config → 15 / 15 PASS** |
- | V6.x | 📋 Planned | Multi-node distributed KV via LMCache · HIP custom kernels for RotateKV FWHT · Plugin marketplace SDK |
-
- ---
-
- ## 🛠️ Tech Stack
-
- **Runtime · serving** Python 3.11+ · FastAPI · `Bun.serve()`-style lifespan · Gradio 6.x · Plotly · Pydantic 2 · uvicorn
-
- **Inference · KV** vLLM V1 (ATOM plugin) · LMCache · PyTorch ROCm · ONNX Runtime · transformers · LLMLingua-2
-
- **Index · math** FAISS (CPU + ROCm) · NumPy · SimHash 64-bit · M/G/1 queueing model · SHA-256 content hashing
-
- **AMD-native** ROCm 7.x · AITER (fused MoE / MHA / RMSNorm / GEMM) · PyRSMI · HIP · RCCL · MI300X HBM3
-
- ---
-
- ## 🤝 Contributing & License
-
- - **License:** Apache 2.0 — see [LICENSE](LICENSE).
- - **Issues / PRs:** [github.com/SuarezPM/Apohara_Context_Forge](https://github.com/SuarezPM/Apohara_Context_Forge).
- - **Contact:** Pablo (`p.ms.08@hotmail.com`) · @SuarezPM on GitHub.
-
- ---
-
- <p align="center">
- <strong>APOHARA · ContextForge</strong> — built for the AMD AI Hackathon 2026<br>
- <em>"The pitch is the curve, not a single number."</em>
- </p>
-
-
  ---
  title: ContextForge Demo
  emoji: ⚡
  colorFrom: red
- colorTo: orange
  sdk: gradio
  sdk_version: 5.29.0
  app_file: demo/app.py
  pinned: false
- ---
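The removed README's INV-15 invariant says the Critic (a judge-type agent) must bypass shared-KV reuse and run dense prefill when its JCR risk exceeds 0.7, and its sample log prints the exact reason string. That decision rule can be sketched as below. This is a hypothetical illustration, not the code in `safety/jcr_gate.py`: the names `jcr_gate`, `GateDecision`, and `JUDGE_ROLES`, and the assumption that risk lies in [0, 1], are ours; the README does not show how the risk score itself is computed.

```python
from dataclasses import dataclass

# Hypothetical names; not the actual API of safety/jcr_gate.py.
JCR_THRESHOLD = 0.70                  # INV-15 threshold stated in the README
JUDGE_ROLES = frozenset({"critic"})   # assumption: only the Critic is treated as a judge


@dataclass(frozen=True)
class GateDecision:
    role: str
    risk: float
    dense_prefill: bool
    reason: str


def jcr_gate(role: str, risk: float, threshold: float = JCR_THRESHOLD) -> GateDecision:
    """Decide whether an agent must skip shared-KV reuse and run dense prefill."""
    # Assumption: risk is a probability-like score in [0, 1].
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be in [0, 1], got {risk}")
    # Strict '>' mirrors the README's routing: risk > 0.7 goes to dense prefill,
    # risk <= 0.7 continues to the compression coordinator.
    if role in JUDGE_ROLES and risk > threshold:
        reason = (
            f"INV-15: judge role={role!r} risk={risk:.2f} > "
            f"threshold={threshold:.2f} → dense prefill mandated"
        )
        return GateDecision(role, risk, True, reason)
    return GateDecision(role, risk, False, "shared KV reuse allowed")


if __name__ == "__main__":
    decision = jcr_gate("critic", 1.0)
    print(decision.dense_prefill)  # True
    print(decision.reason)
```

With `risk=1.0` the sketch reproduces the reason line from the demo log; a risk of exactly 0.7 keeps the Critic on the shared-KV path because the comparison is strict.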
  ---
  title: ContextForge Demo
  emoji: ⚡
  colorFrom: red
+ colorTo: green
  sdk: gradio
  sdk_version: 5.29.0
  app_file: demo/app.py
  pinned: false
+ ---
+
+ # ContextForge — Shared Context Compiler for Multi-Agent LLM
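The new README opens with the YAML front-matter block (Hugging Face Spaces reads its configuration from a YAML block at the top of `README.md`) and changes `colorTo` from `orange` to `green`. A quick pre-push check for both properties can be sketched with the standard library alone. This is an illustration, not tooling from this repository: `read_front_matter`, `check_space_colors`, and the flat `key: value` parsing are hypothetical, and `ALLOWED_COLORS` reflects our reading of the Spaces configuration reference, so verify it against the current docs before relying on it.

```python
# Hypothetical pre-push check; not part of the ContextForge repository.
# ALLOWED_COLORS is our reading of the Spaces config reference (verify before use).
ALLOWED_COLORS = frozenset(
    {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
)


def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading '---' ... '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start on the first line with '---'")
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:  # this sketch only handles flat keys; other lines are skipped
            meta[key.strip()] = value.strip()
    raise ValueError("front matter block is not closed with '---'")


def check_space_colors(meta: dict[str, str]) -> list[str]:
    """Return a list of problems with colorFrom/colorTo (empty when valid)."""
    problems = []
    for key in ("colorFrom", "colorTo"):
        value = meta.get(key)
        if value is not None and value not in ALLOWED_COLORS:
            problems.append(f"{key}: {value!r} is not an allowed color")
    return problems


if __name__ == "__main__":
    new_readme = (
        "---\ntitle: ContextForge Demo\nemoji: ⚡\ncolorFrom: red\n"
        "colorTo: green\nsdk: gradio\nsdk_version: 5.29.0\n"
        "app_file: demo/app.py\npinned: false\n---\n\n# ContextForge\n"
    )
    print(check_space_colors(read_front_matter(new_readme)))  # []
    print(check_space_colors({"colorFrom": "red", "colorTo": "orange"}))
    # ["colorTo: 'orange' is not an allowed color"]
```

Because the old README carried the same block at the end of the file, `read_front_matter` would reject it (the first line is not `---`); the new layout passes both checks.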