Spaces: TheLinconX/contextforge-demo

Pablo committed 3 days ago

Commit

cf0a8ed

1 Parent(s): a619d03

feat: APOHARA: Context Forge V5 — synthesis + rebrand complete

Phase 1 — Merge from CC:
- uv.lock (1640 entries, full reproducibility)
- benchmark_v4/v5: honest cold/warm/off protocol + delta_pct=None
- 4 CC test files merged (277 collected, 0 failures)

Phase 2 — Rebrand:
- contextforge/ → apohara_context_forge/ (25 files)
- pyproject.toml: name=apohara-context-forge, entry=apohara
- Dockerfile, docker-compose, .env.example updated
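
The package rename in Phase 2 shows up in the diffs below as import-path rewrites only; identifiers such as `settings.contextforge_compression_rate` keep their old spelling. A minimal sketch of such a rewrite, assuming a regex pass over each file (the helper name `rewrite_imports` is hypothetical and is not claimed to be the tool used for this commit):

```python
import re

OLD, NEW = "contextforge", "apohara_context_forge"

# Match only the package name at the start of an import statement.
# The trailing \b keeps names such as `contextforge_extra` untouched.
_IMPORT_RE = re.compile(rf"^([ \t]*(?:from|import)[ \t]+){OLD}\b", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Return `source` with top-level package imports pointed at NEW."""
    return _IMPORT_RE.sub(rf"\g<1>{NEW}", source)


SAMPLE = (
    "from contextforge.config import settings\n"
    "        from contextforge.token_counter import TokenCounter\n"
    "rate = settings.contextforge_compression_rate\n"
    "from contextforge_extra import x\n"
)
REWRITTEN = rewrite_imports(SAMPLE)
```

Anchoring on the import keyword is what leaves attribute names and environment variables (for example `CONTEXTFORGE_HOST`) alone, which matches what the hunks in this commit show.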

Surgical fixes:
- AnchorPool.update_pool(): token_ids → str before encode()
- VRAMAwareCache: respect pre-set _mode when pressure=None
- BudgetManager: COT/RAG detection order before agent
- TokenCounter: self._use_fallback init
- Test guards: onnxruntime + faiss skipif decorators
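
The test guards in the last item skip tests when an optional dependency is missing. A minimal sketch of the idea, assuming the availability check is done with `importlib.util.find_spec`; it uses the standard library's `unittest.skipUnless` as a stand-in for pytest's `skipif` so the example has no third-party requirements, and the test class is hypothetical:

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


# Hypothetical test case: skipped, not failed, when faiss is not installed.
@unittest.skipUnless(has_module("faiss"), "faiss not installed")
class FaissSmokeTest(unittest.TestCase):
    def test_import(self):
        import faiss  # imported lazily so collection never fails

        self.assertTrue(hasattr(faiss, "IndexFlatL2"))
```

With pytest, the same condition would typically be passed as `pytest.mark.skipif(not has_module("faiss"), reason="faiss not installed")`.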

Final: 277 collected · 0 failed · hermetic suite ✓

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.env.example +1 -1
Dockerfile +1 -1
README.md +11 -44
agents/__pycache__/__init__.cpython-314.pyc +0 -0
agents/__pycache__/base_agent.cpython-314.pyc +0 -0
agents/__pycache__/demo_agents.cpython-314.pyc +0 -0
agents/__pycache__/pipeline.cpython-314.pyc +0 -0
agents/base_agent.py +1 -1
agents/pipeline.py +6 -6
apohara_context_forge.egg-info/PKG-INFO +30 -0
apohara_context_forge.egg-info/SOURCES.txt +85 -0
apohara_context_forge.egg-info/dependency_links.txt +1 -0
apohara_context_forge.egg-info/entry_points.txt +2 -0
apohara_context_forge.egg-info/requires.txt +25 -0
apohara_context_forge.egg-info/top_level.txt +3 -0
{contextforge → apohara_context_forge}/__init__.py +7 -7
apohara_context_forge/__pycache__/__init__.cpython-312.pyc +0 -0
apohara_context_forge/__pycache__/__init__.cpython-314.pyc +0 -0
{contextforge → apohara_context_forge}/__pycache__/config.cpython-314.pyc +0 -0
apohara_context_forge/__pycache__/main.cpython-314.pyc +0 -0
apohara_context_forge/__pycache__/models.cpython-312.pyc +0 -0
apohara_context_forge/__pycache__/models.cpython-314.pyc +0 -0
apohara_context_forge/__pycache__/pipeline_config.cpython-312.pyc +0 -0
apohara_context_forge/__pycache__/pipeline_config.cpython-314.pyc +0 -0
apohara_context_forge/__pycache__/token_counter.cpython-312.pyc +0 -0
apohara_context_forge/__pycache__/token_counter.cpython-314.pyc +0 -0
{contextforge → apohara_context_forge}/compression/__init__.py +0 -0
apohara_context_forge/compression/__pycache__/__init__.cpython-312.pyc +0 -0
apohara_context_forge/compression/__pycache__/__init__.cpython-314.pyc +0 -0
apohara_context_forge/compression/__pycache__/budget_manager.cpython-312.pyc +0 -0
apohara_context_forge/compression/__pycache__/budget_manager.cpython-314.pyc +0 -0
apohara_context_forge/compression/__pycache__/compressor.cpython-312.pyc +0 -0
apohara_context_forge/compression/__pycache__/compressor.cpython-314.pyc +0 -0
apohara_context_forge/compression/__pycache__/coordinator.cpython-314.pyc +0 -0
{contextforge → apohara_context_forge}/compression/budget_manager.py +10 -8
{contextforge → apohara_context_forge}/compression/compressor.py +3 -3
{contextforge → apohara_context_forge}/compression/coordinator.py +6 -6
{contextforge → apohara_context_forge}/config.py +0 -0
{contextforge → apohara_context_forge}/decoding/__init__.py +1 -1
apohara_context_forge/decoding/__pycache__/__init__.cpython-314.pyc +0 -0
apohara_context_forge/decoding/__pycache__/speculative_coordinator.cpython-314.pyc +0 -0
{contextforge → apohara_context_forge}/decoding/speculative_coordinator.py +1 -1
{contextforge → apohara_context_forge}/dedup/__init__.py +0 -0
apohara_context_forge/dedup/__pycache__/__init__.cpython-312.pyc +0 -0
apohara_context_forge/dedup/__pycache__/__init__.cpython-314.pyc +0 -0
apohara_context_forge/dedup/__pycache__/_deprecated_dedup_engine.cpython-314.pyc +0 -0
apohara_context_forge/dedup/__pycache__/embedder.cpython-314.pyc +0 -0
apohara_context_forge/dedup/__pycache__/faiss_index.cpython-312.pyc +0 -0
apohara_context_forge/dedup/__pycache__/faiss_index.cpython-314.pyc +0 -0
apohara_context_forge/dedup/__pycache__/lsh_engine.cpython-312.pyc +0 -0

.env.example CHANGED

@@ -3,7 +3,7 @@ VLLM_BASE_URL=http://localhost:8000
 VLLM_MODEL=Qwen/Qwen3.6-35B-A3B
 VLLM_API_KEY=contextforge-local
-# ContextForge
 CONTEXTFORGE_HOST=0.0.0.0
 CONTEXTFORGE_PORT=8001
 CONTEXTFORGE_TTL_SECONDS=300

 VLLM_MODEL=Qwen/Qwen3.6-35B-A3B
 VLLM_API_KEY=contextforge-local
+# APOHARA: Context Forge
 CONTEXTFORGE_HOST=0.0.0.0
 CONTEXTFORGE_PORT=8001
 CONTEXTFORGE_TTL_SECONDS=300

Dockerfile CHANGED

@@ -15,4 +15,4 @@ COPY . .
 EXPOSE 8001
-CMD ["python", "-m", "contextforge.main"]

 EXPOSE 8001
+CMD ["python", "-m", "apohara_context_forge.main"]

README.md CHANGED

@@ -1,3 +1,7 @@
 # APOHARA V1.0 — ContextForge
 ```
@@ -62,46 +66,9 @@ ContextForge coordinates KV block sharing across all agents through 8 peer-revie
 Every optimization traces back to a peer-reviewed paper published at **NeurIPS, ICML, ACL, or IJCAI**.
-```
-WITH ContextForge (shared KV via ATOM plugin):
-  ┌──────────────────────────────────────────────────────────────────────────────┐
-  │                        AMD Instinct MI300X — 192 GB HBM3                     │
-  │  ┌────────────────────────────────────────────────────────────────────────┐  │
-  │  │                  vLLMAtomPlugin (entry_point: vllm.general_plugins)    │  │
-  │  │  pre/post hooks · KV offset routing · ROCm-native                       │  │
-  │  └────────────────────────────────┬────────────────────────────────────────┘  │
-  │                                   ▼                                            │
-  │  ┌──────────────────────────────────────────────────────────────────────────┐ │
-  │  │              VRAMAwareCache + QueueingController (ICML 2026)              │ │
-  │  │        λ_critical stability · Welford E[S] · INVARIANT-11                │ │
-  │  └────────────────────────────┬────────────────────────────────────────────┘ │
-  │                               ▼                                              │
-  │  ┌──────────────┐  ┌─────────────┐  ┌─────────────┐  ┌───────────────────┐   │
-  │  │AnchorPool    │  │CLAMetadata  │  │StepGraph    │  │RotateKV           │   │
-  │  │KVCOMM        │  │CLA/LCKV     │  │KVFlow       │  │INT4 pre-RoPE      │   │
-  │  │simhash anchor│  │NAACL 2025  │  │eviction     │  │3.97× compression  │   │
-  │  └──────┬───────┘  └──────┬─────┘  └──────┬─────┘  └─────────┬─────────┘   │
-  │         │                 │               │                  │             │
-  │         └─────────────────┴───────────────┴──────────────────┘             │
-  │                           ▼                                                  │
-  │  ┌────────────────────────────────────────────────────────────────────────┐  │
-  │  │              ContextRegistry (all modules wired, DI)                      │  │
-  │  │  LSHEngine + FAISSContextIndex · PBKVPredictor · SpeculativeCoordinator │  │
-  │  └────────────────────────────────┬────────────────────────────────────────┘  │
-  │                                   ▼                                           │
-  │  ┌───────────────────┐    ┌─────────────────────┐    ┌───────────────────┐  │
-  │  │ LMCacheBridge     │    │ KVAwareRouter       │    │ VisualKVCache     │  │
-  │  │ cross-worker      │    │ anchor locality      │    │ SHA256 dedup      │  │
-  │  │                   │    │ CLA affinity         │    │ +44.9% throughput │  │
-  │  └────────┬──────────┘    └──────────┬───────────┘    └───────────────────┘  │
-  │           └──────────────────────────┴──────────────────────────────────────┘ │
-  │                                                                                 │
-  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
-  │  │Retriever │  │Reranker  │  │Summarizer│  │ Critic   │  │Responder │        │
-  │  │(fast)    │  │(fast)    │  │(fast)    │  │(CoT)    │  │(final)   │        │
-  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘        │
-  └────────────────────────────────────────────────────────────────────────────────┘
-```
 ---
@@ -184,7 +151,7 @@ Cost to validate on AMD DevCloud (MI300X x1):
 ## 🏗️ Architecture
 ```
-contextforge/
 ├── __init__.py
 ├── main.py
 ├── config.py
@@ -402,8 +369,8 @@ pytest tests/ -v --tb=short
 **AMD DevCloud (MI300X)** — Primary target hardware
 ```bash
-git clone https://github.com/SuarezPM/ContextForge
-cd ContextForge
 pip install -e ".[rocm]"
 pip install qwen3-embed onnxruntime streamlit prometheus-client --quiet
@@ -434,7 +401,7 @@ streamlit run demo/dashboard.py -- --mock
 **Docker**
 ```bash
-docker compose up contextforge
 ```
 <!-- PLACEHOLDER:DEVCLOUD_SETUP_VIDEO -->

+<p align="center">
+  <img src="assets/apohara-contextforge-logo.png" alt="Apohara : Context Forge" width="420">
+</p>
 # APOHARA V1.0 — ContextForge
 ```
 Every optimization traces back to a peer-reviewed paper published at **NeurIPS, ICML, ACL, or IJCAI**.
+<p align="center">
+  <img src="assets/systems-diagram.jpeg" alt="WITH ContextForge — shared KV via ATOM plugin (systems diagram)" width="720">
+</p>
 ---
 ## 🏗️ Architecture
 ```
+apohara_context_forge/
 ├── __init__.py
 ├── main.py
 ├── config.py
 **AMD DevCloud (MI300X)** — Primary target hardware
 ```bash
+git clone https://github.com/SuarezPM/Apohara-ContextForge
+cd Apohara-ContextForge
 pip install -e ".[rocm]"
 pip install qwen3-embed onnxruntime streamlit prometheus-client --quiet
 **Docker**
 ```bash
+docker compose up apohara
 ```
 <!-- PLACEHOLDER:DEVCLOUD_SETUP_VIDEO -->

agents/__pycache__/__init__.cpython-314.pyc CHANGED

Binary files a/agents/__pycache__/__init__.cpython-314.pyc and b/agents/__pycache__/__init__.cpython-314.pyc differ

agents/__pycache__/base_agent.cpython-314.pyc CHANGED

Binary files a/agents/__pycache__/base_agent.cpython-314.pyc and b/agents/__pycache__/base_agent.cpython-314.pyc differ

agents/__pycache__/demo_agents.cpython-314.pyc CHANGED

Binary files a/agents/__pycache__/demo_agents.cpython-314.pyc and b/agents/__pycache__/demo_agents.cpython-314.pyc differ

agents/__pycache__/pipeline.cpython-314.pyc CHANGED

Binary files a/agents/__pycache__/pipeline.cpython-314.pyc and b/agents/__pycache__/pipeline.cpython-314.pyc differ

agents/base_agent.py CHANGED

@@ -6,7 +6,7 @@ import time
 import httpx
-from contextforge.config import settings
 logger = logging.getLogger(__name__)

 import httpx
+from apohara_context_forge.config import settings
 logger = logging.getLogger(__name__)

agents/pipeline.py CHANGED

@@ -6,12 +6,12 @@ from typing import Any, Optional
 from agents.demo_agents import create_agents
-from contextforge.dedup.faiss_index import FAISSContextIndex
-from contextforge.dedup.lsh_engine import LSHTokenMatcher
-from contextforge.metrics.vram_monitor import VRAMMonitor
-from contextforge.pipeline_config import PipelineConfig
-from contextforge.registry.context_registry import ContextRegistry
-from contextforge.registry.vram_aware_cache import VRAMAwareCache
 logger = logging.getLogger(__name__)

 from agents.demo_agents import create_agents
+from apohara_context_forge.dedup.faiss_index import FAISSContextIndex
+from apohara_context_forge.dedup.lsh_engine import LSHTokenMatcher
+from apohara_context_forge.metrics.vram_monitor import VRAMMonitor
+from apohara_context_forge.pipeline_config import PipelineConfig
+from apohara_context_forge.registry.context_registry import ContextRegistry
+from apohara_context_forge.registry.vram_aware_cache import VRAMAwareCache
 logger = logging.getLogger(__name__)

apohara_context_forge.egg-info/PKG-INFO ADDED

	@@ -0,0 +1,30 @@

+Metadata-Version: 2.4
+Name: apohara-context-forge
+Version: 0.1.0
+Summary: APOHARA: Context Forge — Silicon-native KV cache coordination for multi-agent LLM pipelines on AMD Instinct MI300X
+License: MIT
+Requires-Python: <3.13,>=3.11
+Requires-Dist: fastapi<0.116,>=0.115
+Requires-Dist: uvicorn[standard]<0.33,>=0.32
+Requires-Dist: pydantic<3,>=2.9
+Requires-Dist: pydantic-settings<3,>=2.6
+Requires-Dist: httpx<0.28,>=0.27
+Requires-Dist: sentence-transformers<4,>=3.3
+Requires-Dist: llmlingua<0.3,>=0.2.2
+Requires-Dist: torch<2.6,>=2.4
+Requires-Dist: gradio<6,>=5.7
+Requires-Dist: plotly<6,>=5.24
+Requires-Dist: numpy<2.2,>=1.26
+Requires-Dist: aiofiles<25,>=24.1
+Requires-Dist: rich<14,>=13.9
+Requires-Dist: psutil<8,>=5.9
+Provides-Extra: dev
+Requires-Dist: pytest>=8.3; extra == "dev"
+Requires-Dist: pytest-asyncio>=0.24; extra == "dev"
+Requires-Dist: ruff>=0.7; extra == "dev"
+Requires-Dist: fastapi; extra == "dev"
+Requires-Dist: httpx; extra == "dev"
+Requires-Dist: gradio; extra == "dev"
+Requires-Dist: streamlit; extra == "dev"
+Requires-Dist: anyio; extra == "dev"
+Requires-Dist: pytest-anyio; extra == "dev"

apohara_context_forge.egg-info/SOURCES.txt ADDED

	@@ -0,0 +1,85 @@

+README.md
+pyproject.toml
+agents/__init__.py
+agents/base_agent.py
+agents/demo_agents.py
+agents/pipeline.py
+apohara_context_forge/__init__.py
+apohara_context_forge/config.py
+apohara_context_forge/main.py
+apohara_context_forge/models.py
+apohara_context_forge/pipeline_config.py
+apohara_context_forge/token_counter.py
+apohara_context_forge.egg-info/PKG-INFO
+apohara_context_forge.egg-info/SOURCES.txt
+apohara_context_forge.egg-info/dependency_links.txt
+apohara_context_forge.egg-info/entry_points.txt
+apohara_context_forge.egg-info/requires.txt
+apohara_context_forge.egg-info/top_level.txt
+apohara_context_forge/compression/__init__.py
+apohara_context_forge/compression/budget_manager.py
+apohara_context_forge/compression/compressor.py
+apohara_context_forge/compression/coordinator.py
+apohara_context_forge/decoding/__init__.py
+apohara_context_forge/decoding/speculative_coordinator.py
+apohara_context_forge/dedup/__init__.py
+apohara_context_forge/dedup/_deprecated_dedup_engine.py
+apohara_context_forge/dedup/cosine.py
+apohara_context_forge/dedup/embedder.py
+apohara_context_forge/dedup/faiss_index.py
+apohara_context_forge/dedup/lsh_engine.py
+apohara_context_forge/embeddings/__init__.py
+apohara_context_forge/embeddings/embedding_engine.py
+apohara_context_forge/kv_offset/__init__.py
+apohara_context_forge/kv_offset/anchor_pool.py
+apohara_context_forge/kv_offset/cla_metadata.py
+apohara_context_forge/mcp/__init__.py
+apohara_context_forge/mcp/server.py
+apohara_context_forge/metrics/__init__.py
+apohara_context_forge/metrics/collector.py
+apohara_context_forge/metrics/prometheus_metrics.py
+apohara_context_forge/metrics/vram_monitor.py
+apohara_context_forge/multimodal/__init__.py
+apohara_context_forge/multimodal/visual_kv_cache.py
+apohara_context_forge/normalization/__init__.py
+apohara_context_forge/normalization/prefix_normalizer.py
+apohara_context_forge/quantization/rotate_kv.py
+apohara_context_forge/registry/__init__.py
+apohara_context_forge/registry/_deprecated_ttl_cache.py
+apohara_context_forge/registry/context_registry.py
+apohara_context_forge/registry/vram_aware_cache.py
+apohara_context_forge/routing/kv_aware_router.py
+apohara_context_forge/scheduling/pbkv_predictor.py
+apohara_context_forge/scheduling/queueing_controller.py
+apohara_context_forge/scheduling/step_graph.py
+apohara_context_forge/serving/__init__.py
+apohara_context_forge/serving/atom_plugin.py
+apohara_context_forge/serving/lmcache_bridge.py
+apohara_context_forge/serving/vllm_client.py
+demo/__init__.py
+demo/app.py
+demo/benchmark.py
+demo/benchmark_v4.py
+demo/benchmark_v5.py
+demo/dashboard.py
+tests/test_atom_plugin.py
+tests/test_benchmark.py
+tests/test_cla_metadata.py
+tests/test_compressor.py
+tests/test_coordinator.py
+tests/test_dedup.py
+tests/test_embedding_engine.py
+tests/test_integration.py
+tests/test_kv_aware_router.py
+tests/test_kv_offset.py
+tests/test_lmcache_bridge.py
+tests/test_mcp_server.py
+tests/test_normalization.py
+tests/test_pbkv_predictor.py
+tests/test_pipeline.py
+tests/test_queueing_controller.py
+tests/test_registry.py
+tests/test_rotate_kv.py
+tests/test_speculative_coordinator.py
+tests/test_step_graph.py
+tests/test_visual_kv_cache.py

apohara_context_forge.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@

+

apohara_context_forge.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@

+[console_scripts]
+apohara = apohara_context_forge.main:main
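
The `console_scripts` entry above means that installing the package creates an `apohara` command which imports `apohara_context_forge.main` and calls its `main` function. A minimal sketch of how such an entry splits into script name, module, and callable, assuming the file is read as INI text (the helper name `parse_console_scripts` is hypothetical):

```python
import configparser

ENTRY_POINTS_TXT = """\
[console_scripts]
apohara = apohara_context_forge.main:main
"""


def parse_console_scripts(text: str) -> dict[str, tuple[str, str]]:
    """Map each script name to its (module, attribute) target."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(text)
    scripts = {}
    for name, target in parser["console_scripts"].items():
        module, _, attr = target.partition(":")
        scripts[name] = (module.strip(), attr.strip())
    return scripts


SCRIPTS = parse_console_scripts(ENTRY_POINTS_TXT)
```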

apohara_context_forge.egg-info/requires.txt ADDED

	@@ -0,0 +1,25 @@

+fastapi<0.116,>=0.115
+uvicorn[standard]<0.33,>=0.32
+pydantic<3,>=2.9
+pydantic-settings<3,>=2.6
+httpx<0.28,>=0.27
+sentence-transformers<4,>=3.3
+llmlingua<0.3,>=0.2.2
+torch<2.6,>=2.4
+gradio<6,>=5.7
+plotly<6,>=5.24
+numpy<2.2,>=1.26
+aiofiles<25,>=24.1
+rich<14,>=13.9
+psutil<8,>=5.9
+[dev]
+pytest>=8.3
+pytest-asyncio>=0.24
+ruff>=0.7
+fastapi
+httpx
+gradio
+streamlit
+anyio
+pytest-anyio

apohara_context_forge.egg-info/top_level.txt ADDED

	@@ -0,0 +1,3 @@

+agents
+apohara_context_forge
+demo

{contextforge → apohara_context_forge}/__init__.py RENAMED

@@ -1,13 +1,13 @@
 """ContextForge - Shared context compiler for multi-agent LLM systems on AMD MI300X."""
 __version__ = "3.0.0"
-from contextforge.registry.context_registry import ContextRegistry, SharedContextResult, RegisteredAgent
-from contextforge.pipeline_config import PipelineConfig
-from contextforge.token_counter import TokenCounter, count_tokens, encode_tokens, compute_kv_gb
-from contextforge.metrics.vram_monitor import VRAMMonitor, get_monitor, get_vram_pressure
-from contextforge.dedup.lsh_engine import LSHTokenMatcher, TokenBlockMatch
-from contextforge.dedup.faiss_index import FAISSContextIndex, FAISSMatch
-from contextforge.registry.vram_aware_cache import VRAMAwareCache, EvictionMode
 __all__ = [
     # Core registry

 """ContextForge - Shared context compiler for multi-agent LLM systems on AMD MI300X."""
 __version__ = "3.0.0"
+from apohara_context_forge.registry.context_registry import ContextRegistry, SharedContextResult, RegisteredAgent
+from apohara_context_forge.pipeline_config import PipelineConfig
+from apohara_context_forge.token_counter import TokenCounter, count_tokens, encode_tokens, compute_kv_gb
+from apohara_context_forge.metrics.vram_monitor import VRAMMonitor, get_monitor, get_vram_pressure
+from apohara_context_forge.dedup.lsh_engine import LSHTokenMatcher, TokenBlockMatch
+from apohara_context_forge.dedup.faiss_index import FAISSContextIndex, FAISSMatch
+from apohara_context_forge.registry.vram_aware_cache import VRAMAwareCache, EvictionMode
 __all__ = [
     # Core registry

apohara_context_forge/__pycache__/__init__.cpython-312.pyc ADDED

Binary file (1.23 kB).

apohara_context_forge/__pycache__/__init__.cpython-314.pyc ADDED

Binary file (1.22 kB).

{contextforge → apohara_context_forge}/__pycache__/config.cpython-314.pyc RENAMED

Binary files a/contextforge/__pycache__/config.cpython-314.pyc and b/apohara_context_forge/__pycache__/config.cpython-314.pyc differ

apohara_context_forge/__pycache__/main.cpython-314.pyc ADDED

Binary file (2.22 kB).

apohara_context_forge/__pycache__/models.cpython-312.pyc ADDED

Binary file (3.24 kB).

apohara_context_forge/__pycache__/models.cpython-314.pyc ADDED

Binary file (4.91 kB).

apohara_context_forge/__pycache__/pipeline_config.cpython-312.pyc ADDED

Binary file (2.3 kB).

apohara_context_forge/__pycache__/pipeline_config.cpython-314.pyc ADDED

Binary file (2.78 kB).

apohara_context_forge/__pycache__/token_counter.cpython-312.pyc ADDED

Binary file (8.45 kB).

apohara_context_forge/__pycache__/token_counter.cpython-314.pyc ADDED

Binary file (10.5 kB).

{contextforge → apohara_context_forge}/compression/__init__.py RENAMED

File without changes

apohara_context_forge/compression/__pycache__/__init__.cpython-312.pyc ADDED

Binary file (252 Bytes).

apohara_context_forge/compression/__pycache__/__init__.cpython-314.pyc ADDED

Binary file (254 Bytes).

apohara_context_forge/compression/__pycache__/budget_manager.cpython-312.pyc ADDED

Binary file (12 kB).

apohara_context_forge/compression/__pycache__/budget_manager.cpython-314.pyc ADDED

Binary file (13.5 kB).

apohara_context_forge/compression/__pycache__/compressor.cpython-312.pyc ADDED

Binary file (3.82 kB).

apohara_context_forge/compression/__pycache__/compressor.cpython-314.pyc ADDED

Binary file (4.65 kB).

apohara_context_forge/compression/__pycache__/coordinator.cpython-314.pyc ADDED

Binary file (4.26 kB).

{contextforge → apohara_context_forge}/compression/budget_manager.py RENAMED

@@ -171,7 +171,7 @@ class CompressionBudgetManager:
         Returns:
             CompressionPlan with decision and parameters
         """
-        from contextforge.token_counter import TokenCounter
         if token_count is None:
             token_count = TokenCounter.get().count(segment)
@@ -238,7 +238,7 @@ class CompressionBudgetManager:
         if not plan.should_compress:
             return plan.segment, 1.0
-        from contextforge.compression.compressor import ContextCompressor
         compressor = ContextCompressor()
         await compressor.load()
@@ -288,15 +288,17 @@ def detect_segment_type(segment: str) -> SegmentType:
         if indicator.lower() in segment.lower()[:100]:
             return SegmentType.TOOL_RESULT
-    # Check for agent output indicators
-    agent_indicators = ["retrieved", "summarized", "analyzed", "reasoning:", "step"]
     if any(ind in segment.lower()[:150] for ind in agent_indicators):
         return SegmentType.AGENT_OUTPUT
-    # Check for CoT reasoning
-    if all(ind in segment.lower() for ind in ["step", "reasoning"]) or "step by step" in segment.lower():
-        return SegmentType.COT_REASONING
     # Check for RAG/retrieved content
     rag_indicators = ["document", "retrieved", "context:", "reference:"]
     if any(ind in segment.lower()[:200] for ind in rag_indicators):

         Returns:
             CompressionPlan with decision and parameters
         """
+        from apohara_context_forge.token_counter import TokenCounter
         if token_count is None:
             token_count = TokenCounter.get().count(segment)
         if not plan.should_compress:
             return plan.segment, 1.0
+        from apohara_context_forge.compression.compressor import ContextCompressor
         compressor = ContextCompressor()
         await compressor.load()
         if indicator.lower() in segment.lower()[:100]:
             return SegmentType.TOOL_RESULT
+    # Check for CoT reasoning FIRST (before agent — "step" + "reasoning" without ":")
+    if "step by step" in segment.lower() or (
+        "step" in segment.lower() and "reasoning" in segment.lower()
+    ):
+        return SegmentType.COT_REASONING
+    # Check for agent output indicators (after CoT)
+    agent_indicators = ["summarized", "analyzed", "reasoning:", "step"]
     if any(ind in segment.lower()[:150] for ind in agent_indicators):
         return SegmentType.AGENT_OUTPUT
     # Check for RAG/retrieved content
     rag_indicators = ["document", "retrieved", "context:", "reference:"]
     if any(ind in segment.lower()[:200] for ind in rag_indicators):
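
The reordering in this hunk matters because the old agent check matched the bare word "step" (and "retrieved") before the CoT and RAG checks could run. A minimal sketch comparing the two orderings, assuming simplified versions of both functions: the tool-result branch is omitted, and the `RAG_CONTENT` and `OTHER` members are hypothetical names because the hunk does not show what those branches return.

```python
from enum import Enum


class SegmentType(Enum):
    COT_REASONING = "cot_reasoning"
    AGENT_OUTPUT = "agent_output"
    RAG_CONTENT = "rag_content"  # hypothetical member name
    OTHER = "other"              # hypothetical fallback

RAG_INDICATORS = ["document", "retrieved", "context:", "reference:"]


def detect_old(segment: str) -> SegmentType:
    """Pre-commit order: agent indicators first, then CoT, then RAG."""
    text = segment.lower()
    agent = ["retrieved", "summarized", "analyzed", "reasoning:", "step"]
    if any(ind in text[:150] for ind in agent):
        return SegmentType.AGENT_OUTPUT
    if all(ind in text for ind in ["step", "reasoning"]) or "step by step" in text:
        return SegmentType.COT_REASONING
    if any(ind in text[:200] for ind in RAG_INDICATORS):
        return SegmentType.RAG_CONTENT
    return SegmentType.OTHER


def detect_new(segment: str) -> SegmentType:
    """Post-commit order: CoT first, then agent (without "retrieved"), then RAG."""
    text = segment.lower()
    if "step by step" in text or ("step" in text and "reasoning" in text):
        return SegmentType.COT_REASONING
    agent = ["summarized", "analyzed", "reasoning:", "step"]
    if any(ind in text[:150] for ind in agent):
        return SegmentType.AGENT_OUTPUT
    if any(ind in text[:200] for ind in RAG_INDICATORS):
        return SegmentType.RAG_CONTENT
    return SegmentType.OTHER


COT_SAMPLE = "Reasoning through this step by step: first, list the inputs."
RAG_SAMPLE = "Retrieved document: quarterly report"
```

Under the old order both samples are classified as `AGENT_OUTPUT`; under the new order the first is `COT_REASONING` and the second falls through to the RAG check.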

{contextforge → apohara_context_forge}/compression/compressor.py RENAMED

@@ -3,7 +3,7 @@ import asyncio
 import logging
 from typing import Literal
-from llmlingua import LLMLingua
 logger = logging.getLogger(__name__)
@@ -13,7 +13,7 @@ class ContextCompressor:
     def __init__(self, model_name: str = "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"):
         self._model_name = model_name
-        self._model: LLMLingua | None = None
         self._lock = asyncio.Lock()
     async def load(self) -> None:
@@ -22,7 +22,7 @@ class ContextCompressor:
             async with self._lock:
                 if self._model is None:
                     logger.info(f"Loading compressor: {self._model_name}")
-                    self._model = LLMLingua(self._model_name)
     async def compress(self, context: str, rate: float = 0.5) -> tuple[str, float]:
         """

 import logging
 from typing import Literal
+from llmlingua import PromptCompressor
 logger = logging.getLogger(__name__)
     def __init__(self, model_name: str = "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"):
         self._model_name = model_name
+        self._model: PromptCompressor | None = None
         self._lock = asyncio.Lock()
     async def load(self) -> None:
             async with self._lock:
                 if self._model is None:
                     logger.info(f"Loading compressor: {self._model_name}")
+                    self._model = PromptCompressor(self._model_name)
     async def compress(self, context: str, rate: float = 0.5) -> tuple[str, float]:
         """
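
The `load()` method in this hunk checks `self._model` again after acquiring the lock, so concurrent callers construct the compressor only once. A minimal sketch of that double-checked pattern, assuming a generic loader in place of `PromptCompressor`; the `LazyModel` class and its `load_calls` counter are hypothetical, and moving the constructor onto a worker thread with `asyncio.to_thread` is an addition the diff does not show:

```python
import asyncio


class LazyModel:
    """Builds an expensive object at most once, even with concurrent callers."""

    def __init__(self, loader):
        self._loader = loader        # hypothetical stand-in for the model constructor
        self._model = None
        self._lock = asyncio.Lock()
        self.load_calls = 0          # instrumentation for the example only

    async def load(self):
        if self._model is None:              # fast path: skip the lock once loaded
            async with self._lock:
                if self._model is None:      # re-check: another task may have finished
                    self.load_calls += 1
                    # Run the blocking constructor off the event loop (assumption).
                    self._model = await asyncio.to_thread(self._loader)
        return self._model


async def main():
    lazy = LazyModel(lambda: object())
    models = await asyncio.gather(*(lazy.load() for _ in range(5)))
    return lazy.load_calls, len({id(m) for m in models})
```

Without the inner check, every task that was waiting on the lock would rebuild the model after the first one released it.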

{contextforge → apohara_context_forge}/compression/coordinator.py RENAMED

@@ -3,9 +3,9 @@ import asyncio
 import logging
 from typing import Literal
-from contextforge.config import settings
-from contextforge.dedup.dedup_engine import SemanticDedupEngine
-from contextforge.models import CompressionDecision
 logger = logging.getLogger(__name__)
@@ -27,7 +27,7 @@ class CompressionCoordinator:
     async def decide(self, agent_id: str, context: str) -> CompressionDecision:
         """Make compression decision for an agent's context."""
-        from contextforge.registry.context_registry import ContextRegistry
         registry = ContextRegistry()
         original_tokens = len(context.split())
@@ -60,7 +60,7 @@ class CompressionCoordinator:
             )
         elif similarity < 0.85 and original_tokens > 500:
             # Compress only
-            from contextforge.compression.compressor import ContextCompressor
             compressor = ContextCompressor()
             compressed, ratio = await compressor.compress(context, settings.contextforge_compression_rate)
             final_tokens = len(compressed.split())
@@ -73,7 +73,7 @@ class CompressionCoordinator:
             )
         elif similarity >= 0.85 and original_tokens > 500:
             # Both reuse and compress
-            from contextforge.compression.compressor import ContextCompressor
             compressor = ContextCompressor()
             compressed, ratio = await compressor.compress(context, settings.contextforge_compression_rate)
             final_tokens = len(compressed.split())

 import logging
 from typing import Literal
+from apohara_context_forge.config import settings
+from apohara_context_forge.dedup.dedup_engine import SemanticDedupEngine
+from apohara_context_forge.models import CompressionDecision
 logger = logging.getLogger(__name__)
     async def decide(self, agent_id: str, context: str) -> CompressionDecision:
         """Make compression decision for an agent's context."""
+        from apohara_context_forge.registry.context_registry import ContextRegistry
         registry = ContextRegistry()
         original_tokens = len(context.split())
             )
         elif similarity < 0.85 and original_tokens > 500:
             # Compress only
+            from apohara_context_forge.compression.compressor import ContextCompressor
             compressor = ContextCompressor()
             compressed, ratio = await compressor.compress(context, settings.contextforge_compression_rate)
             final_tokens = len(compressed.split())
             )
         elif similarity >= 0.85 and original_tokens > 500:
             # Both reuse and compress
+            from apohara_context_forge.compression.compressor import ContextCompressor
             compressor = ContextCompressor()
             compressed, ratio = await compressor.compress(context, settings.contextforge_compression_rate)
             final_tokens = len(compressed.split())
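
The two branches visible in this hunk choose a strategy from a similarity score and a whitespace-based token estimate. A minimal sketch of just those two branches, assuming the earlier branch that the hunk omits does not match first; the constant names, function names, and string labels are hypothetical:

```python
from typing import Optional

SIMILARITY_THRESHOLD = 0.85  # value taken from the hunk
MIN_TOKENS = 500             # value taken from the hunk


def estimate_tokens(context: str) -> int:
    """Whitespace split, as in the hunk; this is an estimate, not a tokenizer count."""
    return len(context.split())


def visible_decision(similarity: float, original_tokens: int) -> Optional[str]:
    """Return the strategy for the two branches shown, or None for cases the hunk omits."""
    if similarity < SIMILARITY_THRESHOLD and original_tokens > MIN_TOKENS:
        return "compress_only"
    if similarity >= SIMILARITY_THRESHOLD and original_tokens > MIN_TOKENS:
        return "reuse_and_compress"
    return None  # contexts of 500 tokens or fewer are handled outside the hunk
```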

{contextforge → apohara_context_forge}/config.py RENAMED

File without changes

{contextforge → apohara_context_forge}/decoding/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """Decoding package — speculative decoding coordinators."""
-from contextforge.decoding.speculative_coordinator import (
     SpeculativeConfig,
     SpeculativeCoordinator,
     SpeculativeResult,

 """Decoding package — speculative decoding coordinators."""
+from apohara_context_forge.decoding.speculative_coordinator import (
     SpeculativeConfig,
     SpeculativeCoordinator,
     SpeculativeResult,

apohara_context_forge/decoding/__pycache__/__init__.cpython-314.pyc ADDED

Binary file (438 Bytes).

apohara_context_forge/decoding/__pycache__/speculative_coordinator.cpython-314.pyc ADDED

Binary file (13.6 kB).

{contextforge → apohara_context_forge}/decoding/speculative_coordinator.py RENAMED

@@ -29,7 +29,7 @@ from typing import Optional, TYPE_CHECKING
 logger = logging.getLogger(__name__)
 if TYPE_CHECKING:
-    from contextforge.scheduling.queueing_controller import QueueingController
 @dataclass

 logger = logging.getLogger(__name__)
 if TYPE_CHECKING:
+    from apohara_context_forge.scheduling.queueing_controller import QueueingController
 @dataclass

{contextforge → apohara_context_forge}/dedup/__init__.py RENAMED

File without changes

apohara_context_forge/dedup/__pycache__/__init__.cpython-312.pyc ADDED

Binary file (216 Bytes).

apohara_context_forge/dedup/__pycache__/__init__.cpython-314.pyc ADDED

Binary file (218 Bytes).

apohara_context_forge/dedup/__pycache__/_deprecated_dedup_engine.cpython-314.pyc ADDED

Binary file (5.96 kB).

apohara_context_forge/dedup/__pycache__/embedder.cpython-314.pyc ADDED

Binary file (3.87 kB).

apohara_context_forge/dedup/__pycache__/faiss_index.cpython-312.pyc ADDED

Binary file (12.6 kB).

apohara_context_forge/dedup/__pycache__/faiss_index.cpython-314.pyc ADDED

Binary file (14.4 kB).

apohara_context_forge/dedup/__pycache__/lsh_engine.cpython-312.pyc ADDED

Binary file (11.6 kB).