---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment designed to address the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**: the system identifies, filters, and compresses information to maintain a high signal-to-noise ratio in RAG pipelines.

---

## 1. System Overview

In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.

### Architecture Flow

```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```

---

## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.

| Stage | Action | Rationale |
| :--- | :--- | :--- |
| **Triage** | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "garbage" early. |
| **Analysis** | `prioritize_artifact` | Commits specific evidence to the working set. Consumes token budget. |
| **Optimization** | `summarize_artifact` | AI-driven compression. Reduces the token footprint while attempting to preserve "grounding" tokens. |
| **Resolution** | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| **Submission** | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
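The staged workflow above can be sketched as a small action-trace validator. This is an illustrative sketch, not the environment's actual enforcement logic: the specific rules (inspect before prioritize, summarize only prioritized artifacts, plan before submission) are plausible readings of the table, and `is_valid_trace` is a hypothetical helper.

```python
# Hypothetical sketch of the five-stage operational loop as a trace check.
# Action names come from the table above; the ordering rules are assumptions.

def is_valid_trace(trace):
    """trace: list of (action, artifact_id_or_None) tuples for one episode."""
    if not trace or trace[-1][0] != "submit_report":
        return False                      # episode must terminate via submission
    inspected, prioritized, plan_set = set(), set(), False
    for action, arg in trace[:-1]:
        if action == "inspect_artifact":
            inspected.add(arg)            # Triage: low-cost preview
        elif action == "prioritize_artifact":
            if arg not in inspected:
                return False              # triage before committing to the working set
            prioritized.add(arg)          # Analysis: consumes token budget
        elif action == "summarize_artifact":
            if arg not in prioritized:
                return False              # only working-set artifacts are compressed
        elif action == "set_resolution_plan":
            plan_set = True               # Resolution: plan precedes the report
        elif action == "submit_report":
            return False                  # submission is terminal; nothing may follow
    return plan_set and bool(prioritized)
```

A valid episode thus looks like `inspect → prioritize → (summarize) → set_resolution_plan → submit_report`.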
---

## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working-set budget.

| Field | Type | Description |
| :--- | :--- | :--- |
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |

---

## 4. Action Space

Agents interact with the environment through the following canonical actions:

| Action Type | Parameters | Effect |
| :--- | :--- | :--- |
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate the final response and terminate the episode |
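The observation schema above can be sketched as a dataclass. Field names and types mirror the table; the class itself (and the `ChunkSummary` fields, which the table does not spell out) are illustrative assumptions, not the environment's actual model definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChunkSummary:
    # Hypothetical fields: the table only says this carries artifact metadata.
    artifact_id: str
    keywords: List[str]
    domain: str

@dataclass
class RagObservation:
    # Field names and types follow the observation-space table above.
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: str        # "triage" | "analysis" | "resolution" | "submitted"
    customer_tier: str         # "standard" | "business" | "enterprise"
    incident_severity: str     # "sev3" | "sev2" | "sev1"
    available_artifacts: List[ChunkSummary] = field(default_factory=list)
    reviewed_artifacts: List[str] = field(default_factory=list)
    prioritized_artifacts: List[str] = field(default_factory=list)
    plan_draft: Optional[str] = None
    total_tokens_used: int = 0
    token_budget: int = 850
    step_number: int = 0
    task_name: str = ""

    def budget_remaining(self) -> int:
        """Tokens still available before the working set hits its budget."""
        return self.token_budget - self.total_tokens_used
```

An agent would typically check `budget_remaining()` before a `prioritize_artifact` action, since prioritizing consumes token budget.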
---

## 5. Reward Engineering (The Benchmarking Grader)

The environment calculates a weighted score (0.0 to 1.0) based on eight distinct metrics:

- **Required Coverage (24%)**: Inclusion of critical "gold" artifacts.
- **Cross-Domain Variety (12%)**: Rewards correlation across support, incident-log, and release-guardrail artifacts.
- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
- **Planning Logic (16%)**: Alignment between the drafted plan and ground-truth steps.
- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.

---

## 6. Scenario Benchmarks

| Task | Difficulty | Steps | Budget | Key Challenge |
| :--- | :--- | :--- | :--- | :--- |
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives into a single source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |

---

## 7. Configuration & Environment

### Environment Variables

| Variable | Default | Purpose |
| :--- | :--- | :--- |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | *None* | Authentication for the Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |

### Project Components

- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
- **`app.py`**: FastAPI implementation for remote agent interaction.
- **`inference.py`**: Baseline agent script (OpenAI-compatible).
- **`validate.py`**: Robust validation suite for episode lifecycle verification.
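A baseline script such as `inference.py` would resolve its settings from the environment variables above. The helper below is a minimal sketch with the table's documented defaults; `load_config` is a hypothetical name, and the real script's structure may differ.

```python
import os

def load_config(env=os.environ):
    """Resolve endpoint, model, and auth settings using the documented defaults."""
    return {
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        "hf_token": env.get("HF_TOKEN"),  # no default: must be supplied explicitly
        "rag_env_url": env.get("RAG_ENV_URL", "http://localhost:7860"),
    }
```

Passing a plain dict (instead of `os.environ`) makes the resolution easy to test or override per run.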
---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
2. **Server**: `python app.py` (runs on port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
- **Space Repo ID**: `prithic07/context-prune`

Built for Context Optimization Research.