---
title: ContextPrune
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# ContextPrune: Adaptive Context Garbage Collection for RAG

ContextPrune is a benchmark environment designed to address the **"Attention Dilution"** problem in Large Language Model (LLM) workflows. It treats context management as a form of **Garbage Collection**: the system identifies, filters, and compresses information to maintain a high signal-to-noise ratio in RAG pipelines.

---

## 1. System Overview

In standard RAG, retrieval often returns too much irrelevant data, causing models to "lose the signal" or hallucinate. ContextPrune provides a Reinforcement Learning (RL) environment where agents are trained to surgically manage their context window.

### Architecture Flow

```mermaid
graph TD
    A[User / Agent] -->|Execute Actions| B[FastAPI / Streamlit Interface]
    B -->|RagAction| C[ContextPrune Environment]
    C -->|Update State| D[State Machine]
    D -->|Token Budgeting| E[Context Working Set]
    D -->|Hybrid Retrieval| F[Corpus Search]
    C -->|Terminal Action| G[Deterministic Grader]
    G -->|Weighted Reward| A
```

---

## 2. Methodology: The Operational Loop

ContextPrune enforces a five-stage workflow that mirrors enterprise incident response.

| Stage | Action | Rationale |
| :--- | :--- | :--- |
| **Triage** | `inspect_artifact` | Low-cost preview of artifact keywords and domains to filter out "garbage" early. |
| **Analysis** | `prioritize_artifact` | Commits specific evidence to the working set. Consumes token budget. |
| **Optimization** | `summarize_artifact` | AI-driven compression. Reduces the token footprint while attempting to preserve "grounding" tokens. |
| **Resolution** | `set_resolution_plan` | Forces the agent to internalize the evidence into a logical plan before producing an output. |
| **Submission** | `submit_report` | Terminates the episode. The output must be grounded exclusively in the working set. |
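The staged workflow above can be sketched as a small action-trace validator. This is an illustrative sketch, not the environment's actual enforcement logic: the specific rules (inspect before prioritize, summarize only prioritized artifacts, plan before submission) are plausible readings of the table, and `is_valid_trace` is a hypothetical helper.

```python
# Hypothetical sketch of the five-stage operational loop as a trace check.
# Action names come from the table above; the ordering rules are assumptions.

def is_valid_trace(trace):
    """trace: list of (action, artifact_id_or_None) tuples for one episode."""
    if not trace or trace[-1][0] != "submit_report":
        return False                      # episode must terminate via submission
    inspected, prioritized, plan_set = set(), set(), False
    for action, arg in trace[:-1]:
        if action == "inspect_artifact":
            inspected.add(arg)            # Triage: low-cost preview
        elif action == "prioritize_artifact":
            if arg not in inspected:
                return False              # triage before committing to the working set
            prioritized.add(arg)          # Analysis: consumes token budget
        elif action == "summarize_artifact":
            if arg not in prioritized:
                return False              # only working-set artifacts are compressed
        elif action == "set_resolution_plan":
            plan_set = True               # Resolution: plan precedes the report
        elif action == "submit_report":
            return False                  # submission is terminal; nothing may follow
    return plan_set and bool(prioritized)
```

A valid episode thus looks like `inspect → prioritize → (summarize) → set_resolution_plan → submit_report`.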
---

## 3. Observation Space

The `RagObservation` provides the agent with the internal state of the incident and the current working-set budget.

| Field | Type | Description |
| :--- | :--- | :--- |
| `case_id` | `str` | Unique simulated case identifier |
| `case_summary` | `str` | Real-world case context and background |
| `objective` | `str` | Specific deliverable the agent must produce |
| `workflow_stage` | `triage \| analysis \| resolution \| submitted` | Current stage in the operational loop |
| `customer_tier` | `standard \| business \| enterprise` | Customer criticality and SLA priority |
| `incident_severity` | `sev3 \| sev2 \| sev1` | Impact magnitude of the incident |
| `available_artifacts` | `List[ChunkSummary]` | Metadata for artifacts available for inspection |
| `reviewed_artifacts` | `List[str]` | IDs of artifacts already triaged |
| `prioritized_artifacts` | `List[str]` | IDs of artifacts currently in the working set |
| `plan_draft` | `Optional[str]` | Current state of the resolution plan |
| `total_tokens_used` | `int` | Current token cost of the working set |
| `token_budget` | `int` | Maximum allowed token budget |
| `step_number` | `int` | Current step index in the episode |
| `task_name` | `str` | Name of the active benchmark task |

---

## 4. Action Space

Agents interact with the environment through the following canonical actions:

| Action Type | Parameters | Effect |
| :--- | :--- | :--- |
| `inspect_artifact` | `artifact_id` | Review artifact keywords without committing to the working set |
| `prioritize_artifact` | `artifact_id` | Add a reviewed artifact to the working set (consumes tokens) |
| `summarize_artifact` | `artifact_id`, `compression_ratio` | Compress a prioritized artifact using AI summarization |
| `set_resolution_plan` | `plan` | Update the draft plan before final submission |
| `submit_report` | `answer` | Generate the final response and terminate the episode |
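The observation schema above can be sketched as a dataclass. Field names and types mirror the table; the class itself (and the `ChunkSummary` fields, which the table does not spell out) are illustrative assumptions, not the environment's actual model definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChunkSummary:
    # Hypothetical fields: the table only says this carries artifact metadata.
    artifact_id: str
    keywords: List[str]
    domain: str

@dataclass
class RagObservation:
    # Field names and types follow the observation-space table above.
    case_id: str
    case_summary: str
    objective: str
    workflow_stage: str        # "triage" | "analysis" | "resolution" | "submitted"
    customer_tier: str         # "standard" | "business" | "enterprise"
    incident_severity: str     # "sev3" | "sev2" | "sev1"
    available_artifacts: List[ChunkSummary] = field(default_factory=list)
    reviewed_artifacts: List[str] = field(default_factory=list)
    prioritized_artifacts: List[str] = field(default_factory=list)
    plan_draft: Optional[str] = None
    total_tokens_used: int = 0
    token_budget: int = 850
    step_number: int = 0
    task_name: str = ""

    def budget_remaining(self) -> int:
        """Tokens still available before the working set hits its budget."""
        return self.token_budget - self.total_tokens_used
```

An agent would typically check `budget_remaining()` before a `prioritize_artifact` action, since prioritizing consumes token budget.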
---

## 5. Reward Engineering (The Benchmarking Grader)

The environment calculates a weighted score (0.0 to 1.0) based on eight distinct metrics:

- **Required Coverage (24%)**: Inclusion of critical "gold" artifacts.
- **Cross-Domain Variety (12%)**: Rewards correlation across support, incident-log, and release-guardrail artifacts.
- **Triage Thoroughness (12%)**: Penalizes skipping the inspection phase.
- **Planning Logic (16%)**: Alignment between the drafted plan and ground-truth steps.
- **Reporting Accuracy (18%)**: Presence of mission-critical operational keywords.
- **Citation Fidelity (10%)**: Verification that claimed evidence is in the working set.
- **Token Efficiency (8%)**: Scaled bonus for minimal context usage.
- **Hallucination Penalty (-18%)**: Severe deduction for unsupported claims.

---

## 6. Scenario Benchmarks

| Task | Difficulty | Steps | Budget | Key Challenge |
| :--- | :--- | :--- | :--- | :--- |
| `refund_triage_easy` | Easy | 7 | 850 | Systematically checking policy artifacts before granting relief. |
| `cross_function_brief_medium` | Medium | 8 | 620 | Filtering overlapping narratives into a single source of truth. |
| `executive_escalation_hard` | Hard | 10 | 360 | Correlating suspicious logs with release freezes on a tight budget. |

---

## 7. Configuration & Environment

### Environment Variables

| Variable | Default | Purpose |
| :--- | :--- | :--- |
| `API_BASE_URL` | `https://router.huggingface.co/v1` | OpenAI-compatible inference endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model used for baseline tasks |
| `HF_TOKEN` | *None* | Authentication for the Hugging Face Inference API |
| `RAG_ENV_URL` | `http://localhost:7860` | Base URL for the ContextPrune server |

### Project Components

- **`rag_optimizer_env/`**: State machine, hybrid retrieval, and token estimation.
- **`app.py`**: FastAPI implementation for remote agent interaction.
- **`inference.py`**: Baseline agent script (OpenAI-compatible).
- **`validate.py`**: Robust validation suite for episode lifecycle verification.
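A baseline script such as `inference.py` would resolve its settings from the environment variables above. The helper below is a minimal sketch with the table's documented defaults; `load_config` is a hypothetical name, and the real script's structure may differ.

```python
import os

def load_config(env=os.environ):
    """Resolve endpoint, model, and auth settings using the documented defaults."""
    return {
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        "hf_token": env.get("HF_TOKEN"),  # no default: must be supplied explicitly
        "rag_env_url": env.get("RAG_ENV_URL", "http://localhost:7860"),
    }
```

Passing a plain dict (instead of `os.environ`) makes the resolution easy to test or override per run.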
---

## 🚀 Quick Start

1. **Setup**: `pip install -r requirements.txt`
2. **Server**: `python app.py` (runs on port 7860)
3. **Control Panel**: `streamlit run optimizer_ui.py`
4. **Validation**: `python validate.py`

---

## 🌎 Live Deployment

- **Space URL**: [huggingface.co/spaces/prithic07/context-prune](https://huggingface.co/spaces/prithic07/context-prune)
- **Direct App Link**: [prithic07-context-prune.hf.space](https://prithic07-context-prune.hf.space/)
- **Space Repo ID**: `prithic07/context-prune`

Built for Context Optimization Research.