Add ECC Harness: phd_research_os/AGENTS.md
# PhD Research OS: Agent Registry & Contracts

> **WAKE-UP INSTRUCTION**: This file defines every agent role, its contract,
> its boundaries, and how companion agents relate to the core brain.

## Agent Hierarchy

```
Human Researcher (Provenance Level 1)
│
├── Research OS Brain (Level 5, core agents.py)
│   ├── Researcher Agent → claim extraction
│   ├── Epistemic Classifier → Fact/Interpretation/Hypothesis/Conflict
│   ├── Confidence Scorer → formula-based scoring
│   ├── Verifier Agent → contradiction detection
│   ├── Query Planner → question decomposition
│   └── Decision Generator → research action proposals
│
└── Companion Agents (Level 5, spawned via agent_os.py)
    ├── DataQualityAuditor → audit extraction quality, flag drift
    ├── PromptOptimizer → improve system prompts via A/B testing
    ├── DomainExpander → generate training data for new STEM fields
    ├── CalibrationAnalyst → analyze Brier scores, recommend adjustments
    ├── CitationChaser → find papers that cite/contradict current claims
    └── [Custom] → user-defined agents via factory
```

## Core Agent Contracts (agents.py)

### Researcher Agent

- **Input**: Raw scientific text (one page)
- **Output**: `{"claims": [ClaimObject, ...]}`
- **Constraint**: Epistemic tags must be conservative; prefer "Interpretation" when uncertain.
- **Provenance**: Level 5. Claims must be human-reviewed before they gain canonical status.

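The output contract above can be enforced with a small post-processing step. This is an illustrative sketch only: the `ClaimObject` fields (`text`, `epistemic_tag`, `provenance_level`, `human_reviewed`) and the `normalize_claims` helper are hypothetical, not the definitions in `agents.py`, and it assumes "prefer Interpretation when uncertain" means any missing or unrecognized tag falls back to `"Interpretation"`.

```python
from dataclasses import dataclass
from typing import List

# The four epistemic classes named in the agent hierarchy.
EPISTEMIC_TAGS = {"Fact", "Interpretation", "Hypothesis", "Conflict"}


@dataclass
class ClaimObject:
    # Hypothetical subset of fields; the real ClaimObject lives in agents.py.
    text: str
    epistemic_tag: str
    provenance_level: int = 5      # Level 5: machine-generated
    human_reviewed: bool = False   # must become True before canonical status

    @property
    def is_canonical(self) -> bool:
        return self.human_reviewed


def normalize_claims(payload: dict) -> List[ClaimObject]:
    """Parse a {"claims": [...]} payload and apply the conservative-tag rule."""
    claims = []
    for raw in payload.get("claims", []):
        tag = raw.get("epistemic_tag")
        if tag not in EPISTEMIC_TAGS:
            # Assumption: a missing or unrecognized tag counts as "uncertain".
            tag = "Interpretation"
        claims.append(ClaimObject(text=raw["text"], epistemic_tag=tag))
    return claims
```

In this sketch nothing the agent returns can mark a claim canonical; only a reviewer setting `human_reviewed = True` does.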
### Epistemic Classifier

- **Input**: A single scientific statement
- **Output**: `{"epistemic_tag": str, "reasoning": str, "confidence_in_classification": float}`
- **Constraint**: Exactly four classes (Fact, Interpretation, Hypothesis, Conflict). No intermediate tags.

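A minimal sketch of checking one classifier response against the 4-class contract, assuming the model returns a JSON string and that `confidence_in_classification` lies in [0, 1] (the range is not stated above). `parse_classification` is a hypothetical helper; it rejects out-of-contract output instead of silently coercing it.

```python
import json

ALLOWED_TAGS = ("Fact", "Interpretation", "Hypothesis", "Conflict")


def parse_classification(raw: str) -> dict:
    """Validate one classifier response; raise ValueError if it breaks the contract."""
    data = json.loads(raw)

    tag = data.get("epistemic_tag")
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"epistemic_tag must be one of {ALLOWED_TAGS}, got {tag!r}")

    reasoning = data.get("reasoning")
    if not isinstance(reasoning, str):
        raise ValueError("reasoning must be a string")

    conf = data.get("confidence_in_classification")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence_in_classification must be a number in [0, 1]")

    return {"epistemic_tag": tag, "reasoning": reasoning,
            "confidence_in_classification": float(conf)}
```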
### Confidence Scorer

- **Input**: Claim text + journal + study type + tier
- **Output**: `{"confidence": float, ...factor_breakdown...}`
- **Constraint**: MUST use the fixed-point formula. No free-form scoring.

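This file does not spell out the formula, so the following is only a sketch of what a deterministic, formula-based scorer with a factor breakdown could look like. Everything in it is hypothetical: the multiplicative form, the factor tables, the `journal_indexed` flag, and `score_confidence` itself. It reads "fixed-point" as integer basis-point arithmetic, and it ignores the claim text because the placeholder formula has no term for it.

```python
# Everything below is HYPOTHETICAL: factor names, table values, and the
# multiplicative form are placeholders, not the formula used in agents.py.
SCALE = 10_000  # fixed-point scale: 10_000 basis points == 1.0

STUDY_TYPE_BP = {"meta_analysis": 9_500, "rct": 9_000, "cohort": 7_500, "case_report": 4_000}
TIER_BP = {1: 10_000, 2: 8_500, 3: 7_000}
NEUTRAL_BP = 5_000  # fallback for unknown study types or tiers


def score_confidence(study_type: str, tier: int, journal_indexed: bool) -> dict:
    """Deterministic, formula-based confidence computed in integer basis points."""
    study_bp = STUDY_TYPE_BP.get(study_type, NEUTRAL_BP)
    tier_bp = TIER_BP.get(tier, NEUTRAL_BP)
    journal_bp = SCALE if journal_indexed else 8_000

    # Integer multiply, then rescale with floor division at each step.
    bp = study_bp * tier_bp // SCALE
    bp = bp * journal_bp // SCALE

    return {
        "confidence": bp / SCALE,
        # Factor breakdown, so each score can be audited term by term.
        "study_type_factor": study_bp / SCALE,
        "tier_factor": tier_bp / SCALE,
        "journal_factor": journal_bp / SCALE,
    }
```

Keeping intermediate steps in integers with one fixed rounding rule (floor) makes every score reproducible and lets a reviewer recompute it by hand from the breakdown.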
### Verifier Agent

- **Input**: Claim pair (A, B)
- **Output**: `{"conflict_detected": bool, "conflict_type": str, "hypothesis_confidence": "low", ...}`
- **INVARIANT**: `hypothesis_confidence` is ALWAYS `"low"`. It is hardcoded and never changes.

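One way to make the invariant hold no matter what the underlying model emits is to overwrite the field after every call. This sketch assumes a verifier is any callable that takes two claim strings and returns a dict; `enforce_low_hypothesis_confidence` is a hypothetical name, not an existing function in `agents.py`.

```python
from typing import Any, Callable, Dict

Verdict = Dict[str, Any]
Verifier = Callable[[str, str], Verdict]


def enforce_low_hypothesis_confidence(verify: Verifier) -> Verifier:
    """Wrap a verifier so every result satisfies the hypothesis_confidence invariant."""
    def wrapped(claim_a: str, claim_b: str) -> Verdict:
        result = dict(verify(claim_a, claim_b))  # copy; never mutate the inner result
        result["hypothesis_confidence"] = "low"  # hardcoded, whatever the model said
        return result
    return wrapped
```

Wrapping at the boundary means a prompt change or model swap cannot silently break the invariant.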
### Query Planner

- **Input**: Broad research question
- **Output**: `{"sub_queries": [str, ...], "reasoning": str}`
- **Constraint**: 2–4 sub-queries, each independently searchable.

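Only the count rule can be checked mechanically; "independently searchable" needs judgment. The sketch below uses case-insensitive de-duplication as a rough stand-in for that second rule. This is an assumption, and `parse_query_plan` is a hypothetical helper.

```python
import json
from typing import List


def parse_query_plan(raw: str, min_q: int = 2, max_q: int = 4) -> List[str]:
    """Return cleaned sub-queries, or raise ValueError if the count is out of range."""
    data = json.loads(raw)
    seen = set()
    queries = []
    for q in data.get("sub_queries", []):
        if not isinstance(q, str) or not q.strip():
            continue  # drop empty or non-string entries
        q = q.strip()
        key = q.lower()
        if key in seen:
            continue  # a duplicate is not an independent query
        seen.add(key)
        queries.append(q)
    if not min_q <= len(queries) <= max_q:
        raise ValueError(f"expected {min_q}-{max_q} sub-queries, got {len(queries)}")
    return queries
```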
### Decision Generator

- **Input**: Goal + gaps + low-confidence claims
- **Output**: DecisionObject with expected information gain
- **Constraint**: `expected_information_gain = uncertainty × impact`

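A minimal sketch of ranking candidate actions by the information-gain rule. `Decision` is a hypothetical stand-in for DecisionObject, and both `uncertainty` and `impact` are assumed to lie in [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    # Hypothetical stand-in for DecisionObject; both inputs assumed in [0, 1].
    action: str
    uncertainty: float
    impact: float

    @property
    def expected_information_gain(self) -> float:
        return self.uncertainty * self.impact


def rank_decisions(decisions: List[Decision]) -> List[Decision]:
    """Highest expected information gain first."""
    return sorted(decisions, key=lambda d: d.expected_information_gain, reverse=True)


ranked = rank_decisions([
    Decision("replicate assay", 0.8, 0.5),   # gain 0.40
    Decision("re-read methods", 0.3, 0.9),   # gain ~0.27
    Decision("run control", 0.6, 0.9),       # gain ~0.54
])
print([d.action for d in ranked])  # ['run control', 'replicate assay', 're-read methods']
```

Because the rule is a product, an action on a question that is already settled (uncertainty near 0) scores near zero however high its impact.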
## Companion Agent Contract (agent_os.py)

Every companion agent MUST:

1. **Declare its purpose** at spawn time (immutable after creation).
2. **Operate within boundaries**: it cannot directly modify claims, sources, or goals.
3. **Produce proposals**: all output is a `Proposal` object requiring human approval.
4. **Log every action**: the audit trail is kept in the `agent_audit_log` table.
5. **Run the ECC lifecycle**: preflight → plan → execute → postflight.
6. **Respect iteration budgets**: max 1 retry for patches, max 3 for architecture changes.
7. **Surface uncertainty**: if confidence < 0.5 on any decision, escalate to a human.
8. **Self-terminate**: if a task exceeds its time budget by 50%, auto-halt (Kill Heuristic).

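Rules 4 through 8 can be sketched as a single lifecycle driver. Assumptions: "max N retries" means one initial attempt plus N retries; the audit trail is an in-memory list standing in for the `agent_audit_log` table; and the kill check runs only between steps, so a real implementation would also need to interrupt a step that is still running. `run_lifecycle`, `RunResult`, and the `task_kind` keys are hypothetical names, not the `agent_os.py` API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

RETRY_BUDGET = {"patch": 1, "architecture_change": 3}  # rule 6
ESCALATION_THRESHOLD = 0.5                              # rule 7
KILL_FACTOR = 1.5                                       # rule 8: budget + 50%


@dataclass
class RunResult:
    status: str = "failed"  # "proposed" | "escalated" | "halted" | "failed"
    audit_log: List[str] = field(default_factory=list)  # stand-in for agent_audit_log


def run_lifecycle(
    task_kind: str,
    time_budget_s: float,
    preflight: Callable[[], bool],
    plan: Callable[[], float],      # returns the agent's confidence in its plan
    execute: Callable[[], bool],    # returns True on success
    postflight: Callable[[], bool],
    clock: Callable[[], float] = time.monotonic,
) -> RunResult:
    result = RunResult()
    log = result.audit_log.append
    start = clock()

    if not preflight():
        log("preflight: failed")
        return result
    log("preflight: ok")

    confidence = plan()
    log(f"plan: confidence={confidence:.2f}")
    if confidence < ESCALATION_THRESHOLD:
        result.status = "escalated"  # hand the decision to a human
        return result

    # One initial attempt plus the allowed retries.
    for attempt in range(1, RETRY_BUDGET[task_kind] + 2):
        if clock() - start > KILL_FACTOR * time_budget_s:
            log("kill heuristic: time budget exceeded by more than 50%")
            result.status = "halted"
            return result
        ok = execute()
        log(f"execute attempt {attempt}: {'ok' if ok else 'failed'}")
        if ok and postflight():
            log("postflight: ok")
            result.status = "proposed"  # output still needs human approval
            return result
    return result
```

Injecting `clock` keeps the Kill Heuristic testable without real waiting.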
### Companion Agent Types

| Agent Type | Purpose | Improves Research OS By |
|-----------|---------|------------------------|
| `DataQualityAuditor` | Audit claim extraction quality over time | Catching drift, hallucination creep |
| `PromptOptimizer` | A/B test system prompts against golden dataset | Improving extraction recall/precision |
| `DomainExpander` | Generate training examples for new STEM fields | Expanding model capability |
| `CalibrationAnalyst` | Analyze confidence calibration (Brier scores) | Reducing overconfidence |
| `CitationChaser` | Find papers citing/contradicting current claims | Enriching knowledge base |
| `SynthesisWriter` | Draft thesis sections from claim clusters | Phase 10 feature |
| `custom` | User-defined purpose and prompt | Any improvement task |

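The `CalibrationAnalyst`'s core metric, the Brier score, is the mean squared difference between each stated confidence and the 0/1 outcome (0 is perfect). The sketch below assumes outcomes come from a hypothetical ground truth such as human review verdicts (1 = the claim held up, 0 = it did not). It pairs the score with a simple overconfidence gap; both helper names are hypothetical.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(confidences, outcomes)) / len(confidences)


def overconfidence_gap(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean stated confidence minus observed hit rate; positive means overconfident."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(confidences) / len(confidences) - sum(outcomes) / len(outcomes)
```

For confidences `[0.9, 0.8, 0.7, 0.6]` with outcomes `[1, 0, 1, 0]`, the Brier score is about 0.275 and the gap about +0.25: the claims were stated at 75% average confidence but held up only half the time.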
### Proposal Schema

```json
{
  "proposal_id": "PROP_XXXXXXXX",
  "agent_id": "COMP_XXXXXXXX",
  "proposal_type": "prompt_change | training_data | confidence_adjustment | new_claim | architecture_change",
  "description": "Human-readable description of what this proposes",
  "changes": { ... },
  "evidence": "Why this change should be made",
  "estimated_impact": { "metric": "extraction_recall", "expected_delta": 0.05 },
  "risk_assessment": "low | medium | high",
  "reversible": true,
  "status": "proposed | approved | rejected | applied",
  "created_at": "ISO8601"
}
```
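A minimal sketch of the schema as a validated Python object. The field names and enum values come from the schema above; the status transition graph (proposed → approved/rejected, approved → applied) and the `approved_by_human` flag are assumptions, since the schema lists the states but not the legal moves between them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

PROPOSAL_TYPES = {"prompt_change", "training_data", "confidence_adjustment",
                  "new_claim", "architecture_change"}
RISK_LEVELS = {"low", "medium", "high"}
# Assumed lifecycle: the schema lists the states but not which moves are legal.
TRANSITIONS = {
    "proposed": {"approved", "rejected"},
    "approved": {"applied"},
    "rejected": set(),
    "applied": set(),
}


@dataclass
class Proposal:
    proposal_id: str
    agent_id: str
    proposal_type: str
    description: str
    changes: Dict[str, Any]
    evidence: str
    estimated_impact: Dict[str, Any]
    risk_assessment: str
    reversible: bool
    status: str = "proposed"
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if not self.proposal_id.startswith("PROP_"):
            raise ValueError("proposal_id must start with PROP_")
        if not self.agent_id.startswith("COMP_"):
            raise ValueError("agent_id must start with COMP_")
        if self.proposal_type not in PROPOSAL_TYPES:
            raise ValueError(f"unknown proposal_type {self.proposal_type!r}")
        if self.risk_assessment not in RISK_LEVELS:
            raise ValueError(f"unknown risk_assessment {self.risk_assessment!r}")
        if self.status not in TRANSITIONS:
            raise ValueError(f"unknown status {self.status!r}")

    def transition(self, new_status: str, approved_by_human: bool = False) -> None:
        """Move to new_status; approval requires an explicit human flag."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        if new_status == "approved" and not approved_by_human:
            raise PermissionError("only a human researcher can approve a proposal")
        self.status = new_status
```

Routing every status change through `transition` keeps a companion agent from skipping straight from `proposed` to `applied`.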