nkshirsa committed
Commit 4f945e6 · verified · 1 Parent(s): dd0b0be

Add ECC Harness: phd_research_os/AGENTS.md

Files changed (1)
  1. phd_research_os/AGENTS.md +102 -0
phd_research_os/AGENTS.md ADDED
@@ -0,0 +1,102 @@
+ # PhD Research OS: Agent Registry & Contracts
+
+ > **WAKE-UP INSTRUCTION**: This file defines every agent role, its contract,
+ > its boundaries, and how companion agents relate to the core brain.
+
+ ## Agent Hierarchy
+
+ ```
+ Human Researcher (Provenance Level 1)
+ │
+ ├── Research OS Brain (Level 5, core agents.py)
+ │   ├── Researcher Agent      → claim extraction
+ │   ├── Epistemic Classifier  → Fact/Interpretation/Hypothesis/Conflict
+ │   ├── Confidence Scorer     → formula-based scoring
+ │   ├── Verifier Agent        → contradiction detection
+ │   ├── Query Planner         → question decomposition
+ │   └── Decision Generator    → research action proposals
+ │
+ └── Companion Agents (Level 5, spawned via agent_os.py)
+     ├── DataQualityAuditor    → audit extraction quality, flag drift
+     ├── PromptOptimizer       → improve system prompts via A/B testing
+     ├── DomainExpander        → generate training data for new STEM fields
+     ├── CalibrationAnalyst    → analyze Brier scores, recommend adjustments
+     ├── CitationChaser        → find papers that cite/contradict current claims
+     └── [Custom]              → user-defined agents via factory
+ ```
+
+ ## Core Agent Contracts (agents.py)
+
+ ### Researcher Agent
+ - **Input**: Raw scientific text (1 page)
+ - **Output**: `{"claims": [ClaimObject, ...]}`
+ - **Constraint**: Keep epistemic tags conservative. Prefer "Interpretation" when uncertain.
+ - **Provenance**: Level 5. Claims must be human-reviewed before canonical status.
+
+ ### Epistemic Classifier
+ - **Input**: Single scientific statement
+ - **Output**: `{"epistemic_tag": str, "reasoning": str, "confidence_in_classification": float}`
+ - **Constraint**: 4-class only. No intermediate tags.
+
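The 4-class constraint can be checked mechanically. The following is a minimal sketch, not the code in `agents.py`: `EpistemicTag` and `validate_classification` are hypothetical names, and the `[0, 1]` range for `confidence_in_classification` is an assumption the contract does not state.

```python
from enum import Enum


class EpistemicTag(str, Enum):
    """The four tags named in the agent hierarchy; no intermediate values."""
    FACT = "Fact"
    INTERPRETATION = "Interpretation"
    HYPOTHESIS = "Hypothesis"
    CONFLICT = "Conflict"


def validate_classification(result: dict) -> dict:
    """Hypothetical helper: reject any result outside the 4-class contract."""
    # Enum lookup by value raises ValueError for any tag not listed above.
    tag = EpistemicTag(result["epistemic_tag"])
    confidence = float(result["confidence_in_classification"])
    # Assumption: confidence is a probability-like value in [0, 1].
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return {
        "epistemic_tag": tag.value,
        "reasoning": str(result["reasoning"]),
        "confidence_in_classification": confidence,
    }
```

A result tagged `"Probably Fact"` would raise `ValueError` here instead of silently entering the claim store.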
+ ### Confidence Scorer
+ - **Input**: Claim text + journal + study type + tier
+ - **Output**: `{"confidence": float, ...factor_breakdown...}`
+ - **Constraint**: MUST use fixed-point formula. No free-form scoring.
+
+ ### Verifier Agent
+ - **Input**: Claim pair (A, B)
+ - **Output**: `{"conflict_detected": bool, "conflict_type": str, "hypothesis_confidence": "low", ...}`
+ - **INVARIANT**: `hypothesis_confidence` is ALWAYS `"low"`. Hardcoded. Never changes.
+
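One way to make the invariant hard to violate is to overwrite the field at the boundary, whatever the model returned. This is a sketch under that assumption; `finalize_verifier_output` is a hypothetical name and may not match how `agents.py` enforces it.

```python
HYPOTHESIS_CONFIDENCE = "low"  # fixed by the Verifier Agent contract


def finalize_verifier_output(raw: dict) -> dict:
    """Hypothetical helper: return a copy of a verifier result with the invariant applied."""
    result = dict(raw)  # copy so the caller's dict is not mutated
    result["conflict_detected"] = bool(result.get("conflict_detected", False))
    # Overwrite unconditionally: the model's own value is never trusted.
    result["hypothesis_confidence"] = HYPOTHESIS_CONFIDENCE
    return result
```

Overwriting rather than validating means a model that answers `"high"` cannot cause a failure path; the value is simply discarded.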
+ ### Query Planner
+ - **Input**: Broad research question
+ - **Output**: `{"sub_queries": [str, ...], "reasoning": str}`
+ - **Constraint**: 2–4 sub-queries. Each independently searchable.
+
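The count bound is easy to check in code; "independently searchable" is not. The sketch below checks the count and uses non-empty, distinct strings as a rough proxy for the second condition. `validate_plan` is a hypothetical helper, not part of the documented API.

```python
MIN_SUB_QUERIES = 2
MAX_SUB_QUERIES = 4


def validate_plan(plan: dict) -> list:
    """Hypothetical helper: return cleaned sub-queries or raise ValueError."""
    sub_queries = plan["sub_queries"]
    if not MIN_SUB_QUERIES <= len(sub_queries) <= MAX_SUB_QUERIES:
        raise ValueError(f"expected 2 to 4 sub-queries, got {len(sub_queries)}")
    cleaned = [q.strip() for q in sub_queries]
    if any(not q for q in cleaned):
        raise ValueError("sub-queries must be non-empty")
    # Proxy only: duplicates cannot be independent searches.
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("sub-queries must be distinct")
    return cleaned
```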
+ ### Decision Generator
+ - **Input**: Goal + gaps + low-confidence claims
+ - **Output**: DecisionObject with information gain
+ - **Constraint**: `expected_information_gain = uncertainty × impact`
+
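The constraint is a plain product, so ranking candidate actions reduces to a sort. A minimal sketch follows, assuming both factors are normalized to `[0, 1]` (the contract does not say) and using hypothetical dictionary keys `uncertainty` and `impact` rather than the real `DecisionObject` fields.

```python
def expected_information_gain(uncertainty: float, impact: float) -> float:
    """uncertainty × impact, as stated in the Decision Generator contract."""
    for name, value in (("uncertainty", uncertainty), ("impact", impact)):
        # Assumption: both factors live in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return uncertainty * impact


def rank_actions(candidates: list) -> list:
    """Sort candidate actions by expected information gain, highest first."""
    return sorted(
        candidates,
        key=lambda c: expected_information_gain(c["uncertainty"], c["impact"]),
        reverse=True,
    )
```

Under this formula a highly uncertain but low-impact claim can rank below a moderately uncertain, high-impact one, which is the point of multiplying rather than adding.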
+ ## Companion Agent Contract (agent_os.py)
+
+ Every companion agent MUST:
+
+ 1. **Declare its purpose** at spawn time (immutable after creation)
+ 2. **Operate within boundaries**: cannot directly modify claims, sources, or goals
+ 3. **Produce proposals**: all output is a `Proposal` object requiring human approval
+ 4. **Log every action**: audit trail in `agent_audit_log` table
+ 5. **Run the ECC lifecycle**: preflight → plan → execute → postflight
+ 6. **Respect iteration budgets**: max 1 retry for patches, max 3 for architecture changes
+ 7. **Surface uncertainty**: if confidence < 0.5 on any decision, escalate to human
+ 8. **Self-terminate**: if task exceeds time budget by 50%, auto-halt (Kill Heuristic)
+
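Rules 5 to 8 above are numeric or ordered, so they can be expressed as small guard functions. This is a sketch only: every name below is hypothetical and none of it is taken from `agent_os.py`. It reads "exceeds time budget by 50%" as elapsed time greater than 1.5 times the budget.

```python
ECC_PHASES = ("preflight", "plan", "execute", "postflight")  # rule 5, in order
MAX_RETRIES = {"patch": 1, "architecture_change": 3}          # rule 6
ESCALATION_THRESHOLD = 0.5                                    # rule 7
TIME_OVERRUN_FACTOR = 1.5                                     # rule 8: budget + 50%


def may_retry(change_kind: str, retries_used: int) -> bool:
    """True while the iteration budget for this kind of change is not spent."""
    return retries_used < MAX_RETRIES[change_kind]


def should_escalate(confidence: float) -> bool:
    """True when a decision is uncertain enough to hand to the human."""
    return confidence < ESCALATION_THRESHOLD


def should_halt(elapsed_seconds: float, budget_seconds: float) -> bool:
    """Kill Heuristic: halt once elapsed time exceeds the budget by 50%."""
    return elapsed_seconds > budget_seconds * TIME_OVERRUN_FACTOR
```

For a 60-second budget, `should_halt` stays false up to and including 90 seconds and turns true after that.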
+ ### Companion Agent Types
+
+ | Agent Type | Purpose | Improves Research OS By |
+ |-----------|---------|------------------------|
+ | `DataQualityAuditor` | Audit claim extraction quality over time | Catching drift, hallucination creep |
+ | `PromptOptimizer` | A/B test system prompts against golden dataset | Improving extraction recall/precision |
+ | `DomainExpander` | Generate training examples for new STEM fields | Expanding model capability |
+ | `CalibrationAnalyst` | Analyze confidence calibration (Brier scores) | Reducing overconfidence |
+ | `CitationChaser` | Find papers citing/contradicting current claims | Enriching knowledge base |
+ | `SynthesisWriter` | Draft thesis sections from claim clusters | Phase 10 feature |
+ | `custom` | User-defined purpose and prompt | Any improvement task |
+
+ ### Proposal Schema
+
+ ```json
+ {
+   "proposal_id": "PROP_XXXXXXXX",
+   "agent_id": "COMP_XXXXXXXX",
+   "proposal_type": "prompt_change | training_data | confidence_adjustment | new_claim | architecture_change",
+   "description": "Human-readable description of what this proposes",
+   "changes": { ... },
+   "evidence": "Why this change should be made",
+   "estimated_impact": { "metric": "extraction_recall", "expected_delta": 0.05 },
+   "risk_assessment": "low | medium | high",
+   "reversible": true,
+   "status": "proposed | approved | rejected | applied",
+   "created_at": "ISO8601"
+ }
+ ```
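
The schema above is a template (pipe-separated alternatives, an elided `changes` body), not literal JSON. One possible Python mirror of it is sketched below. The class layout, the defaults for `reversible` and `status`, and the UTC timestamp default are assumptions for illustration, not the definition used in `agent_os.py`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed values copied from the schema's pipe-separated alternatives.
PROPOSAL_TYPES = {"prompt_change", "training_data", "confidence_adjustment",
                  "new_claim", "architecture_change"}
RISK_LEVELS = {"low", "medium", "high"}
STATUSES = {"proposed", "approved", "rejected", "applied"}


def _utc_now_iso() -> str:
    """ISO 8601 timestamp in UTC (assumed format for created_at)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Proposal:
    proposal_id: str
    agent_id: str
    proposal_type: str
    description: str
    changes: dict
    evidence: str
    estimated_impact: dict
    risk_assessment: str
    reversible: bool = True      # assumed default
    status: str = "proposed"     # assumed default: new proposals await review
    created_at: str = field(default_factory=_utc_now_iso)

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the schema.
        if self.proposal_type not in PROPOSAL_TYPES:
            raise ValueError(f"unknown proposal_type: {self.proposal_type}")
        if self.risk_assessment not in RISK_LEVELS:
            raise ValueError(f"unknown risk_assessment: {self.risk_assessment}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
```

Validating in `__post_init__` keeps malformed proposals from reaching the human approval queue in the first place.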