nkshirsa committed on
Commit
83050e2
·
verified ·
1 Parent(s): 27354a3

Add ECC Harness: phd_research_os/agent_os.py

Files changed (1)
  1. phd_research_os/agent_os.py +1051 -0
phd_research_os/agent_os.py ADDED
@@ -0,0 +1,1051 @@
+"""
+PhD Research OS — ECC Harness Orchestrator (agent_os.py)
+=========================================================
+The meta-system for spawning, managing, and auditing companion AI agents
+that improve the core Research OS brain.
+
+Implements the ECC Harness: Principal Architect Edition (V-SINGULARITY)
+- §0: Global Objective Function (correctness > blast radius > simplicity > NFRs > no-op)
+- §1: Pre-Flight (context loading, knowledge boundaries, assumption logging)
+- §2: Planning (obviousness test, reversibility, idempotence, confidence signaling)
+- §3: Execution (JIT verification, cognitive budget, failure modes, scope containment)
+- §4: Post-Flight (validation, pedagogical handoff, definition of done, meta-learning)
+
+WAKE-UP ROUTINE: Before any task, this module reads ARCHITECTURE.md and AGENTS.md
+to ground itself in the project map. This is non-negotiable.
+"""
+
+import json
+import os
+import sqlite3
+import uuid
+import time
+from datetime import datetime, timezone
+from typing import Optional, Callable
+from dataclasses import dataclass, field, asdict
+from pathlib import Path
+from enum import Enum
+
+from .db import get_db, init_db, now_iso, gen_id, to_fixed, from_fixed
+
+
+# ============================================================
+# ECC Lifecycle States
+# ============================================================
+
+class AgentState(Enum):
+    """ECC Harness lifecycle states for companion agents."""
+    SPAWNED = "spawned"        # Created, not yet active
+    PREFLIGHT = "preflight"    # §1: Loading context, validating assumptions
+    PLANNING = "planning"      # §2: Building execution plan
+    EXECUTING = "executing"    # §3: Running bounded task
+    POSTFLIGHT = "postflight"  # §4: Validating results, logging decisions
+    COMPLETED = "completed"    # Task done successfully
+    HALTED = "halted"          # Kill heuristic triggered or error
+    RETIRED = "retired"        # Agent decommissioned
+
+
+class ProposalStatus(Enum):
+    PROPOSED = "proposed"
+    APPROVED = "approved"
+    REJECTED = "rejected"
+    APPLIED = "applied"
+
+
+class RiskLevel(Enum):
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+
+
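The lifecycle states above are stored as plain strings, so nothing in the enum itself prevents an out-of-order transition. A minimal sketch of one way to check transitions follows. The `ALLOWED_TRANSITIONS` table and `can_transition` helper are hypothetical and do not appear in `agent_os.py`; the set of permitted transitions is an assumption inferred from the state comments.

```python
from enum import Enum


class AgentState(Enum):
    """Copy of the lifecycle states, repeated so the sketch is self-contained."""
    SPAWNED = "spawned"
    PREFLIGHT = "preflight"
    PLANNING = "planning"
    EXECUTING = "executing"
    POSTFLIGHT = "postflight"
    COMPLETED = "completed"
    HALTED = "halted"
    RETIRED = "retired"


# Hypothetical transition table (an assumption, not taken from agent_os.py).
ALLOWED_TRANSITIONS = {
    AgentState.SPAWNED: {AgentState.PREFLIGHT, AgentState.RETIRED},
    AgentState.PREFLIGHT: {AgentState.PLANNING, AgentState.HALTED},
    AgentState.PLANNING: {AgentState.EXECUTING, AgentState.HALTED},
    AgentState.EXECUTING: {AgentState.POSTFLIGHT, AgentState.HALTED},
    AgentState.POSTFLIGHT: {AgentState.COMPLETED, AgentState.HALTED},
    AgentState.COMPLETED: {AgentState.PREFLIGHT, AgentState.RETIRED},
    AgentState.HALTED: {AgentState.PREFLIGHT, AgentState.RETIRED},
    AgentState.RETIRED: set(),  # terminal state
}


def can_transition(current: AgentState, target: AgentState) -> bool:
    """Return True if moving from `current` to `target` is in the table."""
    return target in ALLOWED_TRANSITIONS[current]
```

A state-update helper could call `can_transition` before writing the new state and refuse anything not in the table.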
+# ============================================================
+# Data Structures
+# ============================================================
+
+@dataclass
+class Proposal:
+    """
+    A companion agent's proposed change to the Research OS.
+    ALL companion output goes through proposals — never direct modification.
+    """
+    proposal_id: str
+    agent_id: str
+    proposal_type: str  # prompt_change, training_data, confidence_adjustment, new_claim, architecture_change
+    description: str
+    changes: dict
+    evidence: str
+    estimated_impact: dict  # {"metric": str, "expected_delta": float}
+    risk_assessment: str  # low, medium, high
+    reversible: bool
+    status: str = "proposed"
+    created_at: str = ""
+    reviewed_at: str = ""
+    reviewed_by: str = ""
+    rejection_reason: str = ""
+
+    def to_dict(self):
+        return asdict(self)
+
+    def to_json(self):
+        return json.dumps(self.to_dict(), indent=2)
+
+
+@dataclass
+class AuditEntry:
+    """Immutable audit log entry. Every agent action is recorded."""
+    entry_id: str
+    agent_id: str
+    phase: str  # preflight, planning, executing, postflight
+    action: str  # what was done
+    details: str  # specifics
+    confidence: float  # agent's self-assessed confidence [0,1]
+    timestamp: str
+    deviation: str = ""  # if deviating from rules, document why
+
+
+@dataclass
+class AgentTask:
+    """A bounded task assigned to a companion agent."""
+    task_id: str
+    agent_id: str
+    description: str
+    state: str = "preflight"
+    plan: str = ""  # JSON execution plan
+    result: str = ""  # JSON result
+    iterations_used: int = 0
+    max_iterations: int = 3  # §3 iteration budget
+    time_budget_s: int = 3600  # default 1 hour
+    started_at: str = ""
+    completed_at: str = ""
+    kill_reason: str = ""
+
+
+# ============================================================
+# Database Extension (adds companion agent tables)
+# ============================================================
+
+def init_agent_os_db(db_path: str = None):
+    """Extend the Research OS database with companion agent tables."""
+    # First ensure base tables exist
+    init_db(db_path)
+
+    conn = get_db(db_path)
+    conn.executescript("""
+        CREATE TABLE IF NOT EXISTS companion_agents (
+            agent_id TEXT PRIMARY KEY,
+            agent_type TEXT NOT NULL,
+            purpose TEXT NOT NULL,
+            system_prompt TEXT NOT NULL,
+            state TEXT NOT NULL DEFAULT 'spawned',
+            config TEXT, -- JSON: model, temperature, etc.
+            created_at TEXT NOT NULL,
+            retired_at TEXT,
+            total_tasks_completed INTEGER DEFAULT 0,
+            total_proposals_made INTEGER DEFAULT 0,
+            schema_version TEXT NOT NULL DEFAULT '1.0'
+        );
+
+        CREATE TABLE IF NOT EXISTS agent_tasks (
+            task_id TEXT PRIMARY KEY,
+            agent_id TEXT NOT NULL,
+            description TEXT NOT NULL,
+            state TEXT NOT NULL DEFAULT 'preflight',
+            plan TEXT, -- JSON execution plan
+            result TEXT, -- JSON result
+            iterations_used INTEGER DEFAULT 0,
+            max_iterations INTEGER DEFAULT 3,
+            time_budget_s INTEGER DEFAULT 3600,
+            started_at TEXT,
+            completed_at TEXT,
+            kill_reason TEXT,
+            schema_version TEXT NOT NULL DEFAULT '1.0',
+            FOREIGN KEY(agent_id) REFERENCES companion_agents(agent_id)
+        );
+
+        CREATE TABLE IF NOT EXISTS proposals (
+            proposal_id TEXT PRIMARY KEY,
+            agent_id TEXT NOT NULL,
+            task_id TEXT,
+            proposal_type TEXT NOT NULL,
+            description TEXT NOT NULL,
+            changes TEXT NOT NULL, -- JSON
+            evidence TEXT,
+            estimated_impact TEXT, -- JSON
+            risk_assessment TEXT DEFAULT 'low',
+            reversible INTEGER DEFAULT 1,
+            status TEXT DEFAULT 'proposed',
+            created_at TEXT NOT NULL,
+            reviewed_at TEXT,
+            reviewed_by TEXT,
+            rejection_reason TEXT,
+            schema_version TEXT NOT NULL DEFAULT '1.0',
+            FOREIGN KEY(agent_id) REFERENCES companion_agents(agent_id),
+            FOREIGN KEY(task_id) REFERENCES agent_tasks(task_id)
+        );
+
+        CREATE TABLE IF NOT EXISTS agent_audit_log (
+            entry_id TEXT PRIMARY KEY,
+            agent_id TEXT NOT NULL,
+            task_id TEXT,
+            phase TEXT NOT NULL,
+            action TEXT NOT NULL,
+            details TEXT,
+            confidence REAL,
+            deviation TEXT,
+            timestamp TEXT NOT NULL,
+            FOREIGN KEY(agent_id) REFERENCES companion_agents(agent_id)
+        );
+
+        CREATE TABLE IF NOT EXISTS harness_evolution (
+            id INTEGER PRIMARY KEY AUTOINCREMENT,
+            rule_section TEXT NOT NULL,
+            amendment TEXT NOT NULL,
+            reason TEXT NOT NULL,
+            proposed_by TEXT NOT NULL,
+            timestamp TEXT NOT NULL,
+            approved INTEGER DEFAULT 0
+        );
+
+        CREATE TABLE IF NOT EXISTS memory_store (
+            key TEXT PRIMARY KEY,
+            value TEXT NOT NULL,
+            last_validated TEXT NOT NULL,
+            category TEXT DEFAULT 'assumption'
+        );
+    """)
+    conn.commit()
+    conn.close()
+
+
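The tables above declare `FOREIGN KEY` constraints, but SQLite only enforces them on connections where `PRAGMA foreign_keys = ON` has been issued. Whether `get_db` does this is not visible in this diff, so the sketch below is only an illustration on a reduced, in-memory copy of two of the tables. The helper `insert_task` and the IDs used are hypothetical.

```python
import sqlite3

# In-memory database with a reduced version of two tables from the schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE companion_agents (agent_id TEXT PRIMARY KEY);
    CREATE TABLE agent_tasks (
        task_id TEXT PRIMARY KEY,
        agent_id TEXT NOT NULL,
        FOREIGN KEY(agent_id) REFERENCES companion_agents(agent_id)
    );
""")


def insert_task(connection: sqlite3.Connection, task_id: str, agent_id: str) -> bool:
    """Hypothetical helper: True if the row was stored, False on a constraint error."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO agent_tasks (task_id, agent_id) VALUES (?, ?)",
                (task_id, agent_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.execute("INSERT INTO companion_agents (agent_id) VALUES ('COMP-1')")

orphan_ok = insert_task(conn, "TASK-1", "COMP-missing")  # no such agent
valid_ok = insert_task(conn, "TASK-2", "COMP-1")         # agent exists
```

Without the pragma, the first insert would be accepted and the task would reference an agent that does not exist.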
+# ============================================================
+# Companion Agent Definition
+# ============================================================
+
+# Pre-built companion types with their system prompts
+COMPANION_TYPES = {
+    "DataQualityAuditor": {
+        "purpose": "Audit claim extraction quality, detect drift, flag hallucination patterns",
+        "system_prompt": """You are a Data Quality Auditor for a PhD Research OS. Your job is to:
+1. Compare extracted claims against source text to detect hallucinations
+2. Monitor extraction quality metrics over time for drift
+3. Flag claims with suspicious confidence scores (too high for weak evidence)
+4. Propose corrections as Proposal objects — NEVER modify data directly
+
+Output JSON proposals: {"proposal_type": "confidence_adjustment", "changes": {...}, "evidence": "..."}
+You operate at Provenance Level 5. All your findings require human verification."""
+    },
+    "PromptOptimizer": {
+        "purpose": "Improve system prompts via evaluation against golden dataset",
+        "system_prompt": """You are a Prompt Optimizer for a PhD Research OS. Your job is to:
+1. Analyze current extraction/classification prompts and their metrics
+2. Propose specific prompt modifications with expected impact
+3. Design A/B test criteria for prompt changes
+4. Ensure any change is regression-tested before deployment
+
+Output JSON proposals: {"proposal_type": "prompt_change", "changes": {"prompt_name": "...", "old": "...", "new": "..."}, "evidence": "..."}
+CRITICAL: Every prompt change MUST pass the regression gate (recall ≥70%, hallucination ≤10%, epistemic accuracy ≥60%)."""
+    },
+    "DomainExpander": {
+        "purpose": "Generate training examples for new STEM domains",
+        "system_prompt": """You are a Domain Expander for a PhD Research OS. Your job is to:
+1. Identify STEM domains not well-covered by current training data
+2. Generate high-quality synthetic training examples in TRL conversational format
+3. Include realistic claim extraction, epistemic tagging, and confidence scoring examples
+4. Ensure examples follow the exact JSON schema used by the core system
+
+Output JSON proposals: {"proposal_type": "training_data", "changes": {"examples": [...]}, "evidence": "..."}
+Quality requirement: All generated JSON must be valid. Include diverse epistemic tags and study types."""
+    },
+    "CalibrationAnalyst": {
+        "purpose": "Analyze confidence calibration and recommend scoring adjustments",
+        "system_prompt": """You are a Calibration Analyst for a PhD Research OS. Your job is to:
+1. Analyze the calibration_log table for systematic over/under-confidence
+2. Compute Brier scores when sufficient data exists (≥50 data points)
+3. Propose adjustments to study_quality_weight or journal_tier_weight values
+4. Flag specific claim categories where confidence is poorly calibrated
+
+Output JSON proposals: {"proposal_type": "confidence_adjustment", "changes": {"parameter": "...", "old_value": N, "new_value": N}, "evidence": "Brier score analysis..."}
+Use fixed-point math (×1000) for all proposed values."""
+    },
+    "CitationChaser": {
+        "purpose": "Find papers that cite or contradict current claims in the knowledge base",
+        "system_prompt": """You are a Citation Chaser for a PhD Research OS. Your job is to:
+1. Identify high-impact claims that may have newer supporting or contradicting evidence
+2. Propose new papers for ingestion based on citation chains
+3. Flag claims whose source papers have been retracted or corrected
+4. Suggest claims that need confidence updates based on new evidence
+
+Output JSON proposals: {"proposal_type": "new_claim", "changes": {"suggested_papers": [...], "reason": "..."}, "evidence": "..."}
+All suggestions are proposals. You cannot add papers to the database directly."""
+    }
+}
+
+
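The CalibrationAnalyst prompt above asks for Brier scores over confidences stored in fixed-point form (×1000). A minimal sketch of that computation follows. The real `to_fixed` and `from_fixed` live in `.db` and are not shown in this diff, so `to_fixed_sketch`, `from_fixed_sketch`, and `brier_score` are hypothetical stand-ins, and rounding to the nearest integer is an assumption.

```python
from typing import Sequence

SCALE = 1000  # assumed from the "×1000" convention named in the prompt


def to_fixed_sketch(value: float) -> int:
    """Convert a float in [0, 1] to an integer number of thousandths."""
    return int(round(value * SCALE))


def from_fixed_sketch(value: int) -> float:
    """Convert thousandths back to a float."""
    return value / SCALE


def brier_score(confidences_fixed: Sequence[int], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted confidence and the 0/1 outcome."""
    if not outcomes or len(confidences_fixed) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    squared_errors = (
        (from_fixed_sketch(c) - o) ** 2 for c, o in zip(confidences_fixed, outcomes)
    )
    return sum(squared_errors) / len(outcomes)
```

A score of 0 means every confidence matched its outcome exactly; a constant prediction of 0.5 scores 0.25, so values near or above that suggest the confidences carry little information.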
+# ============================================================
+# The Agent OS — ECC Harness Orchestrator
+# ============================================================
+
+class AgentOS:
+    """
+    The meta-system for creating and managing companion AI agents.
+
+    Implements the full ECC Harness lifecycle:
+    spawn → preflight → plan → execute → postflight → retire
+
+    Every companion agent:
+    - Reads ARCHITECTURE.md and AGENTS.md before acting (Wake-Up Routine)
+    - Cannot directly modify the Research OS database
+    - Produces Proposals that require human approval
+    - Has bounded iteration budgets (Kill Heuristic)
+    - Logs every action to the audit trail
+
+    Usage:
+        agent_os = AgentOS()  # named agent_os so the stdlib `os` module is not shadowed
+        agent = agent_os.spawn_companion("DataQualityAuditor")
+        task = agent_os.assign_task(agent, "Audit last 50 claims for hallucination patterns")
+        agent_os.run_task(task)  # Executes full ECC lifecycle
+        proposals = agent_os.get_proposals(agent)  # Review what the agent found
+        agent_os.approve_proposal(proposals[0])  # Human approves
+    """
+
+    def __init__(self, db_path: str = None, brain=None):
+        self.db_path = db_path or os.environ.get("RESEARCH_OS_DB", "data/research_os.db")
+        init_agent_os_db(self.db_path)
+        self.brain = brain  # ResearchOSBrain instance for API calls
+        self._architecture = None
+        self._agents_doc = None
+
+    # ============================================================
+    # §0: Wake-Up Routine — ALWAYS read the map first
+    # ============================================================
+
+    def _wake_up(self) -> dict:
+        """
+        CRITICAL: Read ARCHITECTURE.md and AGENTS.md before any operation.
+        This is the ground truth for file locations and contracts.
+        """
+        context = {}
+
+        # Find and read architecture docs
+        for doc_name in ["ARCHITECTURE.md", "AGENTS.md"]:
+            for search_dir in [
+                Path(__file__).parent,
+                Path(__file__).parent.parent,
+                Path.cwd(),
+            ]:
+                doc_path = search_dir / doc_name
+                if doc_path.exists():
+                    context[doc_name] = doc_path.read_text()
+                    break
+            else:
+                context[doc_name] = f"[WARNING: {doc_name} not found — operating without map]"
+
+        self._architecture = context.get("ARCHITECTURE.md", "")
+        self._agents_doc = context.get("AGENTS.md", "")
+        return context
+
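The document search in `_wake_up` relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without hitting `break`, which here means no directory contained the file. A minimal standalone sketch of the same pattern follows. `find_doc` and the temporary directory layout are hypothetical and are not part of `agent_os.py`.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_doc(doc_name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing path for doc_name, or None if no directory has it."""
    for search_dir in search_dirs:
        candidate = search_dir / doc_name
        if candidate.exists():
            found = candidate
            break  # skips the else branch below
    else:
        # Reached only when the loop was never broken out of.
        found = None
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ARCHITECTURE.md").write_text("# map")
    hit = find_doc("ARCHITECTURE.md", [root / "missing", root])
    miss = find_doc("AGENTS.md", [root])
    hit_name = hit.name if hit else None
```

Returning `None` for a missing file, rather than a warning string stored alongside real content, lets callers tell "not found" apart from a document that happens to contain the word "WARNING".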
+    # ============================================================
+    # Spawn: Create a new companion agent
+    # ============================================================
+
+    def spawn_companion(self, agent_type: str, purpose: str = None,
+                        system_prompt: str = None, config: dict = None) -> str:
+        """
+        Spawn a new companion AI agent.
+
+        Args:
+            agent_type: One of COMPANION_TYPES keys, or "custom"
+            purpose: Override default purpose (required for "custom")
+            system_prompt: Override default prompt (required for "custom")
+            config: Optional config dict (model, temperature, etc.)
+
+        Returns:
+            agent_id: Unique identifier for the companion agent
+        """
+        # Wake up first
+        self._wake_up()
+
+        # Resolve agent definition
+        if agent_type in COMPANION_TYPES:
+            defn = COMPANION_TYPES[agent_type]
+            purpose = purpose or defn["purpose"]
+            system_prompt = system_prompt or defn["system_prompt"]
+        elif agent_type == "custom":
+            if not purpose or not system_prompt:
+                raise ValueError("Custom agents require both 'purpose' and 'system_prompt'")
+        else:
+            raise ValueError(f"Unknown agent type: {agent_type}. "
+                             f"Available: {list(COMPANION_TYPES.keys())} + 'custom'")
+
+        agent_id = gen_id("COMP")
+        conn = get_db(self.db_path)
+        conn.execute("""
+            INSERT INTO companion_agents (agent_id, agent_type, purpose, system_prompt,
+                state, config, created_at, schema_version)
+            VALUES (?, ?, ?, ?, 'spawned', ?, ?, '1.0')
+        """, (agent_id, agent_type, purpose, system_prompt,
+              json.dumps(config or {}), now_iso()))
+        conn.commit()
+        conn.close()
+
+        self._audit(agent_id, None, "spawn", "Agent created",
+                    f"type={agent_type}, purpose={purpose[:100]}")
+
+        return agent_id
+
+    # ============================================================
+    # Task Assignment & Lifecycle
+    # ============================================================
+
+    def assign_task(self, agent_id: str, description: str,
+                    max_iterations: int = 3, time_budget_s: int = 3600) -> str:
+        """
+        Assign a bounded task to a companion agent.
+
+        Args:
+            agent_id: The companion agent to assign to
+            description: What the agent should do
+            max_iterations: Max retry loops (§3 iteration budget)
+            time_budget_s: Max time in seconds (Kill Heuristic)
+
+        Returns:
+            task_id: Unique identifier for this task
+        """
+        task_id = gen_id("TASK")
+        conn = get_db(self.db_path)
+        conn.execute("""
+            INSERT INTO agent_tasks (task_id, agent_id, description, state,
+                max_iterations, time_budget_s, started_at, schema_version)
+            VALUES (?, ?, ?, 'preflight', ?, ?, ?, '1.0')
+        """, (task_id, agent_id, description, max_iterations, time_budget_s, now_iso()))
+        conn.commit()
+        conn.close()
+
+        self._audit(agent_id, task_id, "preflight", "Task assigned", description)
+        return task_id
+
+    def run_task(self, task_id: str) -> dict:
+        """
+        Execute the full ECC lifecycle for a task.
+
+        Lifecycle: preflight → planning → executing → postflight → completed/halted
+
+        Returns dict with task result and any proposals generated.
+        """
+        conn = get_db(self.db_path)
+        task_row = conn.execute("SELECT * FROM agent_tasks WHERE task_id = ?",
+                                (task_id,)).fetchone()
+        if not task_row:
+            conn.close()
+            raise ValueError(f"Task {task_id} not found")
+        task = dict(task_row)
+
+        agent_row = conn.execute("SELECT * FROM companion_agents WHERE agent_id = ?",
+                                 (task["agent_id"],)).fetchone()
+        if not agent_row:
+            conn.close()
+            raise ValueError(f"Agent {task['agent_id']} not found")
+        agent = dict(agent_row)
+        conn.close()
+
+        start_time = time.time()
+        result = {"task_id": task_id, "proposals": [], "status": "unknown", "audit": []}
+
+        try:
+            # §1: PRE-FLIGHT
+            self._update_task_state(task_id, "preflight")
+            preflight_ok = self._preflight(task, agent)
+            if not preflight_ok:
+                self._halt_task(task_id, "Preflight checks failed")
+                result["status"] = "halted"
+                return result
+
+            # §2: PLANNING
+            self._update_task_state(task_id, "planning")
+            plan = self._plan(task, agent)
+
+            # The Obviousness Test (§2): Is there a simple direct solution?
+            if plan.get("obvious_solution"):
+                self._audit(task["agent_id"], task_id, "planning",
+                            "Obviousness test passed", plan["obvious_solution"])
+
+            # §3: EXECUTION (with iteration budget)
+            self._update_task_state(task_id, "executing")
+            proposals = self._execute(task, agent, plan, start_time)
+            result["proposals"] = proposals
+
+            # §4: POST-FLIGHT
+            self._update_task_state(task_id, "postflight")
+            postflight_result = self._postflight(task, agent, proposals)
+            result["validation"] = postflight_result
+
+            # Mark completed
+            self._update_task_state(task_id, "completed")
+            result["status"] = "completed"
+
+            # Update agent stats
+            conn = get_db(self.db_path)
+            conn.execute("""
+                UPDATE companion_agents
+                SET total_tasks_completed = total_tasks_completed + 1,
+                    total_proposals_made = total_proposals_made + ?
+                WHERE agent_id = ?
+            """, (len(proposals), task["agent_id"]))
+            conn.commit()
+            conn.close()
+
+        except Exception as e:
+            self._halt_task(task_id, f"Execution error: {str(e)}")
+            result["status"] = "halted"
+            result["error"] = str(e)
+
+        return result
+
+    # ============================================================
+    # §1: Pre-Flight Implementation
+    # ============================================================
+
+    def _preflight(self, task: dict, agent: dict) -> bool:
+        """
+        ECC §1: Context loading, reality validation, assumption logging.
+
+        Checks:
+        - Architecture docs loaded (Wake-Up Routine)
+        - Database is accessible
+        - Agent is not retired
+        - Task description is non-empty
+        """
+        # Wake-Up: Read architecture docs
+        context = self._wake_up()
+
+        checks = []
+
+        # Check architecture docs loaded. Match only the "[WARNING:" placeholder
+        # prefix written by _wake_up, so a real document that merely contains
+        # the word "WARNING" is not treated as missing.
+        checks.append(("ARCHITECTURE.md loaded",
+                       not context.get("ARCHITECTURE.md", "[WARNING:").startswith("[WARNING:")))
+        checks.append(("AGENTS.md loaded",
+                       not context.get("AGENTS.md", "[WARNING:").startswith("[WARNING:")))
+
+        # Check DB accessible
+        try:
+            conn = get_db(self.db_path)
+            conn.execute("SELECT 1").fetchone()
+            conn.close()
+            checks.append(("Database accessible", True))
+        except Exception:
+            checks.append(("Database accessible", False))
+
+        # Check agent state
+        checks.append(("Agent not retired", agent["state"] != "retired"))
+
+        # Check task has content
+        checks.append(("Task description non-empty", bool(task.get("description", "").strip())))
+
+        # Log all checks
+        all_passed = all(passed for _, passed in checks)
+        details = json.dumps([{"check": name, "passed": passed} for name, passed in checks])
+        self._audit(task["agent_id"], task["task_id"], "preflight",
+                    "Preflight checks" + (" PASSED" if all_passed else " FAILED"), details)
+
+        return all_passed
+
+    # ============================================================
+    # §2: Planning Implementation
+    # ============================================================
+
+    def _plan(self, task: dict, agent: dict) -> dict:
+        """
+        ECC §2: Build execution plan.
+
+        Includes:
+        - Obviousness test
+        - Reversibility classification
+        - Idempotence verification
+        - Confidence assessment
+        """
+        plan = {
+            "task_description": task["description"],
+            "agent_type": agent["agent_type"],
+            "steps": [],
+            "obvious_solution": None,
+            "reversible": True,
+            "confidence": 0.5,
+        }
+
+        # Obviousness Test: Can we solve this without a complex plan?
+        simple_tasks = ["audit", "check", "list", "count", "summarize"]
+        if any(word in task["description"].lower() for word in simple_tasks):
+            plan["obvious_solution"] = "Direct query against database — no complex planning needed"
+            plan["confidence"] = 0.8
+
+        # Build step list based on agent type
+        if agent["agent_type"] == "DataQualityAuditor":
+            plan["steps"] = [
+                "Query recent claims from database",
+                "Check each claim's evidence_strength vs epistemic_tag consistency",
+                "Flag claims where confidence > 0.8 but evidence is indirect",
+                "Generate proposals for flagged claims",
+            ]
+        elif agent["agent_type"] == "PromptOptimizer":
+            plan["steps"] = [
+                "Load current prompts from AGENTS.md",
+                "Run baseline evaluation against golden dataset",
+                "Identify weakest-performing task (lowest metric)",
+                "Generate 2-3 prompt variants",
+                "Propose A/B test with regression gate",
+            ]
+        elif agent["agent_type"] == "DomainExpander":
+            plan["steps"] = [
+                "Analyze current training data domain coverage",
+                "Identify underrepresented STEM fields",
+                "Generate 50-100 synthetic examples per field",
+                "Validate all examples produce valid JSON",
+                "Propose training data addition",
+            ]
+        elif agent["agent_type"] == "CalibrationAnalyst":
+            plan["steps"] = [
+                "Query calibration_log for all entries",
+                "Compute Brier scores per claim category",
+                "Identify systematic miscalibration patterns",
+                "Propose weight adjustments with evidence",
+            ]
+        elif agent["agent_type"] == "CitationChaser":
+            plan["steps"] = [
+                "Identify high-confidence canonical claims",
+                "Search for recent papers citing the same DOIs",
+                "Flag any new contradicting evidence",
+                "Propose new papers for ingestion",
+            ]
+        else:
+            # Custom agent — generic plan
+            plan["steps"] = [
+                "Analyze task requirements",
+                "Gather relevant data from database",
+                "Generate proposals based on findings",
+                "Self-validate proposals for consistency",
+            ]
+
+        # Save plan to DB
+        conn = get_db(self.db_path)
+        conn.execute("UPDATE agent_tasks SET plan = ? WHERE task_id = ?",
+                     (json.dumps(plan), task["task_id"]))
+        conn.commit()
+        conn.close()
+
+        self._audit(task["agent_id"], task["task_id"], "planning",
+                    "Plan created", f"{len(plan['steps'])} steps, confidence={plan['confidence']}")
+
+        return plan
+
+    # ============================================================
+    # §3: Execution Implementation
+    # ============================================================
+
+    def _execute(self, task: dict, agent: dict, plan: dict,
+                 start_time: float) -> list:
+        """
+        ECC §3: Bounded execution with iteration budget and kill heuristic.
+
+        Returns a list of proposal dicts (as produced by _create_proposal
+        and _parse_proposals).
+        """
+        proposals = []
+        iteration = 0
+        max_iter = task.get("max_iterations", 3)
+        time_budget = task.get("time_budget_s", 3600)
+
+        while iteration < max_iter:
+            iteration += 1
+
+            # Kill Heuristic: check time budget
+            elapsed = time.time() - start_time
+            if elapsed > time_budget * 1.5:  # 50% over budget = HALT
+                self._audit(task["agent_id"], task["task_id"], "executing",
+                            "KILL HEURISTIC TRIGGERED",
+                            f"Elapsed {elapsed:.0f}s > budget {time_budget}s × 1.5")
+                break
+
+            # JIT State Verification (§3): Check DB hasn't been modified externally
+            # (In a full system, this would check file hashes / row versions)
+
+            # Execute based on agent type using the brain
+            if self.brain:
+                # Use the AI brain to execute the agent's task
+                messages = [
+                    {"role": "system", "content": agent["system_prompt"]},
+                    {"role": "user", "content": self._build_execution_prompt(task, plan, iteration)}
+                ]
+                try:
+                    if self.brain.backend == "local":
+                        raw = self.brain._generate_local(messages)
+                    else:
+                        raw = self.brain._generate_api(messages)
+
+                    # Parse proposals from response
+                    parsed = self._parse_proposals(raw, task["agent_id"], task["task_id"])
+                    proposals.extend(parsed)
+
+                    self._audit(task["agent_id"], task["task_id"], "executing",
+                                f"Iteration {iteration}: generated {len(parsed)} proposals",
+                                f"Total proposals: {len(proposals)}")
+
+                    # If we got results, we can stop iterating
+                    if parsed:
+                        break
+
+                except Exception as e:
+                    self._audit(task["agent_id"], task["task_id"], "executing",
+                                f"Iteration {iteration}: error", str(e))
+                    if iteration >= max_iter:
+                        break
+                    # Otherwise retry
+            else:
+                # No brain available — generate placeholder proposals from plan
+                self._audit(task["agent_id"], task["task_id"], "executing",
+                            "No brain configured",
+                            "Generating structural proposals without AI inference")
+
+                proposal = self._create_proposal(
+                    task["agent_id"], task["task_id"],
+                    proposal_type="architecture_change",
+                    description=f"[Placeholder] Agent {agent['agent_type']}: {task['description']}",
+                    changes={"note": "Brain not configured — this is a structural placeholder"},
+                    evidence="Companion agent spawned but requires API key or local model",
+                    estimated_impact={"metric": "system_coverage", "expected_delta": 0.0},
+                    risk="low",
+                    reversible=True,
+                )
+                proposals.append(proposal)
+                break
+
+        # Update iteration count
+        conn = get_db(self.db_path)
+        conn.execute("UPDATE agent_tasks SET iterations_used = ? WHERE task_id = ?",
+                     (iteration, task["task_id"]))
+        conn.commit()
+        conn.close()
+
+        return proposals
+
+    def _build_execution_prompt(self, task: dict, plan: dict, iteration: int) -> str:
+        """Build the user prompt for the agent's execution phase."""
+        # Gather relevant DB context
+        conn = get_db(self.db_path)
+        claim_count = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
+        conflict_count = conn.execute("SELECT COUNT(*) FROM conflicts WHERE resolution_status = 'Unresolved'").fetchone()[0]
+
+        # Get sample claims for context
+        recent_claims = conn.execute(
+            "SELECT claim_id, text, epistemic_tag, confidence FROM claims ORDER BY created_at DESC LIMIT 10"
+        ).fetchall()
+        conn.close()
+
+        claims_context = "\n".join(
+            f" - [{dict(c)['claim_id']}] ({dict(c)['epistemic_tag']}, conf={from_fixed(dict(c)['confidence']):.3f}): {dict(c)['text'][:100]}..."
+            for c in recent_claims
+        )
+
+        return f"""TASK: {task['description']}
+
+ITERATION: {iteration}
+PLAN STEPS: {json.dumps(plan.get('steps', []))}
+
+CURRENT DATABASE STATE:
+- Total claims: {claim_count}
+- Unresolved conflicts: {conflict_count}
+- Recent claims:
+{claims_context}
+
+Based on your role and the above context, execute your task and output your findings
+as JSON proposals. Each proposal must include: proposal_type, description, changes, evidence,
+estimated_impact, risk_assessment, and reversible flag."""
+
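The execution loop in `_execute` combines two budgets: a cap on iterations and a wall-clock limit with a 1.5× grace factor. A minimal standalone sketch of that control flow follows, with the clock injected so it can be exercised without waiting. `run_bounded`, its parameters, and its return shape are hypothetical. Unlike `_execute`, the sketch counts only iterations that actually ran and reports why it stopped.

```python
import time
from typing import Any, Callable, Optional, Tuple


def run_bounded(
    step: Callable[[int], Any],
    max_iterations: int = 3,
    time_budget_s: float = 3600.0,
    grace: float = 1.5,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[Any], int, str]:
    """Call step(i) until it returns a truthy result or a budget runs out.

    Returns (result, iterations_run, stop_reason); stop_reason is "" on success.
    """
    start = clock()
    iterations = 0
    while iterations < max_iterations:
        # Check the time budget before starting another iteration.
        elapsed = clock() - start
        if elapsed > time_budget_s * grace:
            return None, iterations, f"time budget exceeded after {elapsed:.0f}s"
        iterations += 1
        result = step(iterations)
        if result:
            return result, iterations, ""
    return None, iterations, "iteration budget exhausted"
```

Returning the stop reason lets a caller mark the task as halted instead of completed when a budget, rather than a successful step, ended the loop.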
+    # ============================================================
+    # §4: Post-Flight Implementation
+    # ============================================================
+
+    def _postflight(self, task: dict, agent: dict, proposals: list) -> dict:
+        """
+        ECC §4: Validate results, check definition of done, log meta-learning.
+        """
+        validation = {
+            "proposals_count": len(proposals),
+            "all_valid_json": True,
+            "invariants_preserved": True,
+            "expert_intuition_check": "pending_human_review",
+            "definition_of_done": {
+                "aligns_with_intent": True,
+                "invariants_hold": True,
+                "no_nfr_regression": True,
+            }
+        }
+
+        # Validate each proposal
+        for p in proposals:
+            if isinstance(p, dict):
+                # Check required fields
+                required = ["proposal_type", "description", "changes"]
+                if not all(k in p for k in required):
+                    validation["all_valid_json"] = False
+
+                # Check no proposal directly modifies claims (invariant)
+                changes = p.get("changes", {})
+                if "direct_db_write" in str(changes).lower():
+                    validation["invariants_preserved"] = False
+
+        self._audit(task["agent_id"], task["task_id"], "postflight",
+                    "Validation complete", json.dumps(validation))
+
+        return validation
+
+    # ============================================================
+    # Proposal Management
+    # ============================================================
+
+    def _create_proposal(self, agent_id: str, task_id: str,
+                         proposal_type: str, description: str,
+                         changes: dict, evidence: str,
+                         estimated_impact: dict, risk: str,
+                         reversible: bool) -> dict:
+        """Create and store a proposal."""
+        proposal_id = gen_id("PROP")
+        conn = get_db(self.db_path)
+        conn.execute("""
+            INSERT INTO proposals (proposal_id, agent_id, task_id, proposal_type,
+                description, changes, evidence, estimated_impact, risk_assessment,
+                reversible, status, created_at, schema_version)
+            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'proposed', ?, '1.0')
+        """, (proposal_id, agent_id, task_id, proposal_type, description,
+              json.dumps(changes), evidence, json.dumps(estimated_impact),
+              risk, int(reversible), now_iso()))
+        conn.commit()
+        conn.close()
+
+        return {
+            "proposal_id": proposal_id,
+            "agent_id": agent_id,
+            "proposal_type": proposal_type,
+            "description": description,
+            "changes": changes,
+            "evidence": evidence,
+            "estimated_impact": estimated_impact,
+            "risk_assessment": risk,
+            "reversible": reversible,
+            "status": "proposed",
+        }
+
+    def _parse_proposals(self, raw_output: str, agent_id: str, task_id: str) -> list:
+        """Parse proposals from agent's raw output."""
+        proposals = []
+
+        # Try to extract JSON, stripping a surrounding Markdown code fence if present
+        text = raw_output.strip()
+        if text.startswith("```"):
+            text = text.split("```")[1]
+            if text.startswith("json"):
+                text = text[4:]
+            text = text.strip()
+
+        try:
+            data = json.loads(text)
+            # Handle single proposal or list
+            items = data if isinstance(data, list) else [data]
+
+            for item in items:
+                if isinstance(item, dict) and "proposal_type" in item:
+                    p = self._create_proposal(
+                        agent_id, task_id,
+                        item.get("proposal_type", "unknown"),
+                        item.get("description", ""),
+                        item.get("changes", {}),
+                        item.get("evidence", ""),
+                        item.get("estimated_impact", {}),
+                        item.get("risk_assessment", "low"),
+                        item.get("reversible", True),
+                    )
+                    proposals.append(p)
+        except json.JSONDecodeError:
+            # If JSON parsing fails, create a raw-text proposal
+            proposals.append(self._create_proposal(
+                agent_id, task_id,
+                "raw_finding",
+                raw_output[:500],
+                {"raw_output": raw_output},
+                "Agent output was not parseable JSON",
+                {"metric": "unknown", "expected_delta": 0},
+                "low", True,
+            ))
+
+        return proposals
+
+    def get_proposals(self, agent_id: Optional[str] = None,
+                      status: Optional[str] = None) -> list:
+        """Get proposals, optionally filtered by agent and/or status."""
+        conn = get_db(self.db_path)
+        conditions = []
+        params = []
+
+        if agent_id:
+            conditions.append("agent_id = ?")
+            params.append(agent_id)
+        if status:
+            conditions.append("status = ?")
+            params.append(status)
+
+        # Only fixed column names are interpolated; values stay parameterized
+        where = " AND ".join(conditions) if conditions else "1=1"
+        rows = conn.execute(
+            f"SELECT * FROM proposals WHERE {where} ORDER BY created_at DESC",
+            params
+        ).fetchall()
+        conn.close()
+
+        results = []
+        for row in rows:
+            d = dict(row)
+            # A column may hold NULL, so fall back to an empty JSON object
+            d["changes"] = json.loads(d.get("changes") or "{}")
+            d["estimated_impact"] = json.loads(d.get("estimated_impact") or "{}")
+            results.append(d)
+        return results
+
+    def approve_proposal(self, proposal_id: str, reviewed_by: str = "human") -> bool:
+        """Human approves a proposal. Returns False if no such proposal exists."""
+        conn = get_db(self.db_path)
+        cursor = conn.execute("""
+            UPDATE proposals SET status = 'approved', reviewed_at = ?, reviewed_by = ?
+            WHERE proposal_id = ?
+        """, (now_iso(), reviewed_by, proposal_id))
+        updated = cursor.rowcount > 0
+        conn.commit()
+        conn.close()
+        return updated
+
+    def reject_proposal(self, proposal_id: str, reason: str,
+                        reviewed_by: str = "human") -> bool:
+        """Human rejects a proposal with documented reason. Returns False if no such proposal exists."""
+        conn = get_db(self.db_path)
+        cursor = conn.execute("""
+            UPDATE proposals SET status = 'rejected', reviewed_at = ?,
+                reviewed_by = ?, rejection_reason = ?
+            WHERE proposal_id = ?
+        """, (now_iso(), reviewed_by, reason, proposal_id))
+        updated = cursor.rowcount > 0
+        conn.commit()
+        conn.close()
+        return updated
+
+    # ============================================================
+    # Agent Management
+    # ============================================================
+
+    def list_companions(self, include_retired: bool = False) -> list:
+        """List all companion agents."""
+        conn = get_db(self.db_path)
+        if include_retired:
+            rows = conn.execute("SELECT * FROM companion_agents ORDER BY created_at DESC").fetchall()
+        else:
+            rows = conn.execute(
+                "SELECT * FROM companion_agents WHERE state != 'retired' ORDER BY created_at DESC"
+            ).fetchall()
+        conn.close()
+        return [dict(r) for r in rows]
+
+    def retire_companion(self, agent_id: str) -> bool:
+        """Retire a companion agent. Retirement is permanent: a retired agent cannot be unretired."""
+        conn = get_db(self.db_path)
+        # Skip agents that are already retired so retired_at is never overwritten
+        cursor = conn.execute("""
+            UPDATE companion_agents SET state = 'retired', retired_at = ?
+            WHERE agent_id = ? AND state != 'retired'
+        """, (now_iso(), agent_id))
+        retired = cursor.rowcount > 0
+        conn.commit()
+        conn.close()
+        if retired:
+            self._audit(agent_id, None, "postflight", "Agent retired", "")
+        return retired
+
+    def get_audit_log(self, agent_id: Optional[str] = None, limit: int = 50) -> list:
+        """Get audit log entries, newest first, optionally filtered by agent."""
+        conn = get_db(self.db_path)
+        if agent_id:
+            rows = conn.execute(
+                "SELECT * FROM agent_audit_log WHERE agent_id = ? ORDER BY timestamp DESC LIMIT ?",
+                (agent_id, limit)
+            ).fetchall()
+        else:
+            rows = conn.execute(
+                "SELECT * FROM agent_audit_log ORDER BY timestamp DESC LIMIT ?",
+                (limit,)
+            ).fetchall()
+        conn.close()
+        return [dict(r) for r in rows]
+
+    # ============================================================
+    # Memory & Harness Evolution
+    # ============================================================
+
+    def set_memory(self, key: str, value: str, category: str = "assumption"):
+        """Store a persistent memory/assumption with validation timestamp."""
+        conn = get_db(self.db_path)
+        conn.execute("""
+            INSERT OR REPLACE INTO memory_store (key, value, last_validated, category)
+            VALUES (?, ?, ?, ?)
+        """, (key, value, now_iso(), category))
+        conn.commit()
+        conn.close()
+
+    def get_memory(self, key: str) -> Optional[dict]:
+        """Retrieve a stored memory/assumption."""
+        conn = get_db(self.db_path)
+        row = conn.execute("SELECT * FROM memory_store WHERE key = ?", (key,)).fetchone()
+        conn.close()
+        return dict(row) if row else None
+
+    def propose_harness_evolution(self, rule_section: str, amendment: str,
+                                  reason: str, proposed_by: str) -> int:
+        """
+        §4 Meta-Learning: Propose an amendment to the ECC Harness rules.
+        Requires human approval before taking effect.
+        """
+        conn = get_db(self.db_path)
+        cursor = conn.execute("""
+            INSERT INTO harness_evolution (rule_section, amendment, reason,
+                proposed_by, timestamp, approved)
+            VALUES (?, ?, ?, ?, ?, 0)
+        """, (rule_section, amendment, reason, proposed_by, now_iso()))
+        conn.commit()
+        evo_id = cursor.lastrowid
+        conn.close()
+        return evo_id
+
+    # ============================================================
+    # Internal Utilities
+    # ============================================================
+
+    def _update_task_state(self, task_id: str, state: str):
+        """Update task lifecycle state; stamp completed_at when the task completes."""
+        conn = get_db(self.db_path)
+        if state == "completed":
+            conn.execute("UPDATE agent_tasks SET state = ?, completed_at = ? WHERE task_id = ?",
+                         (state, now_iso(), task_id))
+        else:
+            conn.execute("UPDATE agent_tasks SET state = ? WHERE task_id = ?",
+                         (state, task_id))
+        conn.commit()
+        conn.close()
+
+    def _halt_task(self, task_id: str, reason: str):
+        """Halt a task (kill heuristic or error)."""
+        conn = get_db(self.db_path)
+        conn.execute("""
+            UPDATE agent_tasks SET state = 'halted', kill_reason = ?, completed_at = ?
+            WHERE task_id = ?
+        """, (reason, now_iso(), task_id))
+        conn.commit()
+        conn.close()
+
+    def _audit(self, agent_id: str, task_id: Optional[str], phase: str,
+               action: str, details: str, confidence: float = 0.5,
+               deviation: str = ""):
+        """Write an immutable audit log entry."""
+        conn = get_db(self.db_path)
+        conn.execute("""
+            INSERT INTO agent_audit_log (entry_id, agent_id, task_id, phase,
+                action, details, confidence, deviation, timestamp)
+            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
+        """, (gen_id("AUDIT"), agent_id, task_id, phase, action, details,
+              confidence, deviation, now_iso()))
+        conn.commit()
+        conn.close()
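
The fence-stripping and JSON-normalization steps in `_parse_proposals` can be exercised on their own, without a database. The following is a minimal sketch of that parsing logic only; `extract_json_payload`, `normalize_items`, and `FENCE` are hypothetical names introduced for illustration and are not part of `agent_os.py`.

```python
import json

FENCE = "`" * 3  # Markdown code-fence delimiter (hypothetical helper constant)


def extract_json_payload(raw_output: str):
    """Return (data, True) for parseable JSON, else (raw_output, False)."""
    text = raw_output.strip()
    if text.startswith(FENCE):
        # Keep only what sits between the first pair of fences
        text = text.split(FENCE)[1]
        # Drop an optional language tag such as "json"
        if text.startswith("json"):
            text = text[4:]
        text = text.strip()
    try:
        return json.loads(text), True
    except json.JSONDecodeError:
        return raw_output, False


def normalize_items(data) -> list:
    """Wrap a single object in a list and keep only dicts with a proposal_type."""
    items = data if isinstance(data, list) else [data]
    return [i for i in items if isinstance(i, dict) and "proposal_type" in i]
```

Note that, as in `_parse_proposals`, output that parses as JSON but carries no `proposal_type` yields an empty list rather than a raw-text fallback; a caller may want to treat that case explicitly.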