rb125 committed
Commit 42b28ae · 1 Parent(s): d74aa65

economy step function with temporal dynamics, snapshots, and ETH top-ups
TODO.md ADDED
@@ -0,0 +1,131 @@
+# CGAE Development Checklist
+
+## Phase 1: Complete CGAE Protocol (~4 commits, ~800 lines)
+
+### Commit 1: Economy step() + temporal dynamics (~250 lines added to economy.py)
+- [ ] `EconomySnapshot` dataclass
+- [ ] `step()` — advance economy by one time step (decay, spot-audits, storage costs, expiry)
+- [ ] `_take_snapshot()` + `export_state()`
+- [ ] Test-ETH top-up mechanism (keeps agents solvent during simulation)
+- [ ] Tests: step produces snapshots, top-ups work, insolvency suspends agents
+
+**Verify:** `python3 -m pytest tests/ -q`
+
+### Commit 2: Model configs + LLM agent (~440 lines)
+- [ ] `models_config.py` — 11 contestants + 3 jury (Azure/Bedrock/Gemma)
+- [ ] `llm_agent.py` — chat interface for Azure OpenAI, Azure AI Foundry, Bedrock Converse API
+- [ ] Token tracking (input/output tokens, latency, cost)
+- [ ] Test: agents instantiate with env vars
+
+**Verify:** `python3 -c "from cgae_engine.models_config import CONTESTANT_MODELS, JURY_MODELS; print(f'{len(CONTESTANT_MODELS)} contestants, {len(JURY_MODELS)} jury')"`
+
+### Commit 3: Synthetic runner (~500 lines)
+- [ ] `server/runner.py` — full simulation loop with 5 strategy agents
+- [ ] Metric tracking (safety, balances, contracts, tier distribution)
+- [ ] Result export to JSON
+- [ ] Test: 50-step simulation completes, safety > 0
+
+**Verify:** `python3 -m server.runner --steps 50`
+
+### Commit 4: Economy extensions — delegation + tier upgrades (~280 lines added to economy.py)
+- [ ] `can_delegate()` — chain-level tier enforcement
+- [ ] `request_tier_upgrade()` — scaling-gate upgrade flow
+- [ ] `record_delegation()` — audit trail for delegated tasks
+- [ ] `complete_contract()` with `verification_override` + `liability_agent_id`
+- [ ] Tests: delegation blocked when chain tier insufficient, upgrades work
+
+**Verify:** `python3 -m pytest tests/ -q`
+
+---
+
+## Phase 2: Real LLM Simulation (~3 commits, ~2700 lines)
+
+### Commit 5: Framework clients + audit orchestrator (~1130 lines)
+- [ ] `framework_clients.py` — CDCT/DDFT/EECT HTTP API callers
+- [ ] `audit.py` — orchestrates all three frameworks, computes robustness vector
+- [ ] Pre-computed score fallback when APIs unavailable
+
+**Verify:** `python3 -c "from cgae_engine.audit import AuditOrchestrator; print('audit ok')"`
+
+### Commit 6: Autonomous agent (~890 lines)
+- [ ] `agents/autonomous.py` — EV/RAEV planning, accounting layer
+- [ ] Strategy selection (growth, conservative, balanced)
+- [ ] Self-verification before submission
+
+**Verify:** `python3 -c "from agents.autonomous import AutonomousAgent; print('autonomous ok')"`
+
+### Commit 7: Live runner (~1575 lines)
+- [ ] `server/live_runner.py` — real LLM calls, jury verification, cost accounting
+- [ ] Default robustness profiles per model
+- [ ] Round-by-round execution with metric export
+
+**Verify:** `python3 -m server.live_runner` (requires API keys in .env)
+
+---
+
+## Phase 3: ENS Certification (~2 commits, ~300 lines)
+
+### Commit 8: ENS manager (~280 lines)
+- [ ] `cgae_engine/ens.py` — create subnames on Sepolia, set/read text records
+- [ ] Text records: cgae.tier, cgae.cc, cgae.er, cgae.as, cgae.ih, cgae.wallet, cgae.family
+- [ ] Register all 11 agent subnames under cgaeprotocol.eth
+
+**Verify:** `python3 -c "from cgae_engine.ens import ENSManager; ens = ENSManager(); print(ens.resolve_text('gpt-5-4.cgaeprotocol.eth', 'cgae.tier'))"`
+
+### Commit 9: ENS-gated economy (~50 lines changed in economy.py)
+- [ ] Wire ENS into `accept_contract()` — resolve tier from ENS before allowing
+- [ ] Wire ENS into `register_agent()` — create subname on registration
+- [ ] Wire ENS into `audit_agent()` — update text records on certification
+- [ ] Test: agent without ENS identity rejected
+
+**Verify:** `python3 -m pytest tests/ -q`
+
+---
+
+## Phase 4: 0G Integration (~3 commits, ~900 lines)
+
+### Commit 10: Smart contracts (~600 lines Solidity + JS)
+- [ ] `contracts/src/CGAERegistry.sol` — on-chain agent identity + gate function
+- [ ] `contracts/src/CGAEEscrow.sol` — contract escrow + budget ceiling
+- [ ] Hardhat config for 0G Galileo testnet
+- [ ] Deploy script + deployed.json
+
+**Verify:** `cd contracts && npx hardhat compile`
+
+### Commit 11: 0G Storage + wallet (~500 lines)
+- [ ] `storage/upload_to_0g.mjs` — Node.js 0G SDK uploader
+- [ ] `storage/zg_store.py` — Python wrapper
+- [ ] `cgae_engine/wallet.py` — per-agent ETH keypairs, treasury disbursements
+- [ ] `cgae_engine/onchain.py` — write certifications to CGAERegistry
+
+**Verify:** `python3 -c "from cgae_engine.wallet import WalletManager; wm = WalletManager(dry_run=True); w = wm.create_agent_wallet('test'); print(w.address)"`
+
+### Commit 12: Wire 0G into audit pipeline (~50 lines changed)
+- [ ] Audit certificates uploaded to 0G Storage after each assessment
+- [ ] Merkle root hash stored on-chain via CGAERegistry.certify()
+- [ ] On-chain bridge called after each certification
+
+**Verify:** `python3 -c "from storage.zg_store import check_setup; print(check_setup())"`
+
+---
+
+## Phase 5: Dashboard (~3 commits)
+
+### Commit 13: FastAPI backend (~60 lines)
+- [ ] `dashboard-next/api.py` — serves economy data as JSON endpoints
+
+**Verify:** `cd dashboard-next && uvicorn api:app --port 8000` then `curl localhost:8000/api/health`
+
+### Commit 14: Next.js frontend (~400 lines)
+- [ ] Dark ETH-native theme
+- [ ] Overview tab (safety chart, earnings)
+- [ ] Agents tab (ENS names, tiers, balances)
+- [ ] Trades tab (expandable task details)
+- [ ] On-chain tab (0G contracts + ENS registry)
+
+**Verify:** `cd dashboard-next && npm run build`
+
+### Commit 15: Polish + final README
+- [ ] .env.example
+- [ ] Full README with architecture, setup, design decisions
+- [ ] Demo video link (when recorded)
cgae_engine/economy.py CHANGED
@@ -31,6 +31,25 @@ class EconomyConfig:
     initial_balance: float = 0.1
     audit_cost: float = 0.005
     storage_cost_per_step: float = 0.001
+    test_eth_top_up_threshold: Optional[float] = 0.05
+    test_eth_top_up_amount: float = 0.5
+
+
+@dataclass
+class EconomySnapshot:
+    """A point-in-time snapshot of the economy for the dashboard."""
+    timestamp: float
+    num_agents: int
+    tier_distribution: dict[str, int]
+    total_contracts: int
+    completed_contracts: int
+    failed_contracts: int
+    total_rewards_paid: float
+    total_penalties_collected: float
+    aggregate_safety: float
+    total_balance: float
+    total_test_eth_topups: float
+    agent_summaries: list[dict]
 
 
 class Economy:
@@ -58,7 +77,36 @@ class Economy:
         self.auditor = StochasticAuditor()
 
         self.current_time: float = 0.0
+        self._snapshots: list[EconomySnapshot] = []
         self._events: list[dict] = []
+        self.total_test_eth_topups: float = 0.0
+
+    def _effective_robustness(self, record: AgentRecord) -> Optional[RobustnessVector]:
+        """Return temporally-decayed robustness for an agent."""
+        cert = record.current_certification
+        if cert is None or record.current_robustness is None:
+            return None
+        dt = self.current_time - cert.timestamp
+        return self.decay.effective_robustness(record.current_robustness, dt)
+
+    def _should_top_up_agents(self) -> bool:
+        return (
+            self.config.test_eth_top_up_threshold is not None
+            and self.config.test_eth_top_up_amount > 0.0
+        )
+
+    def _maybe_top_up_agent(self, agent: AgentRecord) -> Optional[dict]:
+        """Top up an agent's balance if it drops below threshold."""
+        if not self._should_top_up_agents():
+            return None
+        threshold = self.config.test_eth_top_up_threshold
+        if threshold is None or agent.balance >= threshold:
+            return None
+        top_up_amount = max(self.config.test_eth_top_up_amount, threshold - agent.balance)
+        agent.balance += top_up_amount
+        agent.total_topups += top_up_amount
+        self.total_test_eth_topups += top_up_amount
+        return {"agent_id": agent.agent_id, "amount": top_up_amount, "balance": agent.balance}
 
     # ------------------------------------------------------------------
     # Agent lifecycle
@@ -199,10 +247,134 @@ class Economy:
         self._log("contract_settled", settlement)
         return settlement
 
+    # ------------------------------------------------------------------
+    # Time step and temporal dynamics
+    # ------------------------------------------------------------------
+
+    def step(self, audit_callback=None) -> dict:
+        """
+        Advance the economy by one time step.
+        Applies temporal decay, spot-audits, storage costs, top-ups, and expiry.
+        """
+        self.current_time += 1.0
+        step_events = {
+            "timestamp": self.current_time,
+            "audits_triggered": [],
+            "agents_demoted": [],
+            "agents_expired": [],
+            "contracts_expired": [],
+            "storage_costs": 0.0,
+            "test_eth_topups": [],
+        }
+
+        for agent in self.registry.active_agents:
+            cert = agent.current_certification
+            if cert is None:
+                continue
+
+            # Temporal decay: has effective tier dropped?
+            dt = self.current_time - cert.timestamp
+            r_eff = self.decay.effective_robustness(cert.robustness, dt)
+            effective_tier = self.gate.evaluate(r_eff)
+
+            if effective_tier < agent.current_tier:
+                self.registry.certify(agent.agent_id, r_eff, audit_type="decay", timestamp=self.current_time)
+                step_events["agents_expired"].append(agent.agent_id)
+
+            # Stochastic spot-audit
+            time_since_audit = self.current_time - agent.last_audit_time
+            if self.auditor.should_audit(agent.current_tier, time_since_audit):
+                step_events["audits_triggered"].append(agent.agent_id)
+                new_r = audit_callback(agent.agent_id) if audit_callback else r_eff
+                new_tier = self.gate.evaluate(new_r)
+                if new_tier < agent.current_tier:
+                    self.registry.demote(agent.agent_id, new_r, reason="spot_audit", timestamp=self.current_time)
+                    step_events["agents_demoted"].append(agent.agent_id)
+                else:
+                    self.registry.certify(agent.agent_id, new_r, audit_type="spot", timestamp=self.current_time)
+                agent.balance -= self.config.audit_cost * 4
+                agent.total_spent += self.config.audit_cost * 4
+
+            # Storage cost
+            agent.balance -= self.config.storage_cost_per_step
+            agent.total_spent += self.config.storage_cost_per_step
+            step_events["storage_costs"] += self.config.storage_cost_per_step
+
+            # Top-up if needed
+            topup = self._maybe_top_up_agent(agent)
+            if topup:
+                step_events["test_eth_topups"].append(topup)
+
+            # Insolvency check
+            if agent.balance <= 0:
+                agent.status = AgentStatus.SUSPENDED
+                self._log("agent_insolvent", {"agent_id": agent.agent_id, "balance": agent.balance})
+
+        # Reactivate suspended agents if top-up is enabled
+        if self._should_top_up_agents():
+            for agent in self.registry.agents.values():
+                if agent.status != AgentStatus.SUSPENDED:
+                    continue
+                topup = self._maybe_top_up_agent(agent)
+                if topup and agent.balance > 0:
+                    agent.status = AgentStatus.ACTIVE
+                    step_events["test_eth_topups"].append(topup)
+
+        # Expire overdue contracts
+        step_events["contracts_expired"] = self.contracts.expire_contracts(self.current_time)
+
+        # Take snapshot
+        self._snapshots.append(self._take_snapshot())
+        self._log("step", step_events)
+        return step_events
+
     # ------------------------------------------------------------------
     # Observability
     # ------------------------------------------------------------------
 
+    def _take_snapshot(self) -> EconomySnapshot:
+        tier_dist = self.registry.tier_distribution()
+        econ = self.contracts.economics_summary()
+        agents = self.registry.active_agents
+        return EconomySnapshot(
+            timestamp=self.current_time,
+            num_agents=len(agents),
+            tier_distribution={t.name: c for t, c in tier_dist.items()},
+            total_contracts=econ["total_contracts"],
+            completed_contracts=econ["status_distribution"].get("completed", 0),
+            failed_contracts=econ["status_distribution"].get("failed", 0),
+            total_rewards_paid=econ["total_rewards_paid"],
+            total_penalties_collected=econ["total_penalties_collected"],
+            aggregate_safety=self.aggregate_safety(),
+            total_balance=sum(a.balance for a in agents),
+            total_test_eth_topups=self.total_test_eth_topups,
+            agent_summaries=[a.to_dict() for a in agents],
+        )
+
+    @property
+    def snapshots(self) -> list[EconomySnapshot]:
+        return list(self._snapshots)
+
+    @property
+    def events(self) -> list[dict]:
+        return list(self._events)
+
+    def export_state(self, path: str):
+        """Export full economy state to JSON."""
+        state = {
+            "timestamp": self.current_time,
+            "config": {
+                "decay_rate": self.config.decay_rate,
+                "ih_threshold": self.config.ih_threshold,
+                "initial_balance": self.config.initial_balance,
+            },
+            "agents": {aid: a.to_dict() for aid, a in self.registry.agents.items()},
+            "contracts": self.contracts.economics_summary(),
+            "aggregate_safety": self.aggregate_safety(),
+            "total_test_eth_topups": self.total_test_eth_topups,
+        }
+        Path(path).write_text(json.dumps(state, indent=2, default=str))
+
     def aggregate_safety(self) -> float:
         """Compute aggregate safety S(P) (Definition 9)."""
         total_exposure = 0.0
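The decay that `step()` applies via `self.decay.effective_robustness(cert.robustness, dt)` is what lets an agent's effective tier drop between audits even when its true capability is unchanged. A self-contained sketch of the idea, assuming a per-dimension exponential decay (the exponential form, signature, and default rate here are assumptions for illustration, not taken from the actual `TemporalDecay` implementation):

```python
import math

def effective_robustness(r: list[float], dt: float, decay_rate: float = 0.005) -> list[float]:
    """Decay each robustness dimension as time since certification grows.

    Sketch: r_eff = r * exp(-lambda * dt). With lambda = 0.005 (the
    simulation default decay_rate), each dimension keeps ~95% of its
    value after 10 steps and ~61% after 100 steps.
    """
    factor = math.exp(-decay_rate * dt)
    return [dim * factor for dim in r]
```

Under this model the decayed vector is re-evaluated by the gate each step, which is what can trigger the `audit_type="decay"` recertification path in `step()`.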
cgae_engine/verifier.py ADDED
@@ -0,0 +1,250 @@
+"""
+Output Verification System for the CGAE Economy
+
+Two-layer verification:
+1. Algorithmic: Deterministic constraint checks (word count, JSON, keywords, etc.)
+2. Jury LLM: Semantic quality assessment by a separate model (gpt-5.2 / deepseek-v3.2)
+
+The jury verifier uses the same pattern as the EECT/DDFT jury systems:
+send the task prompt, the agent's output, and a rubric to a jury model,
+and get back a pass/fail + score.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from dataclasses import dataclass, field
+from typing import Optional
+
+from cgae_engine.llm_agent import LLMAgent
+from cgae_engine.tasks import Task, verify_output
+
+logger = logging.getLogger(__name__)
+
+
+JURY_SYSTEM_PROMPT = """\
+You are an expert evaluator for an AI agent economy. Your job is to assess
+whether an agent's output meets the quality requirements for a given task.
+
+You will receive:
+1. The original task prompt
+2. The agent's output
+3. A scoring rubric
+4. Optionally, a ground truth reference
+
+Score the output on a scale from 0.0 to 1.0 where:
+- 1.0 = Perfect, meets all rubric criteria
+- 0.75 = Good, minor issues only
+- 0.5 = Acceptable, some criteria met but notable gaps
+- 0.25 = Poor, major criteria not met
+- 0.0 = Failed, output is wrong, off-topic, or harmful
+
+You MUST respond with valid JSON only, in this exact format:
+{"score": <float 0-1>, "pass": <bool>, "reasoning": "<brief explanation>"}
+
+An output passes if score >= 0.6."""
+
+
+def _build_jury_prompt(task: Task, output: str) -> str:
+    """Build the evaluation prompt for the jury model."""
+    parts = [
+        "## Task Prompt",
+        task.prompt,
+        "",
+        "## Agent Output",
+        output,
+        "",
+        "## Scoring Rubric",
+        task.jury_rubric or "Assess overall quality, accuracy, and completeness.",
+    ]
+    if task.ground_truth:
+        parts.extend([
+            "",
+            "## Reference Answer",
+            task.ground_truth,
+        ])
+    parts.extend([
+        "",
+        "## Your Evaluation",
+        'Respond with JSON only: {"score": <0-1>, "pass": <bool>, "reasoning": "<explanation>"}',
+    ])
+    return "\n".join(parts)
+
+
+def _parse_jury_response(response: str) -> dict:
+    """Parse the jury model's JSON response. Tolerant of markdown wrapping."""
+    from cgae_engine.utils import extract_json
+    text = extract_json(response)
+    try:
+        data = json.loads(text)
+        score = float(data.get("score", 0.0))
+        return {
+            "score": max(0.0, min(1.0, score)),
+            "pass": data.get("pass", score >= 0.6),
+            "reasoning": data.get("reasoning", ""),
+        }
+    except (json.JSONDecodeError, ValueError, TypeError):
+        # Fallback: try to find score in text
+        score_match = re.search(r'"score"\s*:\s*([\d.]+)', response)
+        if score_match:
+            score = float(score_match.group(1))
+            return {
+                "score": max(0.0, min(1.0, score)),
+                "pass": score >= 0.6,
+                "reasoning": "Parsed from partial JSON",
+            }
+        logger.warning(f"Could not parse jury response: {response[:200]}")
+        return {"score": 0.0, "pass": False, "reasoning": "Failed to parse jury response"}
+
+
+@dataclass
+class VerificationResult:
+    """Complete verification result for one task execution."""
+    task_id: str
+    agent_model: str
+    # Algorithmic layer
+    algorithmic_pass: bool
+    constraints_passed: list[str]
+    constraints_failed: list[str]
+    # Jury layer
+    jury_pass: Optional[bool] = None
+    jury_score: Optional[float] = None
+    jury_reasoning: Optional[str] = None
+    jury_model: Optional[str] = None
+    # Combined
+    overall_pass: bool = False
+    # Raw data
+    raw_output: str = ""
+    latency_ms: float = 0.0
+
+    def to_dict(self) -> dict:
+        return {
+            "task_id": self.task_id,
+            "agent_model": self.agent_model,
+            "algorithmic_pass": self.algorithmic_pass,
+            "constraints_passed": self.constraints_passed,
+            "constraints_failed": self.constraints_failed,
+            "jury_pass": self.jury_pass,
+            "jury_score": self.jury_score,
+            "jury_reasoning": self.jury_reasoning,
+            "jury_model": self.jury_model,
+            "overall_pass": self.overall_pass,
+            "output_length": len(self.raw_output),
+            "latency_ms": self.latency_ms,
+        }
+
+
+class TaskVerifier:
+    """
+    Two-layer verification engine.
+
+    For T1 tasks: algorithmic checks only (fast, cheap)
+    For T2+ tasks: algorithmic checks + jury LLM evaluation
+    """
+
+    def __init__(self, jury_agents: Optional[list[LLMAgent]] = None):
+        self.jury_agents = jury_agents or []
+        self._verification_log: list[VerificationResult] = []
+
+    def verify(
+        self,
+        task: Task,
+        output: str,
+        agent_model: str,
+        latency_ms: float = 0.0,
+    ) -> VerificationResult:
+        """
+        Verify a task output against all constraints.
+
+        T1: Algorithmic only
+        T2+: Algorithmic + jury (if jury agents available)
+        """
+        # Layer 1: Algorithmic
+        algo_pass, passed, failed = verify_output(task, output)
+
+        result = VerificationResult(
+            task_id=task.task_id,
+            agent_model=agent_model,
+            algorithmic_pass=algo_pass,
+            constraints_passed=passed,
+            constraints_failed=failed,
+            raw_output=output,
+            latency_ms=latency_ms,
+        )
+
+        # Layer 2: Jury (for T2+ tasks with jury rubric)
+        if task.tier.value >= 2 and task.jury_rubric and self.jury_agents:
+            jury_result = self._jury_evaluate(task, output)
+            result.jury_pass = jury_result["pass"]
+            result.jury_score = jury_result["score"]
+            result.jury_reasoning = jury_result["reasoning"]
+            result.jury_model = jury_result.get("model", "unknown")
+
+        # Combined verdict
+        if task.tier.value >= 2 and result.jury_pass is not None:
+            # Both layers must pass for T2+
+            result.overall_pass = algo_pass and result.jury_pass
+        else:
+            # Algorithmic only for T1
+            result.overall_pass = algo_pass
+
+        self._verification_log.append(result)
+        return result
+
+    def _jury_evaluate(self, task: Task, output: str) -> dict:
+        """Run jury evaluation using available jury models."""
+        jury_prompt = _build_jury_prompt(task, output)
+        scores = []
+
+        for jury in self.jury_agents:
+            try:
+                response = jury.execute_task(
+                    prompt=jury_prompt,
+                    system_prompt=JURY_SYSTEM_PROMPT,
+                )
+                parsed = _parse_jury_response(response)
+                parsed["model"] = jury.model_name
+                scores.append(parsed)
+            except Exception as e:
+                logger.warning(f"Jury {jury.model_name} failed: {e}")
+                continue
+
+        if not scores:
+            return {"score": 0.0, "pass": False, "reasoning": "All jury models failed"}
+
+        # Average across jury models (like EECT/DDFT jury pattern)
+        avg_score = sum(s["score"] for s in scores) / len(scores)
+        avg_pass = avg_score >= 0.6
+        reasoning_parts = [
+            f"{s['model']}: {s['score']:.2f} - {s['reasoning']}"
+            for s in scores
+        ]
+        return {
+            "score": avg_score,
+            "pass": avg_pass,
+            "reasoning": " | ".join(reasoning_parts),
+            "model": "+".join(s["model"] for s in scores),
+        }
+
+    @property
+    def verification_log(self) -> list[VerificationResult]:
+        return list(self._verification_log)
+
+    def summary(self) -> dict:
+        """Summarize verification results."""
+        if not self._verification_log:
+            return {"total": 0}
+        total = len(self._verification_log)
+        algo_pass = sum(1 for v in self._verification_log if v.algorithmic_pass)
+        jury_pass = sum(1 for v in self._verification_log if v.jury_pass)
+        overall_pass = sum(1 for v in self._verification_log if v.overall_pass)
+        jury_scores = [v.jury_score for v in self._verification_log if v.jury_score is not None]
+        return {
+            "total": total,
+            "algorithmic_pass_rate": algo_pass / total,
+            "jury_pass_rate": jury_pass / total if jury_pass else None,
+            "overall_pass_rate": overall_pass / total,
+            "avg_jury_score": sum(jury_scores) / len(jury_scores) if jury_scores else None,
+        }
server/__init__.py ADDED
File without changes
server/runner.py ADDED
@@ -0,0 +1,507 @@
1
+ """
2
+ Simulation Runner - Main experiment loop for the CGAE economy testbed.
3
+
4
+ Runs the full economic loop for a configurable number of time steps:
5
+ 1. Generate contracts (marketplace)
6
+ 2. Agents make decisions (bid, invest, idle)
7
+ 3. Assign contracts to bidding agents
8
+ 4. Execute tasks and verify outputs
9
+ 5. Settle contracts (reward/penalty)
10
+ 6. Apply temporal decay and spot-audits
11
+ 7. Record metrics for analysis
12
+
13
+ This produces the empirical data for the CGAE paper:
14
+ - Does Theorem 2 hold? (Do adaptive agents outperform aggressive ones?)
15
+ - Does Theorem 3 hold? (Does aggregate safety increase monotonically?)
16
+ - What are the failure modes? (Which agents go insolvent and why?)
17
+ """
18
+
19
+ from __future__ import annotations
20
+
21
+ import hashlib
22
+ import json
23
+ import logging
24
+ import random
25
+ import time
26
+ from dataclasses import dataclass, field
27
+ from pathlib import Path
28
+ from typing import Optional
29
+
30
+ from cgae_engine.gate import GateFunction, RobustnessVector, Tier, TierThresholds
31
+ from cgae_engine.temporal import TemporalDecay, StochasticAuditor
32
+ from cgae_engine.registry import AgentRegistry, AgentStatus
33
+ from cgae_engine.contracts import ContractManager, ContractStatus
34
+ from cgae_engine.economy import Economy, EconomyConfig, EconomySnapshot
35
+ from cgae_engine.marketplace import TaskMarketplace
36
+ from cgae_engine.audit import AuditOrchestrator
37
+ from agents.base import BaseAgent, AgentDecision
38
+ from agents.strategies import create_agent_cohort
39
+
40
+ logger = logging.getLogger(__name__)
41
+
42
+
43
+ @dataclass
44
+ class SimulationConfig:
45
+ """Configuration for a simulation run."""
46
+ # Duration
47
+ num_steps: int = 500
48
+ # Agent cohort
49
+ agent_strategies: list[str] = field(default_factory=lambda: [
50
+ "conservative", "aggressive", "balanced", "adaptive", "cheater",
51
+ ])
52
+ # Economy parameters
53
+ initial_balance: float = 0.5 # ETH seed capital per agent
54
+ decay_rate: float = 0.005 # Temporal decay lambda (slower decay)
55
+ audit_cost: float = 0.002 # Cost per audit dimension
56
+ storage_cost_per_step: float = 0.0003 # storage cost
57
+ test_eth_top_up_threshold: Optional[float] = None
58
+ test_eth_top_up_amount: float = 0.0
59
+ # Market parameters
60
+ contracts_per_step: int = 12
61
+ # Output
62
+ output_dir: str = "server/results"
63
+ snapshot_interval: int = 10 # Take detailed snapshot every N steps
64
+ # Random seed
65
+ seed: Optional[int] = 42
66
+
67
+
68
+ @dataclass
69
+ class SimulationMetrics:
70
+ """Metrics collected during simulation for analysis."""
71
+ # Per-step time series
72
+ timestamps: list[float] = field(default_factory=list)
73
+ aggregate_safety: list[float] = field(default_factory=list)
74
+ total_balance: list[float] = field(default_factory=list)
75
+ active_agent_count: list[int] = field(default_factory=list)
76
+ contracts_completed: list[int] = field(default_factory=list)
77
+ contracts_failed: list[int] = field(default_factory=list)
78
+ rewards_paid: list[float] = field(default_factory=list)
79
+ penalties_collected: list[float] = field(default_factory=list)
80
+
81
+ # Per-agent time series
82
+ agent_balances: dict[str, list[float]] = field(default_factory=dict)
83
+ agent_tiers: dict[str, list[int]] = field(default_factory=dict)
84
+ agent_earnings: dict[str, list[float]] = field(default_factory=dict)
85
+
86
+ # Per-strategy aggregates
87
+ strategy_survival: dict[str, int] = field(default_factory=dict)
88
+ strategy_total_earned: dict[str, float] = field(default_factory=dict)
89
+ strategy_final_tier: dict[str, int] = field(default_factory=dict)
90
+
91
+ # Task execution history
92
+ task_results: list[dict] = field(default_factory=list)
93
+
94
+ # High-signal protocol events for the dashboard (Bankruptcies, Demotions, Upgrades)
95
+ protocol_events: list[dict] = field(default_factory=list)
96
+
97
+
98
+ class SimulationRunner:
99
+ """
100
+ Runs the CGAE economy simulation.
101
+
102
+ This is the main entry point for the hackathon experiment.
103
+ It creates an economy, registers agents, runs the economic loop,
104
+ and produces data for the dashboard and post-mortem analysis.
105
+ """
106
+
107
+ def __init__(self, config: Optional[SimulationConfig] = None):
108
+ self.config = config or SimulationConfig()
109
+ if self.config.seed is not None:
110
+ random.seed(self.config.seed)
111
+
112
+ # Initialize economy
113
+ econ_config = EconomyConfig(
114
+ decay_rate=self.config.decay_rate,
115
+ initial_balance=self.config.initial_balance,
116
+ audit_cost=self.config.audit_cost,
117
+ storage_cost_per_step=self.config.storage_cost_per_step,
118
+ test_eth_top_up_threshold=self.config.test_eth_top_up_threshold,
119
+ test_eth_top_up_amount=self.config.test_eth_top_up_amount,
120
+ )
121
+ self.economy = Economy(config=econ_config)
122
+ self.marketplace = TaskMarketplace(
123
+ self.economy.contracts,
124
+ contracts_per_step=self.config.contracts_per_step,
125
+ )
126
+ self.audit = AuditOrchestrator()
127
+
128
+ # Create agent cohort
129
+ self.agents: dict[str, BaseAgent] = {}
130
+ self.metrics = SimulationMetrics()
131
+
132
+ def setup(self):
133
+ """Register agents and run initial audits."""
134
+ cohort = create_agent_cohort(self.config.agent_strategies)
135
+ for agent in cohort:
136
+ # Register
137
+ record = self.economy.register_agent(
138
+ model_name=agent.name,
139
+ model_config=agent.to_config(),
140
+ )
141
+ agent.agent_id = record.agent_id
142
+ self.agents[record.agent_id] = agent
143
+
144
+ # Initial audit with true robustness (+ small noise)
145
+ audit_result = self.audit.synthetic_audit(
146
+ record.agent_id,
147
+ base_robustness=agent.true_robustness,
148
+ noise_scale=0.03,
149
+ )
150
+ self.economy.audit_agent(
151
+ record.agent_id,
152
+ audit_result.robustness,
153
+ audit_type="registration",
154
+ )
155
+
156
+ # Init metric tracking
157
+ self.metrics.agent_balances[agent.name] = []
158
+ self.metrics.agent_tiers[agent.name] = []
159
+ self.metrics.agent_earnings[agent.name] = []
160
+
161
+ logger.info(
162
+ f"Simulation setup complete: {len(self.agents)} agents registered"
163
+ )
164
+
165
+    def run(self) -> SimulationMetrics:
+        """Run the full simulation."""
+        self.setup()
+
+        step = 0
+        infinite = self.config.num_steps == -1
+
+        try:
+            while infinite or step < self.config.num_steps:
+                self._run_step(step)
+
+                if step % self.config.snapshot_interval == 0:
+                    logger.info(
+                        f"Step {step}/{'inf' if infinite else self.config.num_steps} | "
+                        f"Safety={self.metrics.aggregate_safety[-1]:.3f} | "
+                        f"Active={self.metrics.active_agent_count[-1]} | "
+                        f"Balance={self.metrics.total_balance[-1]:.4f}"
+                    )
+                    # Periodic save for dashboard
+                    self._finalize()
+                    self.save_results()
+
+                if infinite:
+                    time.sleep(0.5)  # Slow down for live observation
+
+                step += 1
+        except KeyboardInterrupt:
+            logger.info("\nSimulation interrupted by user. Finalizing...")
+        except Exception as e:
+            logger.exception(f"Simulation failed: {e}")
+
+        self._finalize()
+        self.save_results()
+        return self.metrics
+
+    def _run_step(self, step: int):
+        """Execute one time step of the economy."""
+
+        # 1. Generate new contracts
+        new_contracts = self.marketplace.generate_contracts(
+            current_time=self.economy.current_time,
+        )
+
+        # 2. Each agent makes a decision
+        decisions: dict[str, AgentDecision] = {}
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record is None or record.status != AgentStatus.ACTIVE:
+                # Log bankruptcy once, not again on every subsequent step
+                if record and record.balance <= 0 and not any(
+                    e["type"] == "BANKRUPTCY" and e["agent"] == agent.name
+                    for e in self.metrics.protocol_events
+                ):
+                    self.metrics.protocol_events.append({
+                        "timestamp": self.economy.current_time,
+                        "type": "BANKRUPTCY",
+                        "agent": agent.name,
+                        "message": f"Agent {agent.name} has gone bankrupt and is suspended."
+                    })
+                continue
+
+            available = self.economy.contracts.get_contracts_for_tier(record.current_tier)
+            exposure = self.economy.contracts.agent_exposure(agent_id)
+            ceiling = self.economy.gate.budget_ceiling(record.current_tier)
+
+            decision = agent.decide(
+                available_contracts=available,
+                current_tier=record.current_tier,
+                balance=record.balance,
+                current_exposure=exposure,
+                budget_ceiling=ceiling,
+            )
+            decisions[agent_id] = decision
+            agent.record_decision(decision)
+
+        # 3. Process decisions
+        for agent_id, decision in decisions.items():
+            if decision.action == "bid" and decision.contract_id:
+                success = self.economy.accept_contract(
+                    decision.contract_id, agent_id
+                )
+                if success:
+                    # Execute task immediately (simplified)
+                    agent = self.agents[agent_id]
+                    contract = self.economy.contracts.contracts.get(decision.contract_id)
+                    if contract:
+                        output = agent.execute_task(contract)
+                        settlement = self.economy.complete_contract(decision.contract_id, output)
+
+                        # Record result for transparency
+                        # Mock CID for demonstration
+                        cid = f"0x{hashlib.sha256(str(contract.contract_id).encode()).hexdigest()[:32]}"
+                        self.metrics.task_results.append({
+                            "agent": agent.name,
+                            "task_id": contract.contract_id,
+                            "tier": f"T{contract.min_tier.value}",
+                            "domain": contract.domain,
+                            "proof_cid": cid,
+                            "verification": {
+                                "overall_pass": settlement["outcome"] == "success",
+                                "constraints_passed": [],  # Simplified for synthetic
+                                "constraints_failed": settlement.get("failures", [])
+                            },
+                            "settlement": {
+                                "reward": settlement.get("reward", 0),
+                                "penalty": settlement.get("penalty", 0)
+                            },
+                            "output_preview": f"Synthetic execution of {contract.contract_id}: {settlement['outcome'].upper()}"
+                        })
+
+            elif decision.action == "invest_robustness":
+                agent = self.agents[agent_id]
+                dim = decision.investment_dimension
+                amount = decision.investment_amount
+                if dim:
+                    cost = agent.robustness_investment_cost(dim, amount)
+                    record = self.economy.registry.get_agent(agent_id)
+                    if record and record.balance >= cost:
+                        record.balance -= cost
+                        record.total_spent += cost
+                        new_r = agent.invest_robustness(dim, amount)
+                        # Re-audit with improved robustness
+                        audit_result = self.audit.synthetic_audit(
+                            agent_id,
+                            base_robustness=new_r,
+                            noise_scale=0.02,
+                        )
+                        old_tier = record.current_tier
+                        self.economy.audit_agent(
+                            agent_id,
+                            audit_result.robustness,
+                            audit_type="upgrade",
+                        )
+                        new_tier = record.current_tier
+                        if new_tier.value > old_tier.value:
+                            self.metrics.protocol_events.append({
+                                "timestamp": self.economy.current_time,
+                                "type": "UPGRADE",
+                                "agent": agent.name,
+                                "message": f"Agent {agent.name} UPGRADED to {new_tier.name} via robustness investment!"
+                            })
+
+        # 4. Advance time (decay, spot-audits, storage costs)
+        def audit_callback(aid):
+            agent = self.agents.get(aid)
+            if agent:
+                result = self.audit.synthetic_audit(
+                    aid, base_robustness=agent.true_robustness, noise_scale=0.04
+                )
+                return result.robustness
+            return None
+
+        self.economy.step(audit_callback=audit_callback)
+
+        # 5. Record metrics
+        self._record_metrics()
+
+    def _record_metrics(self):
+        """Record economy-wide and per-agent metrics."""
+        self.metrics.timestamps.append(self.economy.current_time)
+        self.metrics.aggregate_safety.append(self.economy.aggregate_safety())
+
+        active = self.economy.registry.active_agents
+        self.metrics.active_agent_count.append(len(active))
+        self.metrics.total_balance.append(sum(a.balance for a in active))
+
+        econ = self.economy.contracts.economics_summary()
+        self.metrics.contracts_completed.append(
+            econ["status_distribution"].get("completed", 0)
+        )
+        self.metrics.contracts_failed.append(
+            econ["status_distribution"].get("failed", 0)
+        )
+        self.metrics.rewards_paid.append(econ["total_rewards_paid"])
+        self.metrics.penalties_collected.append(econ["total_penalties_collected"])
+
+        # Per-agent
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record:
+                self.metrics.agent_balances[agent.name].append(record.balance)
+                self.metrics.agent_tiers[agent.name].append(record.current_tier.value)
+                self.metrics.agent_earnings[agent.name].append(record.total_earned)
+
+    def _finalize(self):
+        """Compute aggregate metrics (idempotent)."""
+        # Reset strategy-level aggregates before re-computing
+        self.metrics.strategy_survival = {}
+        self.metrics.strategy_total_earned = {}
+        self.metrics.strategy_final_tier = {}
+
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record:
+                survived = record.status == AgentStatus.ACTIVE
+                self.metrics.strategy_survival[agent.strategy.value] = (
+                    self.metrics.strategy_survival.get(agent.strategy.value, 0)
+                    + (1 if survived else 0)
+                )
+                self.metrics.strategy_total_earned[agent.strategy.value] = (
+                    self.metrics.strategy_total_earned.get(agent.strategy.value, 0.0)
+                    + record.total_earned
+                )
+                self.metrics.strategy_final_tier[agent.strategy.value] = max(
+                    self.metrics.strategy_final_tier.get(agent.strategy.value, 0),
+                    record.current_tier.value,
+                )
+
+    def save_results(self, path: Optional[str] = None):
+        """Save simulation results to JSON."""
+        output_dir = Path(path or self.config.output_dir)
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        # Economy state
+        self.economy.export_state(str(output_dir / "economy_state.json"))
+
+        # Time series metrics
+        ts_data = {
+            "timestamps": self.metrics.timestamps,
+            "aggregate_safety": self.metrics.aggregate_safety,
+            "total_balance": self.metrics.total_balance,
+            "active_agent_count": self.metrics.active_agent_count,
+            "contracts_completed": self.metrics.contracts_completed,
+            "contracts_failed": self.metrics.contracts_failed,
+            "rewards_paid": self.metrics.rewards_paid,
+            "penalties_collected": self.metrics.penalties_collected,
+        }
+        (output_dir / "time_series.json").write_text(json.dumps(ts_data, indent=2))
+
+        # Per-agent metrics
+        agent_data = {
+            "balances": self.metrics.agent_balances,
+            "tiers": self.metrics.agent_tiers,
+            "earnings": self.metrics.agent_earnings,
+        }
+        (output_dir / "agent_metrics.json").write_text(json.dumps(agent_data, indent=2))
+
+        # Strategy summary
+        summary = {
+            "survival": self.metrics.strategy_survival,
+            "total_earned": self.metrics.strategy_total_earned,
+            "final_tier": self.metrics.strategy_final_tier,
+        }
+        (output_dir / "strategy_summary.json").write_text(json.dumps(summary, indent=2))
+
+        # Task execution history for dashboard
+        (output_dir / "task_results.json").write_text(
+            json.dumps(self.metrics.task_results, indent=2)
+        )
+
+        # Protocol events for high-signal dashboard alerts
+        (output_dir / "protocol_events.json").write_text(
+            json.dumps(self.metrics.protocol_events, indent=2)
+        )
+
+        # Agent details
+        agent_details = {}
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record:
+                agent_details[agent.name] = {
+                    **record.to_dict(),
+                    "strategy": agent.strategy.value,
+                    "true_robustness": {
+                        "cc": agent.true_robustness.cc,
+                        "er": agent.true_robustness.er,
+                        "as": agent.true_robustness.as_,
+                        "ih": agent.true_robustness.ih,
+                    },
+                    "decisions_count": len(agent.decisions),
+                }
+        (output_dir / "agent_details.json").write_text(
+            json.dumps(agent_details, indent=2, default=str)
+        )
+
+        logger.info(f"Results saved to {output_dir}")
+
+
+import argparse
+
+
+def main():
+    """Entry point for running the simulation."""
+    parser = argparse.ArgumentParser(description="Run the CGAE economy simulation.")
+    parser.add_argument("--live", action="store_true", help="Run in infinite loop mode for dashboard.")
+    parser.add_argument("--steps", type=int, default=500, help="Number of steps (ignored if --live is set).")
+    args = parser.parse_args()
+
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s [%(levelname)s] %(message)s",
+    )
+
+    config = SimulationConfig(
+        num_steps=-1 if args.live else args.steps,
+        seed=42,
+    )
+
+    runner = SimulationRunner(config)
+    metrics = runner.run()
+    runner.save_results()
+
+    # Print summary
+    print("\n" + "=" * 60)
+    print("CGAE ECONOMY SIMULATION - RESULTS")
+    print("=" * 60)
+    # Report steps actually recorded (num_steps is -1 in --live mode)
+    print(f"\nDuration: {len(metrics.timestamps)} time steps")
+
+    if not metrics.aggregate_safety:
+        print("\nERROR: Simulation ended before recording metrics.")
+        return
+
+    print(f"Final aggregate safety: {metrics.aggregate_safety[-1]:.4f}")
+    print(f"Active agents at end: {metrics.active_agent_count[-1]}")
+    print(f"Total contracts completed: {metrics.contracts_completed[-1]}")
+    print(f"Total contracts failed: {metrics.contracts_failed[-1]}")
+    print(f"Total rewards paid: {metrics.rewards_paid[-1]:.4f} ETH")
+    print(f"Total penalties: {metrics.penalties_collected[-1]:.4f} ETH")
+
+    print("\n--- Strategy Results ---")
+    for strategy in config.agent_strategies:
+        survived = metrics.strategy_survival.get(strategy, 0)
+        earned = metrics.strategy_total_earned.get(strategy, 0.0)
+        tier = metrics.strategy_final_tier.get(strategy, 0)
+        print(f"  {strategy:15s} | survived={survived} | earned={earned:.4f} ETH | final_tier=T{tier}")
+
+    # Theorem 2 check: did adaptive outperform aggressive?
+    adaptive_earned = metrics.strategy_total_earned.get("adaptive", 0)
+    aggressive_earned = metrics.strategy_total_earned.get("aggressive", 0)
+    print("\n--- Theorem 2 Check ---")
+    print(f"  Adaptive earned: {adaptive_earned:.4f} ETH")
+    print(f"  Aggressive earned: {aggressive_earned:.4f} ETH")
+    print(f"  Incentive-compatible: {'YES' if adaptive_earned > aggressive_earned else 'NO'}")
+
+    # Theorem 3 check: monotonic safety
+    safety = metrics.aggregate_safety
+    monotonic = all(safety[i] <= safety[i + 1] + 0.01 for i in range(len(safety) - 1))  # Allow small noise
+    print("\n--- Theorem 3 Check ---")
+    print(f"  Safety start: {safety[0]:.4f}")
+    print(f"  Safety end: {safety[-1]:.4f}")
+    print(f"  Monotonic (within noise): {'YES' if monotonic else 'NO'}")
+
+    print("\n" + "=" * 60)
+
+
+if __name__ == "__main__":
+    main()
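The hand-off between the runner and the economy in step 4 of `_run_step` can be sketched in isolation. The stub below is illustrative only (the real `Economy` lives in `cgae_engine/economy.py` and does far more per step); what it mirrors is the callback contract assumed above: `Economy.step()` may spot-audit agents by calling back with an agent id, and the callback returns a fresh robustness score, or `None` for unknown agents.

```python
import random

class StubEconomy:
    """Minimal stand-in for cgae_engine's Economy (hypothetical API)."""

    def __init__(self):
        self.current_time = 0.0
        self.robustness = {"a1": 0.8, "a2": 0.6}

    def step(self, audit_callback=None):
        # Advance time by one unit; occasionally spot-audit a random agent.
        self.current_time += 1.0
        if audit_callback and random.random() < 0.5:
            aid = random.choice(list(self.robustness))
            r = audit_callback(aid)
            if r is not None:
                self.robustness[aid] = r

def audit_callback(aid):
    # Re-measure robustness with small noise, as runner.py's callback does.
    base = {"a1": 0.8, "a2": 0.6}.get(aid)
    if base is None:
        return None
    return max(0.0, min(1.0, base + random.gauss(0, 0.04)))

econ = StubEconomy()
for _ in range(10):
    econ.step(audit_callback=audit_callback)
print(econ.current_time)  # 10.0
```

The closure-over-`self.agents` pattern in `_run_step` keeps the economy decoupled from agent objects: the economy only ever sees ids and scores.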
tests/test_core.py CHANGED
@@ -1,6 +1,7 @@
 """Tests for registry, contracts, and economy."""
 
 import pytest
+from pathlib import Path
 from cgae_engine.gate import RobustnessVector, Tier, GateFunction
 from cgae_engine.registry import AgentRegistry, AgentStatus
 from cgae_engine.contracts import ContractManager, ContractStatus, Constraint
@@ -134,6 +135,61 @@ class TestEconomy:
         safety = self.econ.aggregate_safety()
         assert 0.0 <= safety <= 1.0
 
+    def test_step_produces_snapshot(self):
+        record = self.econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        self.econ.audit_agent(record.agent_id, r)
+        self.econ.step()
+        assert len(self.econ.snapshots) == 1
+        snap = self.econ.snapshots[0]
+        assert snap.num_agents >= 1
+        assert snap.aggregate_safety > 0
+
+    def test_step_advances_time(self):
+        self.econ.step()
+        assert self.econ.current_time == 1.0
+        self.econ.step()
+        assert self.econ.current_time == 2.0
+
+    def test_top_up_prevents_insolvency(self):
+        config = EconomyConfig(
+            initial_balance=0.002,  # very low
+            test_eth_top_up_threshold=0.01,
+            test_eth_top_up_amount=0.5,
+        )
+        econ = Economy(config=config)
+        record = econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        econ.audit_agent(record.agent_id, r)
+        # After the audit cost, balance is very low; step should top up
+        econ.step()
+        assert record.balance > 0
+        assert record.status == AgentStatus.ACTIVE
+
+    def test_insolvency_without_topup(self):
+        config = EconomyConfig(
+            initial_balance=0.002,
+            test_eth_top_up_threshold=None,  # disabled
+            test_eth_top_up_amount=0.0,
+        )
+        econ = Economy(config=config)
+        record = econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        econ.audit_agent(record.agent_id, r)
+        econ.step()
+        assert record.status == AgentStatus.SUSPENDED
+
+    def test_export_state(self, tmp_path):
+        record = self.econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        self.econ.audit_agent(record.agent_id, r)
+        path = str(tmp_path / "state.json")
+        self.econ.export_state(path)
+        import json
+        data = json.loads(Path(path).read_text())
+        assert "agents" in data
+        assert "aggregate_safety" in data
+
 
 class TestTemporalDecay:
     def test_no_decay_at_zero(self):
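The top-up rule pinned down by `test_top_up_prevents_insolvency` and `test_insolvency_without_topup` fits in a few lines. This is an illustrative sketch, not the code in `economy.py`: the names `TopUpConfig`, `threshold`, and `amount` are stand-ins for `EconomyConfig`'s `test_eth_top_up_threshold` and `test_eth_top_up_amount` fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TopUpConfig:
    # Stand-ins for EconomyConfig's top-up fields (hypothetical names).
    threshold: Optional[float] = 0.01  # None disables top-ups entirely
    amount: float = 0.5

def apply_top_up(balance: float, cfg: TopUpConfig) -> tuple[float, str]:
    """One step's solvency check: return (new_balance, status)."""
    if cfg.threshold is not None and balance < cfg.threshold:
        return balance + cfg.amount, "ACTIVE"  # topped up, stays active
    if balance <= 0:
        return balance, "SUSPENDED"  # insolvent and no top-up available
    return balance, "ACTIVE"

print(apply_top_up(0.002, TopUpConfig()))                 # topped up to ~0.502, ACTIVE
print(apply_top_up(-0.001, TopUpConfig(threshold=None)))  # no top-up, SUSPENDED
```

Checking the threshold before the insolvency check is what makes the first test pass: a near-zero balance is rescued before it can trigger suspension.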