Spaces: rb512/cgae-server

rb125 committed 11 days ago

Commit 42b28ae (1 parent: d74aa65)

economy step function with temporal dynamics, snapshots, and ETH top-ups

Files changed (6)

TODO.md +131 -0
cgae_engine/economy.py +172 -0
cgae_engine/verifier.py +250 -0
server/__init__.py +0 -0
server/runner.py +507 -0
tests/test_core.py +56 -0

TODO.md ADDED

	@@ -0,0 +1,131 @@

+# CGAE Development Checklist
+## Phase 1: Complete CGAE Protocol (~4 commits, ~800 lines)
+### Commit 1: Economy step() + temporal dynamics (~250 lines added to economy.py)
+- [ ] `EconomySnapshot` dataclass
+- [ ] `step()` — advance economy by one time step (decay, spot-audits, storage costs, expiry)
+- [ ] `_take_snapshot()` + `export_state()`
+- [ ] Test-ETH top-up mechanism (keeps agents solvent during simulation)
+- [ ] Tests: step produces snapshots, top-ups work, insolvency suspends agents
+**Verify:** `python3 -m pytest tests/ -q`
+### Commit 2: Model configs + LLM agent (~440 lines)
+- [ ] `models_config.py` — 11 contestants + 3 jury (Azure/Bedrock/Gemma)
+- [ ] `llm_agent.py` — chat interface for Azure OpenAI, Azure AI Foundry, Bedrock Converse API
+- [ ] Token tracking (input/output tokens, latency, cost)
+- [ ] Test: agents instantiate with env vars
+**Verify:** `python3 -c "from cgae_engine.models_config import CONTESTANT_MODELS, JURY_MODELS; print(f'{len(CONTESTANT_MODELS)} contestants, {len(JURY_MODELS)} jury')"`
+### Commit 3: Synthetic runner (~500 lines)
+- [ ] `server/runner.py` — full simulation loop with 5 strategy agents
+- [ ] Metric tracking (safety, balances, contracts, tier distribution)
+- [ ] Result export to JSON
+- [ ] Test: 50-step simulation completes, safety > 0
+**Verify:** `python3 -m server.runner --steps 50`
+### Commit 4: Economy extensions — delegation + tier upgrades (~280 lines added to economy.py)
+- [ ] `can_delegate()` — chain-level tier enforcement
+- [ ] `request_tier_upgrade()` — scaling-gate upgrade flow
+- [ ] `record_delegation()` — audit trail for delegated tasks
+- [ ] `complete_contract()` with `verification_override` + `liability_agent_id`
+- [ ] Tests: delegation blocked when chain tier insufficient, upgrades work
+**Verify:** `python3 -m pytest tests/ -q`
+---
+## Phase 2: Real LLM Simulation (~3 commits, ~2700 lines)
+### Commit 5: Framework clients + audit orchestrator (~1130 lines)
+- [ ] `framework_clients.py` — CDCT/DDFT/EECT HTTP API callers
+- [ ] `audit.py` — orchestrates all three frameworks, computes robustness vector
+- [ ] Pre-computed score fallback when APIs unavailable
+**Verify:** `python3 -c "from cgae_engine.audit import AuditOrchestrator; print('audit ok')"`
+### Commit 6: Autonomous agent (~890 lines)
+- [ ] `agents/autonomous.py` — EV/RAEV planning, accounting layer
+- [ ] Strategy selection (growth, conservative, balanced)
+- [ ] Self-verification before submission
+**Verify:** `python3 -c "from agents.autonomous import AutonomousAgent; print('autonomous ok')"`
+### Commit 7: Live runner (~1575 lines)
+- [ ] `server/live_runner.py` — real LLM calls, jury verification, cost accounting
+- [ ] Default robustness profiles per model
+- [ ] Round-by-round execution with metric export
+**Verify:** `python3 -m server.live_runner` (requires API keys in .env)
+---
+## Phase 3: ENS Certification (~2 commits, ~300 lines)
+### Commit 8: ENS manager (~280 lines)
+- [ ] `cgae_engine/ens.py` — create subnames on Sepolia, set/read text records
+- [ ] Text records: cgae.tier, cgae.cc, cgae.er, cgae.as, cgae.ih, cgae.wallet, cgae.family
+- [ ] Register all 11 agent subnames under cgaeprotocol.eth
+**Verify:** `python3 -c "from cgae_engine.ens import ENSManager; ens = ENSManager(); print(ens.resolve_text('gpt-5-4.cgaeprotocol.eth', 'cgae.tier'))"`
+### Commit 9: ENS-gated economy (~50 lines changed in economy.py)
+- [ ] Wire ENS into `accept_contract()` — resolve tier from ENS before allowing
+- [ ] Wire ENS into `register_agent()` — create subname on registration
+- [ ] Wire ENS into `audit_agent()` — update text records on certification
+- [ ] Test: agent without ENS identity rejected
+**Verify:** `python3 -m pytest tests/ -q`
+---
+## Phase 4: 0G Integration (~3 commits, ~900 lines)
+### Commit 10: Smart contracts (~600 lines Solidity + JS)
+- [ ] `contracts/src/CGAERegistry.sol` — on-chain agent identity + gate function
+- [ ] `contracts/src/CGAEEscrow.sol` — contract escrow + budget ceiling
+- [ ] Hardhat config for 0G Galileo testnet
+- [ ] Deploy script + deployed.json
+**Verify:** `cd contracts && npx hardhat compile`
+### Commit 11: 0G Storage + wallet (~500 lines)
+- [ ] `storage/upload_to_0g.mjs` — Node.js 0G SDK uploader
+- [ ] `storage/zg_store.py` — Python wrapper
+- [ ] `cgae_engine/wallet.py` — per-agent ETH keypairs, treasury disbursements
+- [ ] `cgae_engine/onchain.py` — write certifications to CGAERegistry
+**Verify:** `python3 -c "from cgae_engine.wallet import WalletManager; wm = WalletManager(dry_run=True); w = wm.create_agent_wallet('test'); print(w.address)"`
+### Commit 12: Wire 0G into audit pipeline (~50 lines changed)
+- [ ] Audit certificates uploaded to 0G Storage after each assessment
+- [ ] Merkle root hash stored on-chain via CGAERegistry.certify()
+- [ ] On-chain bridge called after each certification
+**Verify:** `python3 -c "from storage.zg_store import check_setup; print(check_setup())"`
+---
+## Phase 5: Dashboard (~3 commits)
+### Commit 13: FastAPI backend (~60 lines)
+- [ ] `dashboard-next/api.py` — serves economy data as JSON endpoints
+**Verify:** `cd dashboard-next && uvicorn api:app --port 8000` then `curl localhost:8000/api/health`
+### Commit 14: Next.js frontend (~400 lines)
+- [ ] Dark ETH-native theme
+- [ ] Overview tab (safety chart, earnings)
+- [ ] Agents tab (ENS names, tiers, balances)
+- [ ] Trades tab (expandable task details)
+- [ ] On-chain tab (0G contracts + ENS registry)
+**Verify:** `cd dashboard-next && npm run build`
+### Commit 15: Polish + final README
+- [ ] .env.example
+- [ ] Full README with architecture, setup, design decisions
+- [ ] Demo video link (when recorded)
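
The Commit 1 test items in the checklist (step produces snapshots, insolvency suspends agents) could be exercised along the following lines. This is only a sketch against a toy stand-in: `ToyAgent`, `ToyEconomy`, and `run_checks` are hypothetical names, not classes from `cgae_engine`, and top-ups are left out so that insolvency is actually reachable. The balances are binary-exact floats so the comparisons are not affected by rounding.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    # Hypothetical stand-in for AgentRecord; only the fields the checks need.
    agent_id: str
    balance: float
    suspended: bool = False


@dataclass
class ToyEconomy:
    # Hypothetical stand-in for Economy with top-ups disabled.
    storage_cost_per_step: float
    agents: list[ToyAgent] = field(default_factory=list)
    current_time: float = 0.0
    snapshots: list[dict] = field(default_factory=list)

    def step(self) -> None:
        """Charge storage, suspend insolvent agents, record one snapshot."""
        self.current_time += 1.0
        for agent in self.agents:
            if agent.suspended:
                continue
            agent.balance -= self.storage_cost_per_step
            if agent.balance <= 0:
                agent.suspended = True
        active = [a for a in self.agents if not a.suspended]
        self.snapshots.append({
            "timestamp": self.current_time,
            "num_agents": len(active),
            "total_balance": sum(a.balance for a in active),
        })


def run_checks() -> ToyEconomy:
    """Run three steps with one well-funded and one under-funded agent."""
    econ = ToyEconomy(
        storage_cost_per_step=0.25,
        agents=[ToyAgent("rich", 2.0), ToyAgent("poor", 0.5)],
    )
    for _ in range(3):
        econ.step()
    return econ
```

In a real test the toy classes would be replaced by `Economy` and its registry; the shape of the assertions (one snapshot per step, the under-funded agent suspended once its balance reaches zero) would stay the same.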

cgae_engine/economy.py CHANGED

@@ -31,6 +31,25 @@ class EconomyConfig:
     initial_balance: float = 0.1
     audit_cost: float = 0.005
     storage_cost_per_step: float = 0.001
 class Economy:
@@ -58,7 +77,36 @@ class Economy:
         self.auditor = StochasticAuditor()
         self.current_time: float = 0.0
         self._events: list[dict] = []
     # ------------------------------------------------------------------
     # Agent lifecycle
@@ -199,10 +247,134 @@ class Economy:
         self._log("contract_settled", settlement)
         return settlement
     # ------------------------------------------------------------------
     # Observability
     # ------------------------------------------------------------------
     def aggregate_safety(self) -> float:
         """Compute aggregate safety S(P) (Definition 9)."""
         total_exposure = 0.0

     initial_balance: float = 0.1
     audit_cost: float = 0.005
     storage_cost_per_step: float = 0.001
+    test_eth_top_up_threshold: Optional[float] = 0.05
+    test_eth_top_up_amount: float = 0.5
+@dataclass
+class EconomySnapshot:
+    """A point-in-time snapshot of the economy for the dashboard."""
+    timestamp: float
+    num_agents: int
+    tier_distribution: dict[str, int]
+    total_contracts: int
+    completed_contracts: int
+    failed_contracts: int
+    total_rewards_paid: float
+    total_penalties_collected: float
+    aggregate_safety: float
+    total_balance: float
+    total_test_eth_topups: float
+    agent_summaries: list[dict]
 class Economy:
         self.auditor = StochasticAuditor()
         self.current_time: float = 0.0
+        self._snapshots: list[EconomySnapshot] = []
         self._events: list[dict] = []
+        self.total_test_eth_topups: float = 0.0
+    def _effective_robustness(self, record: AgentRecord) -> Optional[RobustnessVector]:
+        """Return temporally-decayed robustness for an agent."""
+        cert = record.current_certification
+        if cert is None or record.current_robustness is None:
+            return None
+        dt = self.current_time - cert.timestamp
+        return self.decay.effective_robustness(record.current_robustness, dt)
+    def _should_top_up_agents(self) -> bool:
+        return (
+            self.config.test_eth_top_up_threshold is not None
+            and self.config.test_eth_top_up_amount > 0.0
+        )
+    def _maybe_top_up_agent(self, agent: AgentRecord) -> Optional[dict]:
+        """Top up an agent's balance if it drops below threshold."""
+        if not self._should_top_up_agents():
+            return None
+        threshold = self.config.test_eth_top_up_threshold
+        if threshold is None or agent.balance >= threshold:
+            return None
+        top_up_amount = max(self.config.test_eth_top_up_amount, threshold - agent.balance)
+        agent.balance += top_up_amount
+        agent.total_topups += top_up_amount
+        self.total_test_eth_topups += top_up_amount
+        return {"agent_id": agent.agent_id, "amount": top_up_amount, "balance": agent.balance}
     # ------------------------------------------------------------------
     # Agent lifecycle
         self._log("contract_settled", settlement)
         return settlement
+    # ------------------------------------------------------------------
+    # Time step and temporal dynamics
+    # ------------------------------------------------------------------
+    def step(self, audit_callback=None) -> dict:
+        """
+        Advance the economy by one time step.
+        Applies temporal decay, spot-audits, storage costs, top-ups, and expiry.
+        """
+        self.current_time += 1.0
+        step_events = {
+            "timestamp": self.current_time,
+            "audits_triggered": [],
+            "agents_demoted": [],
+            "agents_expired": [],
+            "contracts_expired": [],
+            "storage_costs": 0.0,
+            "test_eth_topups": [],
+        }
+        for agent in self.registry.active_agents:
+            cert = agent.current_certification
+            if cert is None:
+                continue
+            # Temporal decay: has effective tier dropped?
+            dt = self.current_time - cert.timestamp
+            r_eff = self.decay.effective_robustness(cert.robustness, dt)
+            effective_tier = self.gate.evaluate(r_eff)
+            if effective_tier < agent.current_tier:
+                self.registry.certify(agent.agent_id, r_eff, audit_type="decay", timestamp=self.current_time)
+                step_events["agents_expired"].append(agent.agent_id)
+            # Stochastic spot-audit
+            time_since_audit = self.current_time - agent.last_audit_time
+            if self.auditor.should_audit(agent.current_tier, time_since_audit):
+                step_events["audits_triggered"].append(agent.agent_id)
+                new_r = audit_callback(agent.agent_id) if audit_callback else r_eff
+                new_tier = self.gate.evaluate(new_r)
+                if new_tier < agent.current_tier:
+                    self.registry.demote(agent.agent_id, new_r, reason="spot_audit", timestamp=self.current_time)
+                    step_events["agents_demoted"].append(agent.agent_id)
+                else:
+                    self.registry.certify(agent.agent_id, new_r, audit_type="spot", timestamp=self.current_time)
+                agent.balance -= self.config.audit_cost * 4  # audit_cost is per dimension; 4 presumably covers CC, ER, AS, IH
+                agent.total_spent += self.config.audit_cost * 4
+            # Storage cost
+            agent.balance -= self.config.storage_cost_per_step
+            agent.total_spent += self.config.storage_cost_per_step
+            step_events["storage_costs"] += self.config.storage_cost_per_step
+            # Top-up if needed
+            topup = self._maybe_top_up_agent(agent)
+            if topup:
+                step_events["test_eth_topups"].append(topup)
+            # Insolvency check
+            if agent.balance <= 0:
+                agent.status = AgentStatus.SUSPENDED
+                self._log("agent_insolvent", {"agent_id": agent.agent_id, "balance": agent.balance})
+        # Reactivate suspended agents if top-up is enabled
+        if self._should_top_up_agents():
+            for agent in self.registry.agents.values():
+                if agent.status != AgentStatus.SUSPENDED:
+                    continue
+                topup = self._maybe_top_up_agent(agent)
+                if topup and agent.balance > 0:
+                    agent.status = AgentStatus.ACTIVE
+                    step_events["test_eth_topups"].append(topup)
+        # Expire overdue contracts
+        step_events["contracts_expired"] = self.contracts.expire_contracts(self.current_time)
+        # Take snapshot
+        self._snapshots.append(self._take_snapshot())
+        self._log("step", step_events)
+        return step_events
     # ------------------------------------------------------------------
     # Observability
     # ------------------------------------------------------------------
+    def _take_snapshot(self) -> EconomySnapshot:
+        tier_dist = self.registry.tier_distribution()
+        econ = self.contracts.economics_summary()
+        agents = self.registry.active_agents
+        return EconomySnapshot(
+            timestamp=self.current_time,
+            num_agents=len(agents),
+            tier_distribution={t.name: c for t, c in tier_dist.items()},
+            total_contracts=econ["total_contracts"],
+            completed_contracts=econ["status_distribution"].get("completed", 0),
+            failed_contracts=econ["status_distribution"].get("failed", 0),
+            total_rewards_paid=econ["total_rewards_paid"],
+            total_penalties_collected=econ["total_penalties_collected"],
+            aggregate_safety=self.aggregate_safety(),
+            total_balance=sum(a.balance for a in agents),
+            total_test_eth_topups=self.total_test_eth_topups,
+            agent_summaries=[a.to_dict() for a in agents],
+        )
+    @property
+    def snapshots(self) -> list[EconomySnapshot]:
+        return list(self._snapshots)
+    @property
+    def events(self) -> list[dict]:
+        return list(self._events)
+    def export_state(self, path: str):
+        """Export full economy state to JSON."""
+        state = {
+            "timestamp": self.current_time,
+            "config": {
+                "decay_rate": self.config.decay_rate,
+                "ih_threshold": self.config.ih_threshold,
+                "initial_balance": self.config.initial_balance,
+            },
+            "agents": {aid: a.to_dict() for aid, a in self.registry.agents.items()},
+            "contracts": self.contracts.economics_summary(),
+            "aggregate_safety": self.aggregate_safety(),
+            "total_test_eth_topups": self.total_test_eth_topups,
+        }
+        Path(path).write_text(json.dumps(state, indent=2, default=str))
     def aggregate_safety(self) -> float:
         """Compute aggregate safety S(P) (Definition 9)."""
         total_exposure = 0.0
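
The top-up rule in `_maybe_top_up_agent` can be read as a pure function of the agent's balance, the threshold, and the configured amount. The sketch below restates it that way, assuming a standalone helper named `top_up_amount` (hypothetical, not part of this commit). The example values are binary-exact floats so the arithmetic is easy to check by hand.

```python
from typing import Optional


def top_up_amount(balance: float, threshold: Optional[float], amount: float) -> float:
    """Return the test-ETH credit due to one agent, or 0.0 if none is due.

    Hypothetical helper mirroring the checks in _should_top_up_agents and
    _maybe_top_up_agent from the diff above.
    """
    if threshold is None or amount <= 0.0:
        return 0.0  # top-ups are disabled
    if balance >= threshold:
        return 0.0  # agent is still at or above the threshold
    # Credit the configured amount, or more if that would not reach the threshold.
    return max(amount, threshold - balance)


# Deep deficit: the gap to the threshold (1.5) exceeds the configured amount (0.25).
deep = top_up_amount(balance=-1.0, threshold=0.5, amount=0.25)

# Shallow deficit: the configured amount is enough.
shallow = top_up_amount(balance=0.25, threshold=0.5, amount=0.25)

# Disabled, as in the SimulationConfig defaults (threshold None, amount 0.0).
disabled = top_up_amount(balance=-1.0, threshold=None, amount=0.0)
```

With the `EconomyConfig` defaults (threshold 0.05, amount 0.5), an agent at 0.01 would be credited 0.5. Because `step()` applies the top-up before the insolvency check, an active agent is lifted back to about the threshold or higher before `balance <= 0` is tested whenever top-ups are enabled with a positive threshold.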

cgae_engine/verifier.py ADDED

	@@ -0,0 +1,250 @@

+"""
+Output Verification System for the CGAE Economy
+Two-layer verification:
+1. Algorithmic: Deterministic constraint checks (word count, JSON, keywords, etc.)
+2. Jury LLM: Semantic quality assessment by a separate model (gpt-5.2 / deepseek-v3.2)
+The jury verifier uses the same pattern as the EECT/DDFT jury systems:
+send the task prompt, the agent's output, and a rubric to a jury model,
+and get back a pass/fail + score.
+"""
+from __future__ import annotations
+import json
+import logging
+import re
+from dataclasses import dataclass
+from typing import Optional
+from cgae_engine.llm_agent import LLMAgent
+from cgae_engine.tasks import Task, verify_output
+logger = logging.getLogger(__name__)
+JURY_SYSTEM_PROMPT = """\
+You are an expert evaluator for an AI agent economy. Your job is to assess
+whether an agent's output meets the quality requirements for a given task.
+You will receive:
+1. The original task prompt
+2. The agent's output
+3. A scoring rubric
+4. Optionally, a ground truth reference
+Score the output on a scale from 0.0 to 1.0 where:
+- 1.0 = Perfect, meets all rubric criteria
+- 0.75 = Good, minor issues only
+- 0.5 = Acceptable, some criteria met but notable gaps
+- 0.25 = Poor, major criteria not met
+- 0.0 = Failed, output is wrong, off-topic, or harmful
+You MUST respond with valid JSON only, in this exact format:
+{"score": <float 0-1>, "pass": <bool>, "reasoning": "<brief explanation>"}
+An output passes if score >= 0.6."""
+def _build_jury_prompt(task: Task, output: str) -> str:
+    """Build the evaluation prompt for the jury model."""
+    parts = [
+        "## Task Prompt",
+        task.prompt,
+        "",
+        "## Agent Output",
+        output,
+        "",
+        "## Scoring Rubric",
+        task.jury_rubric or "Assess overall quality, accuracy, and completeness.",
+    ]
+    if task.ground_truth:
+        parts.extend([
+            "",
+            "## Reference Answer",
+            task.ground_truth,
+        ])
+    parts.extend([
+        "",
+        "## Your Evaluation",
+        'Respond with JSON only: {"score": <0-1>, "pass": <bool>, "reasoning": "<explanation>"}',
+    ])
+    return "\n".join(parts)
+def _parse_jury_response(response: str) -> dict:
+    """Parse the jury model's JSON response. Tolerant of markdown wrapping."""
+    from cgae_engine.utils import extract_json
+    text = extract_json(response)
+    try:
+        data = json.loads(text)
+        score = float(data.get("score", 0.0))
+        return {
+            "score": max(0.0, min(1.0, score)),
+            "pass": data.get("pass", score >= 0.6),
+            "reasoning": data.get("reasoning", ""),
+        }
+    except (json.JSONDecodeError, ValueError, TypeError):
+        # Fallback: pull a well-formed number after "score" out of partial JSON
+        score_match = re.search(r'"score"\s*:\s*(\d+(?:\.\d+)?)', response)
+        if score_match:
+            score = float(score_match.group(1))
+            return {
+                "score": max(0.0, min(1.0, score)),
+                "pass": score >= 0.6,
+                "reasoning": "Parsed from partial JSON",
+            }
+        logger.warning(f"Could not parse jury response: {response[:200]}")
+        return {"score": 0.0, "pass": False, "reasoning": "Failed to parse jury response"}
+@dataclass
+class VerificationResult:
+    """Complete verification result for one task execution."""
+    task_id: str
+    agent_model: str
+    # Algorithmic layer
+    algorithmic_pass: bool
+    constraints_passed: list[str]
+    constraints_failed: list[str]
+    # Jury layer
+    jury_pass: Optional[bool] = None
+    jury_score: Optional[float] = None
+    jury_reasoning: Optional[str] = None
+    jury_model: Optional[str] = None
+    # Combined
+    overall_pass: bool = False
+    # Raw data
+    raw_output: str = ""
+    latency_ms: float = 0.0
+    def to_dict(self) -> dict:
+        return {
+            "task_id": self.task_id,
+            "agent_model": self.agent_model,
+            "algorithmic_pass": self.algorithmic_pass,
+            "constraints_passed": self.constraints_passed,
+            "constraints_failed": self.constraints_failed,
+            "jury_pass": self.jury_pass,
+            "jury_score": self.jury_score,
+            "jury_reasoning": self.jury_reasoning,
+            "jury_model": self.jury_model,
+            "overall_pass": self.overall_pass,
+            "output_length": len(self.raw_output),
+            "latency_ms": self.latency_ms,
+        }
+class TaskVerifier:
+    """
+    Two-layer verification engine.
+    For T1 tasks: algorithmic checks only (fast, cheap)
+    For T2+ tasks: algorithmic checks + jury LLM evaluation
+    """
+    def __init__(self, jury_agents: Optional[list[LLMAgent]] = None):
+        self.jury_agents = jury_agents or []
+        self._verification_log: list[VerificationResult] = []
+    def verify(
+        self,
+        task: Task,
+        output: str,
+        agent_model: str,
+        latency_ms: float = 0.0,
+    ) -> VerificationResult:
+        """
+        Verify a task output against all constraints.
+        T1: Algorithmic only
+        T2+: Algorithmic + jury (if jury agents available)
+        """
+        # Layer 1: Algorithmic
+        algo_pass, passed, failed = verify_output(task, output)
+        result = VerificationResult(
+            task_id=task.task_id,
+            agent_model=agent_model,
+            algorithmic_pass=algo_pass,
+            constraints_passed=passed,
+            constraints_failed=failed,
+            raw_output=output,
+            latency_ms=latency_ms,
+        )
+        # Layer 2: Jury (for T2+ tasks with jury rubric)
+        if task.tier.value >= 2 and task.jury_rubric and self.jury_agents:
+            jury_result = self._jury_evaluate(task, output)
+            result.jury_pass = jury_result["pass"]
+            result.jury_score = jury_result["score"]
+            result.jury_reasoning = jury_result["reasoning"]
+            result.jury_model = jury_result.get("model", "unknown")
+        # Combined verdict
+        if task.tier.value >= 2 and result.jury_pass is not None:
+            # Both layers must pass for T2+
+            result.overall_pass = algo_pass and result.jury_pass
+        else:
+            # Algorithmic only for T1
+            result.overall_pass = algo_pass
+        self._verification_log.append(result)
+        return result
+    def _jury_evaluate(self, task: Task, output: str) -> dict:
+        """Run jury evaluation using available jury models."""
+        jury_prompt = _build_jury_prompt(task, output)
+        scores = []
+        for jury in self.jury_agents:
+            try:
+                response = jury.execute_task(
+                    prompt=jury_prompt,
+                    system_prompt=JURY_SYSTEM_PROMPT,
+                )
+                parsed = _parse_jury_response(response)
+                parsed["model"] = jury.model_name
+                scores.append(parsed)
+            except Exception as e:
+                logger.warning(f"Jury {jury.model_name} failed: {e}")
+                continue
+        if not scores:
+            return {"score": 0.0, "pass": False, "reasoning": "All jury models failed"}
+        # Average across jury models (like EECT/DDFT jury pattern)
+        avg_score = sum(s["score"] for s in scores) / len(scores)
+        avg_pass = avg_score >= 0.6
+        reasoning_parts = [
+            f"{s['model']}: {s['score']:.2f} - {s['reasoning']}"
+            for s in scores
+        ]
+        return {
+            "score": avg_score,
+            "pass": avg_pass,
+            "reasoning": " | ".join(reasoning_parts),
+            "model": "+".join(s["model"] for s in scores),
+        }
+    @property
+    def verification_log(self) -> list[VerificationResult]:
+        return list(self._verification_log)
+    def summary(self) -> dict:
+        """Summarize verification results."""
+        if not self._verification_log:
+            return {"total": 0}
+        total = len(self._verification_log)
+        algo_pass = sum(1 for v in self._verification_log if v.algorithmic_pass)
+        jury_pass = sum(1 for v in self._verification_log if v.jury_pass)
+        overall_pass = sum(1 for v in self._verification_log if v.overall_pass)
+        jury_scores = [v.jury_score for v in self._verification_log if v.jury_score is not None]
+        return {
+            "total": total,
+            "algorithmic_pass_rate": algo_pass / total,
+            "jury_pass_rate": jury_pass / len(jury_scores) if jury_scores else None,  # rate over jury-evaluated results
+            "overall_pass_rate": overall_pass / total,
+            "avg_jury_score": sum(jury_scores) / len(jury_scores) if jury_scores else None,
+        }
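
`_parse_jury_response` depends on `cgae_engine.utils.extract_json`, which is not part of this commit. The sketch below shows one way such a helper might tolerate markdown wrapping. `extract_json_sketch` is a hypothetical name, and the real helper may behave differently. The fence marker is built from single backticks only to keep the example readable.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker

# Matches a fenced block, with an optional "json" language tag after the opening fence.
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_json_sketch(response: str) -> str:
    """Hypothetical stand-in for cgae_engine.utils.extract_json.

    Returns the text most likely to be a JSON object: the body of the first
    fenced block if there is one, else the span from the first "{" to the
    last "}", else the stripped response.
    """
    fenced = _FENCED_BLOCK.search(response)
    if fenced:
        return fenced.group(1).strip()
    start, end = response.find("{"), response.rfind("}")
    if start != -1 and end > start:
        return response[start:end + 1]
    return response.strip()


# A jury reply wrapped in a markdown fence, as chat models often produce.
wrapped = FENCE + 'json\n{"score": 0.75, "pass": true, "reasoning": "ok"}\n' + FENCE
verdict = json.loads(extract_json_sketch(wrapped))
```

The result is still passed through `json.loads` by the caller, so a helper like this only needs to narrow the text down; malformed JSON continues to fall through to the regex fallback in `_parse_jury_response`.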

server/__init__.py ADDED

File without changes

server/runner.py ADDED

	@@ -0,0 +1,507 @@

+"""
+Simulation Runner - Main experiment loop for the CGAE economy testbed.
+Runs the full economic loop for a configurable number of time steps:
+1. Generate contracts (marketplace)
+2. Agents make decisions (bid, invest, idle)
+3. Assign contracts to bidding agents
+4. Execute tasks and verify outputs
+5. Settle contracts (reward/penalty)
+6. Apply temporal decay and spot-audits
+7. Record metrics for analysis
+This produces the empirical data for the CGAE paper:
+- Does Theorem 2 hold? (Do adaptive agents outperform aggressive ones?)
+- Does Theorem 3 hold? (Does aggregate safety increase monotonically?)
+- What are the failure modes? (Which agents go insolvent and why?)
+"""
+from __future__ import annotations
+import hashlib
+import json
+import logging
+import random
+import time
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Optional
+from cgae_engine.gate import GateFunction, RobustnessVector, Tier, TierThresholds
+from cgae_engine.temporal import TemporalDecay, StochasticAuditor
+from cgae_engine.registry import AgentRegistry, AgentStatus
+from cgae_engine.contracts import ContractManager, ContractStatus
+from cgae_engine.economy import Economy, EconomyConfig, EconomySnapshot
+from cgae_engine.marketplace import TaskMarketplace
+from cgae_engine.audit import AuditOrchestrator
+from agents.base import BaseAgent, AgentDecision
+from agents.strategies import create_agent_cohort
+logger = logging.getLogger(__name__)
+@dataclass
+class SimulationConfig:
+    """Configuration for a simulation run."""
+    # Duration
+    num_steps: int = 500
+    # Agent cohort
+    agent_strategies: list[str] = field(default_factory=lambda: [
+        "conservative", "aggressive", "balanced", "adaptive", "cheater",
+    ])
+    # Economy parameters
+    initial_balance: float = 0.5       # ETH seed capital per agent
+    decay_rate: float = 0.005          # Temporal decay lambda (slower decay)
+    audit_cost: float = 0.002          # Cost per audit dimension
+    storage_cost_per_step: float = 0.0003  # storage cost
+    test_eth_top_up_threshold: Optional[float] = None
+    test_eth_top_up_amount: float = 0.0
+    # Market parameters
+    contracts_per_step: int = 12
+    # Output
+    output_dir: str = "server/results"
+    snapshot_interval: int = 10        # Take detailed snapshot every N steps
+    # Random seed
+    seed: Optional[int] = 42
+@dataclass
+class SimulationMetrics:
+    """Metrics collected during simulation for analysis."""
+    # Per-step time series
+    timestamps: list[float] = field(default_factory=list)
+    aggregate_safety: list[float] = field(default_factory=list)
+    total_balance: list[float] = field(default_factory=list)
+    active_agent_count: list[int] = field(default_factory=list)
+    contracts_completed: list[int] = field(default_factory=list)
+    contracts_failed: list[int] = field(default_factory=list)
+    rewards_paid: list[float] = field(default_factory=list)
+    penalties_collected: list[float] = field(default_factory=list)
+    # Per-agent time series
+    agent_balances: dict[str, list[float]] = field(default_factory=dict)
+    agent_tiers: dict[str, list[int]] = field(default_factory=dict)
+    agent_earnings: dict[str, list[float]] = field(default_factory=dict)
+    # Per-strategy aggregates
+    strategy_survival: dict[str, int] = field(default_factory=dict)
+    strategy_total_earned: dict[str, float] = field(default_factory=dict)
+    strategy_final_tier: dict[str, int] = field(default_factory=dict)
+    # Task execution history
+    task_results: list[dict] = field(default_factory=list)
+    # High-signal protocol events for the dashboard (Bankruptcies, Demotions, Upgrades)
+    protocol_events: list[dict] = field(default_factory=list)
+class SimulationRunner:
+    """
+    Runs the CGAE economy simulation.
+    This is the main entry point for the hackathon experiment.
+    It creates an economy, registers agents, runs the economic loop,
+    and produces data for the dashboard and post-mortem analysis.
+    """
+    def __init__(self, config: Optional[SimulationConfig] = None):
+        self.config = config or SimulationConfig()
+        if self.config.seed is not None:
+            random.seed(self.config.seed)
+        # Initialize economy
+        econ_config = EconomyConfig(
+            decay_rate=self.config.decay_rate,
+            initial_balance=self.config.initial_balance,
+            audit_cost=self.config.audit_cost,
+            storage_cost_per_step=self.config.storage_cost_per_step,
+            test_eth_top_up_threshold=self.config.test_eth_top_up_threshold,
+            test_eth_top_up_amount=self.config.test_eth_top_up_amount,
+        )
+        self.economy = Economy(config=econ_config)
+        self.marketplace = TaskMarketplace(
+            self.economy.contracts,
+            contracts_per_step=self.config.contracts_per_step,
+        )
+        self.audit = AuditOrchestrator()
+        # Create agent cohort
+        self.agents: dict[str, BaseAgent] = {}
+        self.metrics = SimulationMetrics()
+    def setup(self):
+        """Register agents and run initial audits."""
+        cohort = create_agent_cohort(self.config.agent_strategies)
+        for agent in cohort:
+            # Register
+            record = self.economy.register_agent(
+                model_name=agent.name,
+                model_config=agent.to_config(),
+            )
+            agent.agent_id = record.agent_id
+            self.agents[record.agent_id] = agent
+            # Initial audit with true robustness (+ small noise)
+            audit_result = self.audit.synthetic_audit(
+                record.agent_id,
+                base_robustness=agent.true_robustness,
+                noise_scale=0.03,
+            )
+            self.economy.audit_agent(
+                record.agent_id,
+                audit_result.robustness,
+                audit_type="registration",
+            )
+            # Init metric tracking
+            self.metrics.agent_balances[agent.name] = []
+            self.metrics.agent_tiers[agent.name] = []
+            self.metrics.agent_earnings[agent.name] = []
+        logger.info(
+            f"Simulation setup complete: {len(self.agents)} agents registered"
+        )
+    def run(self) -> SimulationMetrics:
+        """Run the full simulation."""
+        self.setup()
+        step = 0
+        infinite = self.config.num_steps == -1
+        try:
+            while infinite or step < self.config.num_steps:
+                self._run_step(step)
+                if step % self.config.snapshot_interval == 0:
+                    logger.info(
+                        f"Step {step}/{'inf' if infinite else self.config.num_steps} | "
+                        f"Safety={self.metrics.aggregate_safety[-1]:.3f} | "
+                        f"Active={self.metrics.active_agent_count[-1]} | "
+                        f"Balance={self.metrics.total_balance[-1]:.4f}"
+                    )
+                    # Periodic save for dashboard
+                    self._finalize()
+                    self.save_results()
+                if infinite:
+                    time.sleep(0.5)  # Slow down for live observation
+                step += 1
+        except KeyboardInterrupt:
+            logger.info("\nSimulation interrupted by user. Finalizing...")
+        except Exception as e:
+            logger.exception(f"Simulation failed: {e}")
+        self._finalize()
+        self.save_results()
+        return self.metrics
+    def _run_step(self, step: int):
+        """Execute one time step of the economy."""
+        # 1. Generate new contracts
+        new_contracts = self.marketplace.generate_contracts(
+            current_time=self.economy.current_time,
+        )
+        # 2. Each agent makes a decision
+        decisions: dict[str, AgentDecision] = {}
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record is None or record.status != AgentStatus.ACTIVE:
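+                # Log a bankruptcy event once per agent, not again on every later step
+                if record and record.balance <= 0 and not any(
+                    e["type"] == "BANKRUPTCY" and e["agent"] == agent.name
+                    for e in self.metrics.protocol_events
+                ):
+                    self.metrics.protocol_events.append({
+                        "timestamp": self.economy.current_time,
+                        "type": "BANKRUPTCY",
+                        "agent": agent.name,
+                        "message": f"Agent {agent.name} has gone bankrupt and is suspended."
+                    })
+                continue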
+            available = self.economy.contracts.get_contracts_for_tier(record.current_tier)
+            exposure = self.economy.contracts.agent_exposure(agent_id)
+            ceiling = self.economy.gate.budget_ceiling(record.current_tier)
+            decision = agent.decide(
+                available_contracts=available,
+                current_tier=record.current_tier,
+                balance=record.balance,
+                current_exposure=exposure,
+                budget_ceiling=ceiling,
+            )
+            decisions[agent_id] = decision
+            agent.record_decision(decision)
+        # 3. Process decisions
+        for agent_id, decision in decisions.items():
+            if decision.action == "bid" and decision.contract_id:
+                success = self.economy.accept_contract(
+                    decision.contract_id, agent_id
+                )
+                if success:
+                    # Execute task immediately (simplified)
+                    agent = self.agents[agent_id]
+                    contract = self.economy.contracts.contracts.get(decision.contract_id)
+                    if contract:
+                        output = agent.execute_task(contract)
+                        settlement = self.economy.complete_contract(decision.contract_id, output)
+                        # Record result for transparency
+                        # Mock CID for demonstration
+                        cid = f"0x{hashlib.sha256(str(contract.contract_id).encode()).hexdigest()[:32]}"
+                        self.metrics.task_results.append({
+                            "agent": agent.name,
+                            "task_id": contract.contract_id,
+                            "tier": f"T{contract.min_tier.value}",
+                            "domain": contract.domain,
+                            "proof_cid": cid,
+                            "verification": {
+                                "overall_pass": settlement["outcome"] == "success",
+                                "constraints_passed": [], # Simplified for synthetic
+                                "constraints_failed": settlement.get("failures", [])
+                            },
+                            "settlement": {
+                                "reward": settlement.get("reward", 0),
+                                "penalty": settlement.get("penalty", 0)
+                            },
+                            "output_preview": f"Synthetic execution of {contract.contract_id}: {settlement['outcome'].upper()}"
+                        })
+            elif decision.action == "invest_robustness":
+                agent = self.agents[agent_id]
+                dim = decision.investment_dimension
+                amount = decision.investment_amount
+                if dim:
+                    cost = agent.robustness_investment_cost(dim, amount)
+                    record = self.economy.registry.get_agent(agent_id)
+                    if record and record.balance >= cost:
+                        record.balance -= cost
+                        record.total_spent += cost
+                        new_r = agent.invest_robustness(dim, amount)
+                        # Re-audit with improved robustness
+                        audit_result = self.audit.synthetic_audit(
+                            agent_id,
+                            base_robustness=new_r,
+                            noise_scale=0.02,
+                        )
+                        old_tier = record.current_tier
+                        self.economy.audit_agent(
+                            agent_id,
+                            audit_result.robustness,
+                            audit_type="upgrade",
+                        )
+                        new_tier = record.current_tier
+                        if new_tier.value > old_tier.value:
+                            self.metrics.protocol_events.append({
+                                "timestamp": self.economy.current_time,
+                                "type": "UPGRADE",
+                                "agent": agent.name,
+                                "message": f"Agent {agent.name} UPGRADED to {new_tier.name} via robustness investment!"
+                            })
+        # 4. Advance time (decay, spot-audits, storage costs)
+        def audit_callback(aid):
+            agent = self.agents.get(aid)
+            if agent:
+                result = self.audit.synthetic_audit(
+                    aid, base_robustness=agent.true_robustness, noise_scale=0.04
+                )
+                return result.robustness
+            return None
+        self.economy.step(audit_callback=audit_callback)
+        # 5. Record metrics
+        self._record_metrics()
+    def _record_metrics(self):
+        """Record economy-wide and per-agent metrics."""
+        self.metrics.timestamps.append(self.economy.current_time)
+        self.metrics.aggregate_safety.append(self.economy.aggregate_safety())
+        active = self.economy.registry.active_agents
+        self.metrics.active_agent_count.append(len(active))
+        self.metrics.total_balance.append(sum(a.balance for a in active))
+        econ = self.economy.contracts.economics_summary()
+        self.metrics.contracts_completed.append(
+            econ["status_distribution"].get("completed", 0)
+        )
+        self.metrics.contracts_failed.append(
+            econ["status_distribution"].get("failed", 0)
+        )
+        self.metrics.rewards_paid.append(econ["total_rewards_paid"])
+        self.metrics.penalties_collected.append(econ["total_penalties_collected"])
+        # Per-agent
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record:
+                self.metrics.agent_balances[agent.name].append(record.balance)
+                self.metrics.agent_tiers[agent.name].append(record.current_tier.value)
+                self.metrics.agent_earnings[agent.name].append(record.total_earned)
+    def _finalize(self):
+        """Compute aggregate metrics (idempotent)."""
+        # Reset strategy-level aggregates before re-computing
+        self.metrics.strategy_survival = {}
+        self.metrics.strategy_total_earned = {}
+        self.metrics.strategy_final_tier = {}
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record:
+                survived = record.status == AgentStatus.ACTIVE
+                self.metrics.strategy_survival[agent.strategy.value] = (
+                    self.metrics.strategy_survival.get(agent.strategy.value, 0)
+                    + (1 if survived else 0)
+                )
+                self.metrics.strategy_total_earned[agent.strategy.value] = (
+                    self.metrics.strategy_total_earned.get(agent.strategy.value, 0.0)
+                    + record.total_earned
+                )
+                self.metrics.strategy_final_tier[agent.strategy.value] = max(
+                    self.metrics.strategy_final_tier.get(agent.strategy.value, 0),
+                    record.current_tier.value,
+                )
+    def save_results(self, path: Optional[str] = None):
+        """Save simulation results to JSON."""
+        output_dir = Path(path or self.config.output_dir)
+        output_dir.mkdir(parents=True, exist_ok=True)
+        # Economy state
+        self.economy.export_state(str(output_dir / "economy_state.json"))
+        # Time series metrics
+        ts_data = {
+            "timestamps": self.metrics.timestamps,
+            "aggregate_safety": self.metrics.aggregate_safety,
+            "total_balance": self.metrics.total_balance,
+            "active_agent_count": self.metrics.active_agent_count,
+            "contracts_completed": self.metrics.contracts_completed,
+            "contracts_failed": self.metrics.contracts_failed,
+            "rewards_paid": self.metrics.rewards_paid,
+            "penalties_collected": self.metrics.penalties_collected,
+        }
+        (output_dir / "time_series.json").write_text(json.dumps(ts_data, indent=2))
+        # Per-agent metrics
+        agent_data = {
+            "balances": self.metrics.agent_balances,
+            "tiers": self.metrics.agent_tiers,
+            "earnings": self.metrics.agent_earnings,
+        }
+        (output_dir / "agent_metrics.json").write_text(json.dumps(agent_data, indent=2))
+        # Strategy summary
+        summary = {
+            "survival": self.metrics.strategy_survival,
+            "total_earned": self.metrics.strategy_total_earned,
+            "final_tier": self.metrics.strategy_final_tier,
+        }
+        (output_dir / "strategy_summary.json").write_text(json.dumps(summary, indent=2))
+        # Task execution history for dashboard
+        (output_dir / "task_results.json").write_text(
+            json.dumps(self.metrics.task_results, indent=2)
+        )
+        # Protocol events for high-signal dashboard alerts
+        (output_dir / "protocol_events.json").write_text(
+            json.dumps(self.metrics.protocol_events, indent=2)
+        )
+        # Agent details
+        agent_details = {}
+        for agent_id, agent in self.agents.items():
+            record = self.economy.registry.get_agent(agent_id)
+            if record:
+                agent_details[agent.name] = {
+                    **record.to_dict(),
+                    "strategy": agent.strategy.value,
+                    "true_robustness": {
+                        "cc": agent.true_robustness.cc,
+                        "er": agent.true_robustness.er,
+                        "as": agent.true_robustness.as_,
+                        "ih": agent.true_robustness.ih,
+                    },
+                    "decisions_count": len(agent.decisions),
+                }
+        (output_dir / "agent_details.json").write_text(
+            json.dumps(agent_details, indent=2, default=str)
+        )
+        logger.info(f"Results saved to {output_dir}")
+import argparse
+def main():
+    """Entry point for running the simulation."""
+    parser = argparse.ArgumentParser(description="Run the CGAE economy simulation.")
+    parser.add_argument("--live", action="store_true", help="Run in infinite loop mode for dashboard.")
+    parser.add_argument("--steps", type=int, default=500, help="Number of steps (ignored if --live is set).")
+    args = parser.parse_args()
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s [%(levelname)s] %(message)s",
+    )
+    config = SimulationConfig(
+        num_steps=-1 if args.live else args.steps,
+        seed=42,
+    )
+    runner = SimulationRunner(config)
+    metrics = runner.run()  # run() already finalizes and saves results before returning
+    # Print summary
+    print("\n" + "=" * 60)
+    print("CGAE ECONOMY SIMULATION - RESULTS")
+    print("=" * 60)
+    # Report the steps actually recorded; config.num_steps is -1 in --live mode
+    print(f"\nDuration: {len(metrics.timestamps)} time steps")
+    if not metrics.aggregate_safety:
+        print("\nERROR: Simulation ended before recording metrics.")
+        return
+    print(f"Final aggregate safety: {metrics.aggregate_safety[-1]:.4f}")
+    print(f"Active agents at end: {metrics.active_agent_count[-1]}")
+    print(f"Total contracts completed: {metrics.contracts_completed[-1]}")
+    print(f"Total contracts failed: {metrics.contracts_failed[-1]}")
+    print(f"Total rewards paid: {metrics.rewards_paid[-1]:.4f} ETH")
+    print(f"Total penalties: {metrics.penalties_collected[-1]:.4f} ETH")
+    print("\n--- Strategy Results ---")
+    for strategy in config.agent_strategies:
+        survived = metrics.strategy_survival.get(strategy, 0)
+        earned = metrics.strategy_total_earned.get(strategy, 0.0)
+        tier = metrics.strategy_final_tier.get(strategy, 0)
+        print(f"  {strategy:15s} | survived={survived} | earned={earned:.4f} ETH | final_tier=T{tier}")
+    # Theorem 2 check: did adaptive outperform aggressive?
+    adaptive_earned = metrics.strategy_total_earned.get("adaptive", 0)
+    aggressive_earned = metrics.strategy_total_earned.get("aggressive", 0)
+    print("\n--- Theorem 2 Check ---")
+    print(f"  Adaptive earned:   {adaptive_earned:.4f} ETH")
+    print(f"  Aggressive earned: {aggressive_earned:.4f} ETH")
+    print(f"  Incentive-compatible: {'YES' if adaptive_earned > aggressive_earned else 'NO'}")
+    # Theorem 3 check: monotonic safety
+    safety = metrics.aggregate_safety
+    monotonic = all(safety[i] <= safety[i+1] + 0.01 for i in range(len(safety)-1))  # Allow small noise
+    print("\n--- Theorem 3 Check ---")
+    print(f"  Safety start: {safety[0]:.4f}")
+    print(f"  Safety end:   {safety[-1]:.4f}")
+    print(f"  Monotonic (within noise): {'YES' if monotonic else 'NO'}")
+    print("\n" + "=" * 60)
+if __name__ == "__main__":
+    main()
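
Review note on the Theorem 3 check in `main()`: the test compares each pair of adjacent safety values with a 0.01 tolerance, so a slow but steady decline can still report "Monotonic (within noise): YES". The sketch below reproduces that rule and contrasts it with a stricter running-maximum variant. The helper names `pairwise_monotonic` and `running_max_monotonic` are hypothetical and not part of this commit; the stricter variant is a suggestion, not the author's method.

```python
from typing import Sequence


def pairwise_monotonic(series: Sequence[float], tol: float = 0.01) -> bool:
    """Same rule as the Theorem 3 check: each step may drop by at most tol."""
    return all(series[i] <= series[i + 1] + tol for i in range(len(series) - 1))


def running_max_monotonic(series: Sequence[float], tol: float = 0.01) -> bool:
    """Stricter variant (an assumption, not in the commit): no value may fall
    more than tol below the highest value seen so far."""
    peak = float("-inf")
    for value in series:
        if value < peak - tol:
            return False
        peak = max(peak, value)
    return True


# A decline of 0.005 per step: every adjacent drop is inside the tolerance,
# yet the series loses roughly 0.095 over 20 steps.
drifting = [0.90 - 0.005 * i for i in range(20)]

print(pairwise_monotonic(drifting))     # True
print(running_max_monotonic(drifting))  # False
```

If cumulative drift matters for the safety claim, the running-maximum form (or a simple comparison of `safety[-1]` against `safety[0]`) would flag it; whether that is desirable depends on how much decay from `economy.step()` is considered acceptable noise.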

tests/test_core.py CHANGED Viewed

@@ -1,6 +1,7 @@
 """Tests for registry, contracts, and economy."""
 import pytest
+from pathlib import Path
 from cgae_engine.gate import RobustnessVector, Tier, GateFunction
 from cgae_engine.registry import AgentRegistry, AgentStatus
 from cgae_engine.contracts import ContractManager, ContractStatus, Constraint
@@ -134,6 +135,61 @@ class TestEconomy:
         safety = self.econ.aggregate_safety()
         assert 0.0 <= safety <= 1.0
+    def test_step_produces_snapshot(self):
+        record = self.econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        self.econ.audit_agent(record.agent_id, r)
+        self.econ.step()
+        assert len(self.econ.snapshots) == 1
+        snap = self.econ.snapshots[0]
+        assert snap.num_agents >= 1
+        assert snap.aggregate_safety > 0
+    def test_step_advances_time(self):
+        self.econ.step()
+        assert self.econ.current_time == 1.0
+        self.econ.step()
+        assert self.econ.current_time == 2.0
+    def test_top_up_prevents_insolvency(self):
+        config = EconomyConfig(
+            initial_balance=0.002,  # very low
+            test_eth_top_up_threshold=0.01,
+            test_eth_top_up_amount=0.5,
+        )
+        econ = Economy(config=config)
+        record = econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        econ.audit_agent(record.agent_id, r)
+        # After audit cost, balance is very low — step should top up
+        econ.step()
+        assert record.balance > 0
+        assert record.status == AgentStatus.ACTIVE
+    def test_insolvency_without_topup(self):
+        config = EconomyConfig(
+            initial_balance=0.002,
+            test_eth_top_up_threshold=None,  # disabled
+            test_eth_top_up_amount=0.0,
+        )
+        econ = Economy(config=config)
+        record = econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        econ.audit_agent(record.agent_id, r)
+        econ.step()
+        assert record.status == AgentStatus.SUSPENDED
+    def test_export_state(self, tmp_path):
+        record = self.econ.register_agent("test", {"model": "test"})
+        r = RobustnessVector(cc=0.7, er=0.7, as_=0.6, ih=0.8)
+        self.econ.audit_agent(record.agent_id, r)
+        path = str(tmp_path / "state.json")
+        self.econ.export_state(path)
+        import json
+        data = json.loads(Path(path).read_text())
+        assert "agents" in data
+        assert "aggregate_safety" in data
 class TestTemporalDecay:
     def test_no_decay_at_zero(self):