# observability-and-dashboard

## overview

Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.

## dashboard-sections

### 1-live-thought-stream

- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events

### 2-navigation-map

Graph of visited pages:

- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting

### 3-mcp-usage-panel

- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains

### 4-memory-viewer

- inspect short/working/long/shared memory
- filter by task/domain/confidence
- edit/delete entries
- prune previews

### 5-reward-analytics

- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison

### 6-cost-and-token-monitor

- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate

## core-metrics

### agent-metrics

- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio

### tool-metrics

- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures

### memory-metrics

- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio

### search-metrics

- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio

## logging-model

Structured logs (JSON):

```json
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
```

## tracing

Per-episode trace includes:

- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results

## alerts

Configurable alerts:

- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low-reward streak

## apis

- `GET /api/metrics/summary`
- `GET /api/metrics/timeseries`
- `GET /api/traces/{episode_id}`
- `GET /api/costs`
- `GET /api/memory/stats`
- `GET /api/tools/stats`

## recommended-dashboard-layout

1. Top row: completion, cost, latency, error rate
2. Mid row: thought stream + navigation graph
3. Lower row: reward breakdown + MCP usage + memory viewer
4. Bottom row: raw trace and export controls

## export-and-audit

Exports:

- JSON trace
- CSV metrics
- reward analysis report
- model usage report

All exports include episode and configuration fingerprints for reproducibility.

## related-api-reference

| item | value |
| --- | --- |
| api-reference | `api-reference.md` |

## document-metadata

| key | value |
| --- | --- |
| document | `observability.md` |
| status | active |

## document-flow

```mermaid
flowchart TD
  A[document] --> B[key-sections]
  B --> C[implementation]
  B --> D[operations]
  B --> E[validation]
```
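The structured-log schema shown under logging-model can be sketched as a small emitter that writes one JSON line per event (JSONL-friendly for tracing and export). This is an illustrative sketch only: `ToolCallEvent` and `log_tool_call` are hypothetical names, not part of any API described in this document; only the field names and types follow the JSON example above.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical event record matching the "logging-model" JSON example.
@dataclass
class ToolCallEvent:
    timestamp: str
    episode_id: str
    step: int
    event: str
    tool: str
    latency_ms: int
    success: bool
    reward_delta: float

def log_tool_call(episode_id: str, step: int, tool: str,
                  latency_ms: int, success: bool,
                  reward_delta: float) -> str:
    """Serialize one tool-call event as a single JSON line."""
    event = ToolCallEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        episode_id=episode_id,
        step=step,
        event="tool_call",
        tool=tool,
        latency_ms=latency_ms,
        success=success,
        reward_delta=reward_delta,
    )
    return json.dumps(asdict(event))

# Example usage mirroring the sample log entry above:
line = log_tool_call("ep_123", 7, "beautifulsoup.find_all", 54, True, 0.08)
print(line)
```

Keeping each event on one line makes the log trivially appendable and lets downstream consumers (trace viewer, alerting, CSV export) parse records independently.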