observability-and-dashboard
overview
Observability provides deep insight into runtime behavior, model usage, tool execution, memory quality, and rewards.
dashboard-sections
1-live-thought-stream
- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events
2-navigation-map
Graph of visited pages:
- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting
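The navigation map can be derived from the ordered list of pages an agent visits. This is a minimal sketch. It assumes each visit arrives as a `(url, relevance)` pair, and `build_navigation_graph` is a hypothetical helper, not part of a documented API. Consecutive visits become edges, and any URL seen more than once is flagged for revisit highlighting.

```python
from collections import Counter


def build_navigation_graph(visits):
    """Build a navigation graph from an ordered list of (url, relevance) visits.

    Nodes are URLs (value = latest relevance, used for node color), edges are
    transitions between consecutive visits, and revisited URLs are returned
    separately so the UI can highlight them.
    """
    nodes = {}               # url -> most recent relevance/confidence score
    edges = Counter()        # (from_url, to_url) -> number of transitions
    visit_counts = Counter() # url -> number of visits
    prev = None
    for url, relevance in visits:
        nodes[url] = relevance
        visit_counts[url] += 1
        if prev is not None:
            edges[(prev, url)] += 1
        prev = url
    revisited = {url for url, n in visit_counts.items() if n > 1}
    return nodes, dict(edges), revisited


nodes, edges, revisited = build_navigation_graph(
    [("https://a.example", 0.9), ("https://b.example", 0.4), ("https://a.example", 0.8)]
)
```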
3-mcp-usage-panel
- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains
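The first three panel statistics are simple aggregations over tool-call records. The sketch below assumes each record is a dict with the hypothetical keys `server`, `tool`, `latency_ms`, `success`, and an optional `retries`. Only `tool`, `latency_ms`, and `success` appear in this document's log format. Tool-chain ranking is left out.

```python
from collections import defaultdict


def summarize_tool_calls(calls):
    """Aggregate MCP tool-call records into usage-panel statistics.

    Record keys are assumptions for this sketch: "server", "tool",
    "latency_ms", "success", and optionally "retries".
    """
    calls_by_server = defaultdict(int)
    latencies = defaultdict(list)
    errors = 0
    retries = 0
    for call in calls:
        calls_by_server[call["server"]] += 1
        latencies[call["tool"]].append(call["latency_ms"])
        if not call["success"]:
            errors += 1
        retries += call.get("retries", 0)
    return {
        "calls_by_server": dict(calls_by_server),
        # Mean latency per tool in milliseconds.
        "avg_latency_ms": {tool: sum(v) / len(v) for tool, v in latencies.items()},
        "error_rate": errors / len(calls) if calls else 0.0,
        "total_retries": retries,
    }
```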
4-memory-viewer
- inspect short-term, working, long-term, and shared memory
- filter by task/domain/confidence
- edit/delete entries
- preview which entries a prune would remove
5-reward-analytics
- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison
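Per-step breakdowns, component trends, and the penalty heatmap can all come from one pass over step-level reward components. This is a sketch under assumptions. Each step is a dict mapping a hypothetical component name to its reward contribution, and negative contributions are treated as penalties. The document does not define the reward components themselves.

```python
def reward_breakdown(steps):
    """Summarize per-step reward components for one episode.

    Returns per-step totals, per-component episode totals, and
    (step_index, component, value) cells for a penalty heatmap.
    """
    per_step_totals = []
    component_totals = {}
    penalty_cells = []  # only negative contributions are recorded here
    for i, components in enumerate(steps):
        per_step_totals.append(sum(components.values()))
        for name, value in components.items():
            component_totals[name] = component_totals.get(name, 0.0) + value
            if value < 0:
                penalty_cells.append((i, name, value))
    return per_step_totals, component_totals, penalty_cells
```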
6-cost-and-token-monitor
- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate
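One simple way to produce the budget and burn-rate figures is a linear extrapolation from spend so far. The sketch assumes cost samples arrive as `(elapsed_seconds_since_start, cost_usd)` pairs, one per model call, and that the budget is positive. The forecast method here is an illustrative choice, not one the monitor is documented to use.

```python
def cost_status(samples, budget_usd):
    """Compare cumulative cost with a budget and forecast a linear burn rate.

    `samples` is a time-ordered list of (elapsed_seconds_since_start, cost_usd)
    pairs. The forecast assumes spending continues at the average rate so far.
    """
    spent = sum(cost for _, cost in samples)
    elapsed = samples[-1][0] if samples else 0.0
    burn_rate = spent / elapsed if elapsed > 0 else 0.0  # USD per second
    remaining = budget_usd - spent
    seconds_to_budget = remaining / burn_rate if burn_rate > 0 else float("inf")
    return {
        "spent_usd": spent,
        "budget_fraction": spent / budget_usd,
        "burn_rate_usd_per_s": burn_rate,
        # Clamp at zero once the budget is already exhausted.
        "seconds_to_budget": max(seconds_to_budget, 0.0),
    }
```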
core-metrics
agent-metrics
- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio
tool-metrics
- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures
memory-metrics
- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio
search-metrics
- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio
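Most of these metrics are plain ratios over counted events. The sketch below covers only metrics whose definitions follow directly from their names: task completion rate, average steps to completion, and duplicate result ratio. It also includes a safe `ratio` helper for rates such as tool success or retrieval hit rate. The `EpisodeResult` shape and all function names are hypothetical. Scores such as recovery or generalization are not defined here and are left out.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of one episode (hypothetical record shape for this sketch)."""
    completed: bool
    steps: int


def ratio(numerator, denominator):
    """Safe ratio for rate metrics such as tool success rate or retrieval hit rate."""
    return numerator / denominator if denominator else 0.0


def completion_metrics(episodes):
    """Task completion rate and average steps among completed episodes."""
    done = [ep for ep in episodes if ep.completed]
    rate = ratio(len(done), len(episodes))
    avg_steps = sum(ep.steps for ep in done) / len(done) if done else None
    return rate, avg_steps


def duplicate_result_ratio(result_urls):
    """Share of search results that repeat an earlier result."""
    return ratio(len(result_urls) - len(set(result_urls)), len(result_urls))
```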
logging-model
Structured logs (JSON):

```json
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
```
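A record in this shape can be produced with the standard `json` and `datetime` modules. This is a sketch. `log_event` is a hypothetical helper, and writing one record per line (JSON Lines) is an assumption about how the logs are stored.

```python
import json
from datetime import datetime, timezone


def log_event(episode_id, step, event, **fields):
    """Serialize one structured log record as a single JSON line."""
    record = {
        # UTC timestamp in the same ISO 8601 "Z" form as the example above.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
        **fields,  # event-specific fields such as tool, latency_ms, success
    }
    return json.dumps(record)


line = log_event("ep_123", 7, "tool_call", tool="beautifulsoup.find_all",
                 latency_ms=54, success=True, reward_delta=0.08)
```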
tracing
Per-episode trace includes:
- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results
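A per-episode trace can be held in one container that serializes to the JSON trace export. The dataclass below is a sketch. Its field names mirror the list above but are illustrative, not a documented schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Optional


@dataclass
class EpisodeTrace:
    """Per-episode trace container (illustrative field names)."""
    episode_id: str
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    memory_ops: list = field(default_factory=list)
    final_submission: Optional[Any] = None
    grader_results: Optional[dict] = None

    def record_step(self, observation, action, reward):
        """Append one observation/action/reward triple so the lists stay aligned by step."""
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)

    def to_json(self):
        """Serialize the whole trace, including nested lists and dicts."""
        return json.dumps(asdict(self))
```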
alerts
Configurable alerts:
- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low reward streak
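Each alert above can be expressed as a threshold rule evaluated against a snapshot of runtime stats. This is a minimal sketch. The `AlertConfig` thresholds, the `stats` keys, and the alert names are all hypothetical, and a real deployment would set them through its own configuration.

```python
from dataclasses import dataclass


@dataclass
class AlertConfig:
    # Hypothetical default thresholds chosen for illustration only.
    budget_fraction: float = 0.8      # spent/budget ratio that triggers a budget alert
    error_rate: float = 0.2           # recent error rate above which an error spike fires
    outage_failures: int = 3          # consecutive failures that mark a tool as down
    max_memory_entries: int = 10_000  # memory size treated as bloat
    low_reward: float = 0.0           # rewards at or below this count as low
    low_reward_streak: int = 5        # consecutive low rewards before alerting


def check_alerts(cfg, stats):
    """Return the names of alerts triggered by one snapshot of runtime stats.

    `stats` uses hypothetical keys: spent_usd, budget_usd, recent_errors,
    recent_calls, consecutive_failures (tool -> int), memory_entries, rewards.
    """
    alerts = []
    if stats["budget_usd"] > 0 and stats["spent_usd"] / stats["budget_usd"] >= cfg.budget_fraction:
        alerts.append("budget_threshold_crossed")
    if stats["recent_calls"] and stats["recent_errors"] / stats["recent_calls"] > cfg.error_rate:
        alerts.append("error_spike")
    for tool, failures in stats["consecutive_failures"].items():
        if failures >= cfg.outage_failures:
            alerts.append(f"tool_outage:{tool}")
    if stats["memory_entries"] > cfg.max_memory_entries:
        alerts.append("memory_bloat")
    # Count the trailing run of low rewards (most recent step is last in the list).
    streak = 0
    for reward in reversed(stats["rewards"]):
        if reward > cfg.low_reward:
            break
        streak += 1
    if streak >= cfg.low_reward_streak:
        alerts.append("low_reward_streak")
    return alerts
```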
apis
- `GET /api/metrics/summary`
- `GET /api/metrics/timeseries`
- `GET /api/traces/{episode_id}`
- `GET /api/costs`
- `GET /api/memory/stats`
- `GET /api/tools/stats`
recommended-dashboard-layout
- Top row: completion, cost, latency, error rate
- Mid row: thought stream + navigation graph
- Lower row: reward breakdown + MCP usage + memory viewer
- Bottom row: raw trace and export controls
export-and-audit
Exports:
- JSON trace
- CSV metrics
- reward analysis report
- model usage report
All exports include episode and configuration fingerprints for reproducibility.
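The document does not specify how fingerprints are computed. One common approach is a SHA-256 hash over a canonical JSON serialization, so that logically equal configs hash identically regardless of key order. The sketch below assumes that approach. `fingerprint` and `export_header` are hypothetical helpers.

```python
import hashlib
import json


def fingerprint(obj):
    """Deterministic SHA-256 hex digest of a JSON-serializable object.

    Sorted keys and fixed separators make the serialization canonical,
    so key order and whitespace do not change the hash.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def export_header(trace, config):
    """Header attached to every export: episode and configuration fingerprints."""
    return {
        "episode_id": trace["episode_id"],
        "episode_fingerprint": fingerprint(trace),
        "config_fingerprint": fingerprint(config),
    }
```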
related-api-reference
| item | value |
|---|---|
| api-reference | api-reference.md |
document-metadata
| key | value |
|---|---|
| document | observability.md |
| status | active |
document-flow
```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```