PhD Research OS: Architecture Map
WAKE-UP INSTRUCTION: This file is the ground truth for all file locations, API configurations, and system topology. Read this FIRST before touching anything.
System Topology
phd-research-os-brain/
├── ARCHITECTURE.md          ← YOU ARE HERE (project map)
├── AGENTS.md                ← Agent role registry & contracts
├── MEMORY.md                ← Persistent cross-session state
├── plan.md                  ← Current task plan (mutable)
├── HARNESS_EVOLUTION.md     ← ECC rule amendments log
│
├── train.py                 ← SFT training script (Qwen2.5-3B + QLoRA)
├── generate_dataset.py      ← Synthetic dataset generator (1900 examples)
│
├── phd_research_os/         ← CORE PACKAGE
│   ├── __init__.py          ← v1.0.0
│   ├── db.py                ← SQLite data layer (Phase 0)
│   │                          Tables: claims, sources, goals, conflicts,
│   │                          decisions, overrides, experiments,
│   │                          api_usage_log, calibration_log, embedding_cache
│   │                          + companion_agents, agent_tasks, agent_audit_log
│   ├── agents.py            ← Original 6-role AI brain (ResearchOSBrain)
│   ├── agent_os.py          ← ECC HARNESS ORCHESTRATOR (companion AI factory)
│   │                          CompanionAgent lifecycle: spawn → preflight →
│   │                          plan → execute → postflight → retire
│   ├── pipeline.py          ← Paper ingestion (PDF → claims)
│   ├── obsidian_export.py   ← Obsidian vault export (one-directional)
│   ├── evaluation.py        ← Golden dataset eval harness + regression gate
│   ├── conflict_detector.py ← Pairwise contradiction detection
│   └── backup.py            ← SQLite backup & restore
│
├── tests/
│   ├── test_db.py           ← 22 unit tests (data layer)
│   └── test_agent_os.py     ← ECC harness integration tests
│
└── output/
    └── research_os/
        └── db.py            ← Symlink/alias → phd_research_os/db.py
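The CompanionAgent lifecycle noted under agent_os.py can be pictured as a linear state machine. The sketch below assumes the six stages are strictly sequential, with no retries, skips, or branching; the `Stage` enum, `advance`, and `run_lifecycle` are hypothetical names for illustration, not the actual agent_os.py API.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages of a companion agent, in order (hypothetical names)."""
    SPAWN = 1
    PREFLIGHT = 2
    PLAN = 3
    EXECUTE = 4
    POSTFLIGHT = 5
    RETIRE = 6


# Assumption: a stage may only move to the next one; no skipping or looping back.
_ORDER = list(Stage)


def advance(current: Stage) -> Stage:
    """Return the stage after `current`, or raise if the agent is already retired."""
    idx = _ORDER.index(current)
    if idx == len(_ORDER) - 1:
        raise ValueError("agent is retired; no further stages")
    return _ORDER[idx + 1]


def run_lifecycle() -> list[str]:
    """Walk a fresh agent from spawn to retire and record each stage name."""
    stage = Stage.SPAWN
    visited = [stage.name.lower()]
    while stage is not Stage.RETIRE:
        stage = advance(stage)
        visited.append(stage.name.lower())
    return visited


if __name__ == "__main__":
    print(" -> ".join(run_lifecycle()))
```

Keeping the order in one list means a new stage only has to be added to the enum; the transition rule follows automatically.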
API Configuration
| Provider | Env Variable | Default Model | Use Case |
|---|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 | Primary brain + companion agents |
| OpenAI | OPENAI_API_KEY | gpt-4o-mini | Fallback |
| OpenRouter | OPENROUTER_API_KEY | (configurable) | Multi-model companion agents |
| HF Local | (model path) | nkshirsa/phd-research-os-brain | Fine-tuned local inference |
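The table implies a priority order among the hosted providers. Below is a minimal sketch of picking a provider from environment variables, assuming Anthropic is tried first and OpenAI second, as the "Primary" and "Fallback" labels suggest. `select_provider` and `PROVIDERS` are hypothetical. OpenRouter and the HF Local model are left out because the table does not say when they are chosen.

```python
import os
from typing import Mapping, Optional

# (provider, env variable, default model) in the priority order the table implies.
# Assumption: Anthropic is primary and OpenAI is the fallback.
PROVIDERS = (
    ("anthropic", "ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
)


def select_provider(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (provider, model) for the first provider whose API key is set and non-empty."""
    env = os.environ if env is None else env
    for name, key_var, model in PROVIDERS:
        if env.get(key_var):
            return name, model
    raise RuntimeError("no API key found; set ANTHROPIC_API_KEY or OPENAI_API_KEY")
```

For example, with only `OPENAI_API_KEY` set, `select_provider()` returns `("openai", "gpt-4o-mini")`. Passing an explicit mapping keeps the function easy to test without touching the real environment.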
Database Schema (db.py)
Core Tables (Research OS)
- claims: Scientific claims with fixed-point confidence (×1000)
- sources: Paper metadata (DOI, journal tier, study type)
- goals: Research goals with priority ordering
- conflicts: Claim contradiction pairs (hypothesis_confidence ALWAYS "low")
- decisions: Proposed research actions with info gain scores
- overrides: Expert confidence overrides (lock mechanism)
- experiments: Lab data objects (manual approval required)
- api_usage_log: Cost tracking per API call
- calibration_log: Brier score data collection
- embedding_cache: Semantic cache (text_hash → embedding)
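Two of these conventions, fixed-point confidence on claims and the always-"low" hypothesis_confidence on conflicts, can be enforced by SQLite itself through CHECK constraints. This is a sketch only: apart from `confidence` and `hypothesis_confidence`, the column names (`id`, `claim_text`, `claim_a_id`, `claim_b_id`) are hypothetical, and the real schema lives in db.py.

```python
import sqlite3

SCHEMA = """
CREATE TABLE claims (
    id INTEGER PRIMARY KEY,
    claim_text TEXT NOT NULL,
    -- probability x 1000, stored as a true integer in [0, 1000];
    -- a float such as 0.735 keeps REAL type and fails the typeof() check
    confidence INTEGER NOT NULL
        CHECK (typeof(confidence) = 'integer' AND confidence BETWEEN 0 AND 1000)
);
CREATE TABLE conflicts (
    id INTEGER PRIMARY KEY,
    claim_a_id INTEGER NOT NULL REFERENCES claims(id),
    claim_b_id INTEGER NOT NULL REFERENCES claims(id),
    -- conflict hypotheses can only ever be stored as 'low'
    hypothesis_confidence TEXT NOT NULL DEFAULT 'low'
        CHECK (hypothesis_confidence = 'low')
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create the two sketch tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Putting the rule in the schema means a bug in any caller, human or AI, fails loudly with `sqlite3.IntegrityError` instead of silently storing a float or promoting a conflict hypothesis.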
ECC Harness Tables (agent_os.py)
- companion_agents: Registry of spawned companion AIs
- agent_tasks: Task lifecycle tracking (preflight → done)
- agent_audit_log: Every action, decision, and deviation logged
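Because agent_audit_log is meant to capture every action, decision, and deviation, it helps if rows can be appended but never rewritten. The sketch below assumes an append-only design enforced with SQLite triggers; the columns (`agent_id`, `action`, `detail`, `logged_at`) and the `open_audit_db` and `log_action` helpers are hypothetical, not the agent_os.py interface.

```python
import json
import sqlite3
from datetime import datetime, timezone

AUDIT_SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_audit_log (
    id INTEGER PRIMARY KEY,
    agent_id TEXT NOT NULL,
    action TEXT NOT NULL,      -- e.g. 'action', 'decision', 'deviation'
    detail TEXT NOT NULL,      -- JSON payload describing the event
    logged_at TEXT NOT NULL    -- ISO-8601 UTC timestamp
);
-- Reject any attempt to rewrite or erase history.
CREATE TRIGGER IF NOT EXISTS agent_audit_log_no_update
BEFORE UPDATE ON agent_audit_log
BEGIN
    SELECT RAISE(ABORT, 'agent_audit_log is append-only');
END;
CREATE TRIGGER IF NOT EXISTS agent_audit_log_no_delete
BEFORE DELETE ON agent_audit_log
BEGIN
    SELECT RAISE(ABORT, 'agent_audit_log is append-only');
END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create the audit table plus its guard triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(AUDIT_SCHEMA)
    return conn


def log_action(conn: sqlite3.Connection, agent_id: str, action: str, detail: dict) -> int:
    """Append one audit row, commit it, and return its row id."""
    cur = conn.execute(
        "INSERT INTO agent_audit_log (agent_id, action, detail, logged_at) VALUES (?, ?, ?, ?)",
        (agent_id, action, json.dumps(detail, sort_keys=True),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the payload as sorted-key JSON keeps rows comparable across runs while leaving the schema stable as new event types appear.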
Key Invariants (NEVER violate)
- Fixed-Point Math: All probabilities are stored as INTEGER × 1000 (e.g. 0.735 is stored as 735). No floats in the DB.
- Provenance: All AI output is Level 5 (LLM Hypothesis). Promotion requires a human.
- Hypothesis Confidence: Conflict hypotheses are ALWAYS "low". Never auto-promote.
- Expert Override: Once set, an override cannot be overwritten by the system. Only a human can change it.
- Schema Version: Every record carries a schema_version tag.
- Companion Agent Isolation: Companions cannot modify claims directly; they propose, and humans approve.
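The fixed-point and expert-override invariants can be expressed as two small guards. This sketch assumes round-half-up conversion and a per-claim `expert_locked` flag; `to_fixed`, `from_fixed`, `ai_update_confidence`, `OverrideLockedError`, and the dict-shaped claim record are all hypothetical, not the db.py API.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALE = 1000  # probabilities are stored as INTEGER x 1000


class OverrideLockedError(Exception):
    """Raised when an automated update targets a claim with an expert override."""


def to_fixed(p: float) -> int:
    """Convert a probability in [0, 1] to fixed-point, rounding half up."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    # Go through str() so 0.7355 rounds as the decimal the user typed, not its binary float.
    return int((Decimal(str(p)) * SCALE).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def from_fixed(value: int) -> float:
    """Convert a stored fixed-point value back to a float, for display only."""
    if not 0 <= value <= SCALE:
        raise ValueError(f"fixed-point value out of range: {value}")
    return value / SCALE


def ai_update_confidence(claim: dict, new_p: float) -> dict:
    """Return a copy of `claim` with a new confidence, unless an expert has locked it."""
    if claim.get("expert_locked"):
        raise OverrideLockedError(f"claim {claim['id']} has an expert override")
    return {**claim, "confidence": to_fixed(new_p)}
```

Returning a new record instead of mutating the input fits the propose-then-approve flow: the caller can show the proposed change to a human before anything is written.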