# PhD Research OS: Architecture Map

> **WAKE-UP INSTRUCTION**: This file is the ground truth for all file locations,
> API configurations, and system topology. Read this FIRST before touching anything.

## System Topology

```
phd-research-os-brain/
├── ARCHITECTURE.md          ← YOU ARE HERE (project map)
├── AGENTS.md                ← Agent role registry & contracts
├── MEMORY.md                ← Persistent cross-session state
├── plan.md                  ← Current task plan (mutable)
├── HARNESS_EVOLUTION.md     ← ECC rule amendments log
│
├── train.py                 ← SFT training script (Qwen2.5-3B + QLoRA)
├── generate_dataset.py      ← Synthetic dataset generator (1900 examples)
│
├── phd_research_os/         ← CORE PACKAGE
│   ├── __init__.py          ← v1.0.0
│   ├── db.py                ← SQLite data layer (Phase 0)
│   │                          Tables: claims, sources, goals, conflicts,
│   │                          decisions, overrides, experiments,
│   │                          api_usage_log, calibration_log, embedding_cache
│   │                          + companion_agents, agent_tasks, agent_audit_log
│   ├── agents.py            ← Original 6-role AI brain (ResearchOSBrain)
│   ├── agent_os.py          ← ECC HARNESS ORCHESTRATOR (companion AI factory)
│   │                          CompanionAgent lifecycle: spawn → preflight →
│   │                          plan → execute → postflight → retire
│   ├── pipeline.py          ← Paper ingestion (PDF → claims)
│   ├── obsidian_export.py   ← Obsidian vault export (one-directional)
│   ├── evaluation.py        ← Golden dataset eval harness + regression gate
│   ├── conflict_detector.py ← Pairwise contradiction detection
│   └── backup.py            ← SQLite backup & restore
│
├── tests/
│   ├── test_db.py           ← 22 unit tests (data layer)
│   └── test_agent_os.py     ← ECC harness integration tests
│
└── output/
    └── research_os/
        └── db.py            ← Symlink/alias → phd_research_os/db.py
```

## API Configuration

| Provider | Env Variable | Default Model | Use Case |
|----------|--------------|---------------|----------|
| Anthropic | `ANTHROPIC_API_KEY` | claude-sonnet-4-20250514 | Primary brain + companion agents |
| OpenAI | `OPENAI_API_KEY` | gpt-4o-mini | Fallback |
| OpenRouter | `OPENROUTER_API_KEY` | (configurable) | Multi-model companion agents |
| HF Local | (model path) | nkshirsa/phd-research-os-brain | Fine-tuned local inference |

## Database Schema (db.py)

### Core Tables (Research OS)

- `claims`: Scientific claims with fixed-point confidence (×1000)
- `sources`: Paper metadata (DOI, journal tier, study type)
- `goals`: Research goals with priority ordering
- `conflicts`: Claim contradiction pairs (hypothesis_confidence ALWAYS "low")
- `decisions`: Proposed research actions with info gain scores
- `overrides`: Expert confidence overrides (lock mechanism)
- `experiments`: Lab data objects (manual approval required)
- `api_usage_log`: Cost tracking per API call
- `calibration_log`: Brier score data collection
- `embedding_cache`: Semantic cache (text_hash → embedding)

### ECC Harness Tables (agent_os.py)

- `companion_agents`: Registry of spawned companion AIs
- `agent_tasks`: Task lifecycle tracking (preflight → done)
- `agent_audit_log`: Every action, decision, and deviation logged

## Key Invariants (NEVER violate)

1. **Fixed-Point Math**: All probabilities stored as INTEGER × 1000. No floats in DB.
2. **Provenance**: All AI output is Level 5 (LLM Hypothesis). Human required for promotion.
3. **Hypothesis Confidence**: Conflict hypotheses are ALWAYS "low". Never auto-promote.
4. **Expert Override**: Once set, system cannot overwrite. Only human can change.
5. **Schema Version**: Every record carries `schema_version` tag.
6. **Companion Agent Isolation**: Companions cannot modify claims directly; they propose, humans approve.
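
The fixed-point invariant (invariant 1) can be illustrated with a minimal sketch. The helper names `to_fixed` and `from_fixed`, and the simplified three-column `claims` table, are assumptions made for illustration; the actual schema and helpers live in `db.py` and may differ.

```python
import sqlite3

SCALE = 1000  # invariant 1: probabilities are stored as INTEGER x 1000


def to_fixed(probability: float) -> int:
    """Convert a probability in [0, 1] to its fixed-point integer form."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return round(probability * SCALE)


def from_fixed(value: int) -> float:
    """Convert a stored fixed-point integer back to a float (display only)."""
    return value / SCALE


# Hypothetical, simplified table; the real `claims` schema is defined in db.py.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims ("
    " id INTEGER PRIMARY KEY,"
    " text TEXT NOT NULL,"
    " confidence INTEGER NOT NULL CHECK (confidence BETWEEN 0 AND 1000)"
    ")"
)
conn.execute(
    "INSERT INTO claims (text, confidence) VALUES (?, ?)",
    ("Example claim", to_fixed(0.875)),
)
stored = conn.execute("SELECT confidence FROM claims").fetchone()[0]
print(stored, from_fixed(stored))  # prints: 875 0.875
```

Converting at the boundary keeps floats out of the database, and a `CHECK` constraint like the one assumed here would reject out-of-range values at write time.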
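
The expert-override lock (invariant 4) might be enforced along the following lines. This is a sketch under stated assumptions: the `Claim` dataclass, its `override_locked` flag, and the functions `system_update` and `expert_override` are hypothetical names, not the API of `db.py` or the `overrides` table.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical in-memory view of a claim row (not the real schema)."""
    confidence: int                 # fixed-point, x1000
    override_locked: bool = False   # True once an expert override is set


def system_update(claim: Claim, new_confidence: int) -> bool:
    """Apply an automated update unless an expert override locks the claim."""
    if claim.override_locked:
        return False  # invariant 4: the system cannot overwrite an override
    claim.confidence = new_confidence
    return True


def expert_override(claim: Claim, new_confidence: int) -> None:
    """A human sets the confidence and locks it against automated changes."""
    claim.confidence = new_confidence
    claim.override_locked = True


claim = Claim(confidence=600)
system_update(claim, 650)             # accepted: no override yet
expert_override(claim, 900)           # human locks the value
accepted = system_update(claim, 400)  # rejected: claim is locked
print(claim.confidence, accepted)     # prints: 900 False
```

Returning a boolean instead of raising lets a caller record the rejected attempt, for example in an audit log, without interrupting the run.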