PhD Research OS — Architecture Map

WAKE-UP INSTRUCTION: This file is the ground truth for all file locations, API configurations, and system topology. Read this FIRST before touching anything.

System Topology

phd-research-os-brain/
├── ARCHITECTURE.md              ← YOU ARE HERE (project map)
├── AGENTS.md                    ← Agent role registry & contracts
├── MEMORY.md                    ← Persistent cross-session state
├── plan.md                      ← Current task plan (mutable)
├── HARNESS_EVOLUTION.md         ← ECC rule amendments log
│
├── train.py                     ← SFT training script (Qwen2.5-3B + QLoRA)
├── generate_dataset.py          ← Synthetic dataset generator (1900 examples)
│
├── phd_research_os/             ← CORE PACKAGE
│   ├── __init__.py              ← v1.0.0
│   ├── db.py                    ← SQLite data layer (Phase 0)
│   │                              Tables: claims, sources, goals, conflicts,
│   │                              decisions, overrides, experiments,
│   │                              api_usage_log, calibration_log, embedding_cache
│   │                              + companion_agents, agent_tasks, agent_audit_log
│   ├── agents.py                ← Original 6-role AI brain (ResearchOSBrain)
│   ├── agent_os.py              ← ECC HARNESS ORCHESTRATOR (companion AI factory)
│   │                              CompanionAgent lifecycle: spawn → preflight →
│   │                              plan → execute → postflight → retire
│   ├── pipeline.py              ← Paper ingestion (PDF → claims)
│   ├── obsidian_export.py       ← Obsidian vault export (one-directional)
│   ├── evaluation.py            ← Golden dataset eval harness + regression gate
│   ├── conflict_detector.py     ← Pairwise contradiction detection
│   └── backup.py                ← SQLite backup & restore
│
├── tests/
│   ├── test_db.py               ← 22 unit tests (data layer)
│   └── test_agent_os.py         ← ECC harness integration tests
│
└── output/
    └── research_os/
        └── db.py                ← Symlink/alias → phd_research_os/db.py
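
The CompanionAgent lifecycle listed under agent_os.py (spawn → preflight → plan → execute → postflight → retire) can be read as a strict linear state machine. The sketch below is an illustration under that assumption only: `Stage`, `CompanionLifecycle`, `LifecycleError`, and `advance` are hypothetical names, not the actual agent_os.py API, and the real harness may allow failure or early-retire paths that are not modelled here.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order given in the topology above."""
    SPAWN = "spawn"
    PREFLIGHT = "preflight"
    PLAN = "plan"
    EXECUTE = "execute"
    POSTFLIGHT = "postflight"
    RETIRE = "retire"


# Enum members iterate in definition order, which is the lifecycle order.
ORDER = list(Stage)


class LifecycleError(RuntimeError):
    """Raised when a stage is skipped or a retired agent is advanced."""


class CompanionLifecycle:
    """Hypothetical tracker; not the real CompanionAgent class."""

    def __init__(self) -> None:
        self.stage = Stage.SPAWN
        self.history = [Stage.SPAWN]

    def advance(self, target: Stage) -> None:
        """Move to the next stage; reject anything but the direct successor."""
        index = ORDER.index(self.stage)
        if index == len(ORDER) - 1:
            raise LifecycleError("agent is already retired")
        expected = ORDER[index + 1]
        if target is not expected:
            raise LifecycleError(
                f"expected {expected.value}, got {target.value}"
            )
        self.stage = target
        self.history.append(target)
```

Rejecting skipped stages keeps preflight and postflight checks from being bypassed, which appears to be the point of the harness. Whether the real orchestrator enforces this strictly is an assumption.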

API Configuration

| Provider | Env Variable | Default Model | Use Case |
|---|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | claude-sonnet-4-20250514 | Primary brain + companion agents |
| OpenAI | `OPENAI_API_KEY` | gpt-4o-mini | Fallback |
| OpenRouter | `OPENROUTER_API_KEY` | (configurable) | Multi-model companion agents |
| HF Local | (model path) | nkshirsa/phd-research-os-brain | Fine-tuned local inference |
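
The primary/fallback relationship in the table can be sketched as a lookup over environment variables. This is a minimal illustration, not the project's actual configuration code: `PROVIDERS` and `pick_provider` are hypothetical names, and the preference order (Anthropic first, then OpenAI) is inferred from the "Primary" and "Fallback" use cases. OpenRouter and HF Local are left out because their model is configurable or path-based and cannot be filled in from this file.

```python
import os
from typing import Mapping, Optional, Tuple

# (provider, env variable, default model), copied from the table above.
# The ordering is an assumption: primary provider first, fallback second.
PROVIDERS = [
    ("anthropic", "ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
]


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (provider, model) for the first provider whose key is set."""
    env = os.environ if env is None else env
    for name, key_var, model in PROVIDERS:
        if env.get(key_var):  # treats unset and empty string alike
            return name, model
    raise RuntimeError(
        "no API key found; set ANTHROPIC_API_KEY or OPENAI_API_KEY"
    )
```

Passing `env` explicitly keeps the function testable without touching the real process environment.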

Database Schema (db.py)

Core Tables (Research OS)

claims — Scientific claims with fixed-point confidence (×1000)
sources — Paper metadata (DOI, journal tier, study type)
goals — Research goals with priority ordering
conflicts — Claim contradiction pairs (hypothesis_confidence ALWAYS "low")
decisions — Proposed research actions with info gain scores
overrides — Expert confidence overrides (lock mechanism)
experiments — Lab data objects (manual approval required)
api_usage_log — Cost tracking per API call
calibration_log — Brier score data collection
embedding_cache — Semantic cache (text_hash → embedding)
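
Since calibration_log collects data for Brier scoring and confidences are stored ×1000, the score can be accumulated in integers and divided only at the end. The sketch below assumes each calibration record reduces to a (confidence ×1000, observed outcome) pair. That record shape and the `brier_score` name are hypothetical, not the actual calibration_log schema.

```python
from typing import Iterable, Tuple

SCALE = 1000  # fixed-point scale used for stored confidences


def brier_score(records: Iterable[Tuple[int, bool]]) -> float:
    """Mean squared error between predicted confidence and outcome.

    Each record is (confidence_x1000, outcome). Squared errors are summed
    as integers; the only float is the final in-memory result, which is
    not written back to the database.
    """
    total = 0
    count = 0
    for confidence_milli, outcome in records:
        if not 0 <= confidence_milli <= SCALE:
            raise ValueError(f"confidence out of range: {confidence_milli}")
        target = SCALE if outcome else 0
        total += (confidence_milli - target) ** 2
        count += 1
    if count == 0:
        raise ValueError("no calibration records")
    return total / (count * SCALE * SCALE)
```

A score of 0.0 means perfect calibration on the sample; a constant 50% prediction yields 0.25.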

ECC Harness Tables (agent_os.py)

companion_agents — Registry of spawned companion AIs
agent_tasks — Task lifecycle tracking (preflight → done)
agent_audit_log — Every action, decision, and deviation logged

Key Invariants (NEVER violate)

Fixed-Point Math: All probabilities are stored as integers scaled by 1000 (e.g. 0.85 is stored as 850). No floats in DB.
Provenance: All AI output is Level 5 (LLM Hypothesis). A human is required for promotion.
Hypothesis Confidence: Conflict hypotheses are ALWAYS "low". Never auto-promote.
Expert Override: Once set, the system cannot overwrite it. Only a human can change it.
Schema Version: Every record carries a schema_version tag.
Companion Agent Isolation: Companions cannot modify claims directly — they propose, humans approve.
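
The Fixed-Point Math and Expert Override invariants can be illustrated together. This is a sketch under stated assumptions, not the db.py implementation: `to_fixed`, `from_fixed`, `Claim`, `system_update`, and `expert_override` are hypothetical names, and the lock is modelled as a simple boolean flag.

```python
from dataclasses import dataclass

SCALE = 1000  # probabilities are stored as integer × 1000


def to_fixed(probability: float) -> int:
    """Convert a probability in [0, 1] to its stored integer form."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return round(probability * SCALE)


def from_fixed(value: int) -> float:
    """Convert a stored integer back to a probability for display."""
    return value / SCALE


@dataclass
class Claim:
    confidence: int                # stored ×1000, never a float
    override_locked: bool = False  # hypothetical lock flag


def system_update(claim: Claim, new_confidence: int) -> bool:
    """Automated update; refused once an expert override is in place."""
    if claim.override_locked:
        return False
    claim.confidence = new_confidence
    return True


def expert_override(claim: Claim, new_confidence: int) -> None:
    """Human-initiated change; sets the value and locks it."""
    claim.confidence = new_confidence
    claim.override_locked = True
```

For example, after `expert_override(claim, to_fixed(0.9))`, a later `system_update(claim, 400)` returns `False` and leaves the stored confidence at 900.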