# PhD Research OS: Architecture Map

> **WAKE-UP INSTRUCTION**: This file is the ground truth for all file locations,
> API configurations, and system topology. Read this FIRST before touching anything.

## System Topology

```
phd-research-os-brain/
├── ARCHITECTURE.md            ← YOU ARE HERE (project map)
├── AGENTS.md                  ← Agent role registry & contracts
├── MEMORY.md                  ← Persistent cross-session state
├── plan.md                    ← Current task plan (mutable)
├── HARNESS_EVOLUTION.md       ← ECC rule amendments log
│
├── train.py                   ← SFT training script (Qwen2.5-3B + QLoRA)
├── generate_dataset.py        ← Synthetic dataset generator (1900 examples)
│
├── phd_research_os/           ← CORE PACKAGE
│   ├── __init__.py            ← v1.0.0
│   ├── db.py                  ← SQLite data layer (Phase 0)
│   │                            Tables: claims, sources, goals, conflicts,
│   │                            decisions, overrides, experiments,
│   │                            api_usage_log, calibration_log, embedding_cache
│   │                            + companion_agents, agent_tasks, agent_audit_log
│   ├── agents.py              ← Original 6-role AI brain (ResearchOSBrain)
│   ├── agent_os.py            ← ECC HARNESS ORCHESTRATOR (companion AI factory)
│   │                            CompanionAgent lifecycle: spawn → preflight →
│   │                            plan → execute → postflight → retire
│   ├── pipeline.py            ← Paper ingestion (PDF → claims)
│   ├── obsidian_export.py     ← Obsidian vault export (one-directional)
│   ├── evaluation.py          ← Golden dataset eval harness + regression gate
│   ├── conflict_detector.py   ← Pairwise contradiction detection
│   └── backup.py              ← SQLite backup & restore
│
├── tests/
│   ├── test_db.py             ← 22 unit tests (data layer)
│   └── test_agent_os.py       ← ECC harness integration tests
│
└── output/
    └── research_os/
        └── db.py              ← Symlink/alias → phd_research_os/db.py
```
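
The `agent_os.py` entry lists a fixed CompanionAgent lifecycle. A minimal sketch of how that ordering could be enforced as a strictly linear state machine follows; `Stage`, `CompanionLifecycle`, and `LifecycleError` are hypothetical names and are not taken from `agent_os.py`.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the lifecycle listed for agent_os.py."""
    SPAWN = "spawn"
    PREFLIGHT = "preflight"
    PLAN = "plan"
    EXECUTE = "execute"
    POSTFLIGHT = "postflight"
    RETIRE = "retire"


# Enum iteration follows definition order, so each stage maps to its successor.
ORDER = list(Stage)
NEXT = {current: following for current, following in zip(ORDER, ORDER[1:])}


class LifecycleError(RuntimeError):
    """Raised when a companion tries to skip, repeat, or reverse a stage."""


class CompanionLifecycle:
    """Tracks one companion agent's current stage and its full history."""

    def __init__(self) -> None:
        self.stage = Stage.SPAWN
        self.history = [Stage.SPAWN]

    def advance(self, target: Stage) -> None:
        # Only the immediate successor is legal; RETIRE has no successor.
        expected = NEXT.get(self.stage)
        if target is not expected:
            raise LifecycleError(
                f"illegal transition {self.stage.value} -> {target.value}"
            )
        self.stage = target
        self.history.append(target)

    @property
    def retired(self) -> bool:
        return self.stage is Stage.RETIRE
```

A real harness would likely also need an early exit to `retire` when preflight or execution fails; that branch is left out here to keep the transition table strictly linear.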

## API Configuration

| Provider | Env Variable | Default Model | Use Case |
|----------|--------------|---------------|----------|
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` | Primary brain + companion agents |
| OpenAI | `OPENAI_API_KEY` | `gpt-4o-mini` | Fallback |
| OpenRouter | `OPENROUTER_API_KEY` | (configurable) | Multi-model companion agents |
| HF Local | (model path) | `nkshirsa/phd-research-os-brain` | Fine-tuned local inference |
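
One way to read the table as a selection rule for the primary brain is sketched below. It assumes the first provider whose key is set wins, and that the fine-tuned local model is used when no key is present. That last step is an assumption and is not documented above. `choose_provider` and `ProviderChoice` are hypothetical names, and OpenRouter routing for companion agents is not shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderChoice:
    provider: str
    model: str


# Priority order follows the table: Anthropic primary, OpenAI fallback.
CLOUD_PROVIDERS = [
    ("anthropic", "ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
]

# Assumed last resort when no cloud key is configured.
LOCAL_MODEL = "nkshirsa/phd-research-os-brain"


def choose_provider(env: Optional[Mapping[str, str]] = None) -> ProviderChoice:
    """Return the first cloud provider with a non-empty key, else the local model."""
    env = os.environ if env is None else env
    for name, key_var, default_model in CLOUD_PROVIDERS:
        if env.get(key_var):  # empty strings count as "not set"
            return ProviderChoice(name, default_model)
    return ProviderChoice("hf_local", LOCAL_MODEL)
```

Passing the environment in as a mapping keeps the rule testable without mutating `os.environ`.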

## Database Schema (db.py)

### Core Tables (Research OS)
- `claims`: Scientific claims with fixed-point confidence (×1000)
- `sources`: Paper metadata (DOI, journal tier, study type)
- `goals`: Research goals with priority ordering
- `conflicts`: Claim contradiction pairs (`hypothesis_confidence` is ALWAYS "low")
- `decisions`: Proposed research actions with info gain scores
- `overrides`: Expert confidence overrides (lock mechanism)
- `experiments`: Lab data objects (manual approval required)
- `api_usage_log`: Cost tracking per API call
- `calibration_log`: Brier score data collection
- `embedding_cache`: Semantic cache (`text_hash` → embedding)
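
`calibration_log` collects data for Brier scoring. Because confidences are stored as integers ×1000, the score can be accumulated exactly in integer arithmetic and divided once at the end: `BS = sum((P_i - 1000*o_i)**2) / (N * 1000**2)`. Here `P_i` is the stored confidence and `o_i` is the observed outcome (0 or 1). The sketch assumes each row reduces to a `(confidence_x1000, outcome)` pair. The real column layout in `db.py` may differ, and `brier_score` is a hypothetical helper.

```python
from typing import Iterable, Tuple

SCALE = 1000  # confidence is stored as INTEGER x 1000


def brier_score(rows: Iterable[Tuple[int, int]]) -> float:
    """Brier score from (confidence_x1000, outcome) pairs.

    The pair layout is an assumption about calibration_log: confidence is an
    integer in [0, 1000] and outcome is 0 or 1.
    """
    total = 0
    n = 0
    for confidence, outcome in rows:
        if not 0 <= confidence <= SCALE:
            raise ValueError(f"confidence out of range: {confidence}")
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1: {outcome}")
        # Exact integer squared error, in units of 1 / SCALE**2.
        total += (confidence - SCALE * outcome) ** 2
        n += 1
    if n == 0:
        raise ValueError("no calibration rows")
    # Single division at the end; no floats are ever written back to the DB.
    return total / (n * SCALE * SCALE)
```

For example, the rows `(500, 1)` and `(500, 0)` give (500² + 500²) / (2 · 10⁶) = 0.25, the score of an uninformative 50% forecaster.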

### ECC Harness Tables (agent_os.py)
- `companion_agents`: Registry of spawned companion AIs
- `agent_tasks`: Task lifecycle tracking (preflight → done)
- `agent_audit_log`: Every action, decision, and deviation logged
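
`agent_audit_log` is described as logging every action, decision, and deviation. The minimal SQLite sketch below assumes the log should also be append-only. It enforces this with `BEFORE UPDATE` and `BEFORE DELETE` triggers that abort the statement. The columns, trigger names, and the `open_db` and `log_event` helpers are hypothetical and do not reflect the schema in `db.py`.

```python
import sqlite3
from typing import Optional

# Hypothetical minimal schema; the real agent_audit_log columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_id  TEXT NOT NULL,
    task_id   INTEGER,
    kind      TEXT NOT NULL CHECK (kind IN ('action', 'decision', 'deviation')),
    detail    TEXT NOT NULL,
    logged_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Append-only: any attempt to rewrite or erase history aborts the statement.
CREATE TRIGGER IF NOT EXISTS audit_no_update BEFORE UPDATE ON agent_audit_log
BEGIN
    SELECT RAISE(ABORT, 'agent_audit_log is append-only');
END;

CREATE TRIGGER IF NOT EXISTS audit_no_delete BEFORE DELETE ON agent_audit_log
BEGIN
    SELECT RAISE(ABORT, 'agent_audit_log is append-only');
END;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create the audit table plus its guard triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_event(
    conn: sqlite3.Connection,
    agent_id: str,
    kind: str,
    detail: str,
    task_id: Optional[int] = None,
) -> int:
    """Append one audit row and return its id."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO agent_audit_log (agent_id, task_id, kind, detail) "
            "VALUES (?, ?, ?, ?)",
            (agent_id, task_id, kind, detail),
        )
    return cur.lastrowid
```

Because the triggers live in the database, the guarantee also holds for code paths that bypass the Python helpers.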

## Key Invariants (NEVER violate)

1. **Fixed-Point Math**: All probabilities are stored as INTEGER × 1000. No floats in the DB.
2. **Provenance**: All AI output is Level 5 (LLM Hypothesis). A human is required for promotion.
3. **Hypothesis Confidence**: Conflict hypotheses are ALWAYS "low". Never auto-promote.
4. **Expert Override**: Once an override is set, the system cannot overwrite it. Only a human can change it.
5. **Schema Version**: Every record carries a `schema_version` tag.
6. **Companion Agent Isolation**: Companions cannot modify claims directly; they propose, humans approve.
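
Invariants 1, 4, and 6 can be sketched as plain guard functions. This sketch makes two assumptions. First, probabilities are rounded half-up when converted to fixed point. Second, every write carries an actor label of `"human"`, `"system"`, or `"companion"`. The names `to_fixed`, `from_fixed`, `Claim`, `apply_update`, `effective_confidence`, and `InvariantViolation` are all hypothetical and are not the API of `db.py`.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

SCALE = 1000  # Invariant 1: probabilities are stored as INTEGER x 1000


def to_fixed(p: float) -> int:
    """Convert a probability in [0, 1] to its stored integer form."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    # Going through str() avoids binary float artefacts before rounding half-up.
    scaled = Decimal(str(p)) * SCALE
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def from_fixed(value: int) -> float:
    """Convert a stored integer back to a probability for display or math."""
    return value / SCALE


@dataclass
class Claim:
    confidence: int                  # system-estimated confidence, x1000
    override: Optional[int] = None   # expert override, x1000; locks the claim
    schema_version: str = "1"        # hypothetical tag value (Invariant 5)


class InvariantViolation(PermissionError):
    """Raised when a write would break one of the Key Invariants."""


def effective_confidence(claim: Claim) -> int:
    """An expert override, once set, always wins over the system estimate."""
    return claim.override if claim.override is not None else claim.confidence


def apply_update(claim: Claim, new_value: int, actor: str) -> None:
    """Apply a confidence write while enforcing Invariants 4 and 6."""
    if not 0 <= new_value <= SCALE:
        raise ValueError(f"fixed-point value out of range: {new_value}")
    if actor == "companion":
        # Invariant 6: companions propose; they never write claims directly.
        raise InvariantViolation("companion agents may only propose changes")
    if actor == "human":
        # Invariant 4: only a human may set or change the override.
        claim.override = new_value
    elif actor == "system":
        if claim.override is not None:
            raise InvariantViolation("claim is locked by an expert override")
        claim.confidence = new_value
    else:
        raise ValueError(f"unknown actor: {actor}")
```

Raising on a blocked write, rather than silently ignoring it, makes violations show up in logs and tests.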