# PhD Research OS — Architecture Map

> **WAKE-UP INSTRUCTION**: This file is the ground truth for all file locations,
> API configurations, and system topology. Read this FIRST, before changing anything.

## System Topology

```
phd-research-os-brain/
├── ARCHITECTURE.md              ← YOU ARE HERE (project map)
├── AGENTS.md                    ← Agent role registry & contracts
├── MEMORY.md                    ← Persistent cross-session state
├── plan.md                      ← Current task plan (mutable)
├── HARNESS_EVOLUTION.md         ← ECC rule amendments log
│
├── train.py                     ← SFT training script (Qwen2.5-3B + QLoRA)
├── generate_dataset.py          ← Synthetic dataset generator (1900 examples)
│
├── phd_research_os/             ← CORE PACKAGE
│   ├── __init__.py              ← v1.0.0
│   ├── db.py                    ← SQLite data layer (Phase 0)
│   │                              Tables: claims, sources, goals, conflicts,
│   │                              decisions, overrides, experiments,
│   │                              api_usage_log, calibration_log, embedding_cache
│   │                              + companion_agents, agent_tasks, agent_audit_log
│   ├── agents.py                ← Original 6-role AI brain (ResearchOSBrain)
│   ├── agent_os.py              ← ECC HARNESS ORCHESTRATOR (companion AI factory)
│   │                              CompanionAgent lifecycle: spawn → preflight →
│   │                              plan → execute → postflight → retire
│   ├── pipeline.py              ← Paper ingestion (PDF → claims)
│   ├── obsidian_export.py       ← Obsidian vault export (one-directional)
│   ├── evaluation.py            ← Golden dataset eval harness + regression gate
│   ├── conflict_detector.py     ← Pairwise contradiction detection
│   └── backup.py                ← SQLite backup & restore
│
├── tests/
│   ├── test_db.py               ← 22 unit tests (data layer)
│   └── test_agent_os.py         ← ECC harness integration tests
│
└── output/
    └── research_os/
        └── db.py                ← Symlink/alias → phd_research_os/db.py
```
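The companion lifecycle listed for `agent_os.py` (spawn → preflight → plan → execute → postflight → retire) can be read as a linear state machine. The sketch below is illustrative only: the names `Stage`, `next_stage`, and `advance` are hypothetical, and whether the real harness forbids skipping or rewinding stages is an assumption.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Lifecycle stages, in the order given in the topology tree."""
    SPAWN = "spawn"
    PREFLIGHT = "preflight"
    PLAN = "plan"
    EXECUTE = "execute"
    POSTFLIGHT = "postflight"
    RETIRE = "retire"


# Enum iteration follows definition order, so this list is the lifecycle order.
_ORDER = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None once retired."""
    idx = _ORDER.index(current)
    return _ORDER[idx + 1] if idx + 1 < len(_ORDER) else None


def advance(current: Stage, requested: Stage) -> Stage:
    """Accept only the single forward step (assumed policy); reject skips and rewinds."""
    if next_stage(current) is not requested:
        raise ValueError(
            f"illegal transition: {current.value} -> {requested.value}"
        )
    return requested
```

A strictly linear order keeps the audit trail easy to reason about: every task passes through preflight and postflight exactly once before it can retire.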

## API Configuration

| Provider | Env Variable | Default Model | Use Case |
|----------|-------------|---------------|----------|
| Anthropic | `ANTHROPIC_API_KEY` | claude-sonnet-4-20250514 | Primary brain + companion agents |
| OpenAI | `OPENAI_API_KEY` | gpt-4o-mini | Fallback |
| OpenRouter | `OPENROUTER_API_KEY` | (configurable) | Multi-model companion agents |
| HF Local | (model path) | nkshirsa/phd-research-os-brain | Fine-tuned local inference |
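
One way to resolve a provider from the environment variables in the table is sketched below. The function name `pick_provider` and the priority order (Anthropic, then OpenAI, then OpenRouter) are assumptions; the table only marks OpenAI as the fallback. The HF Local row is left out because it is configured by a model path rather than an API key.

```python
import os
from typing import Mapping, Optional, Tuple

# (provider, env variable, default model), in the order the table lists them.
# OpenRouter's model is "(configurable)" in the table, so no default is given here.
PROVIDERS = [
    ("anthropic", "ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
    ("openrouter", "OPENROUTER_API_KEY", None),
]


def pick_provider(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (provider, default_model) for the first provider whose key is set.

    Hypothetical helper: the priority order is an assumption, not project policy.
    """
    env = os.environ if env is None else env
    for name, var, model in PROVIDERS:
        if env.get(var):  # treat unset and empty string the same way
            return name, model
    return None
```

Passing `env` explicitly makes the helper easy to test without touching the real process environment.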

## Database Schema (db.py)

### Core Tables (Research OS)
- `claims` — Scientific claims with fixed-point confidence (×1000)
- `sources` — Paper metadata (DOI, journal tier, study type)
- `goals` — Research goals with priority ordering
- `conflicts` — Claim contradiction pairs (hypothesis_confidence ALWAYS "low")
- `decisions` — Proposed research actions with info gain scores
- `overrides` — Expert confidence overrides (lock mechanism)
- `experiments` — Lab data objects (manual approval required)
- `api_usage_log` — Cost tracking per API call
- `calibration_log` — Brier score data collection
- `embedding_cache` — Semantic cache (text_hash → embedding)
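
The `text_hash → embedding` mapping of `embedding_cache` can be illustrated with a minimal lookup-or-compute helper. This is a sketch under assumptions, not the schema in `db.py`: the column types, the use of SHA-256 for `text_hash`, JSON as the embedding encoding, and the helper names are all hypothetical.

```python
import hashlib
import json
import sqlite3
from typing import Callable, List


def text_hash(text: str) -> str:
    """Hash the text to get a stable cache key (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a hypothetical two-column embedding_cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embedding_cache ("
        "text_hash TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
    )
    return conn


def get_or_compute(
    conn: sqlite3.Connection,
    text: str,
    embed_fn: Callable[[str], List[float]],
) -> List[float]:
    """Return the cached embedding for `text`, calling `embed_fn` only on a miss."""
    key = text_hash(text)
    row = conn.execute(
        "SELECT embedding FROM embedding_cache WHERE text_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    vector = embed_fn(text)
    conn.execute(
        "INSERT INTO embedding_cache (text_hash, embedding) VALUES (?, ?)",
        (key, json.dumps(vector)),
    )
    conn.commit()
    return vector
```

Keying on a hash of the text means only exact repeats hit the cache; any normalization before hashing would be a separate design decision.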

### ECC Harness Tables (agent_os.py)
- `companion_agents` — Registry of spawned companion AIs
- `agent_tasks` — Task lifecycle tracking (preflight → done)
- `agent_audit_log` — Every action, decision, and deviation logged

## Key Invariants (NEVER violate)

1. **Fixed-Point Math**: All probabilities stored as INTEGER × 1000 (e.g. 0.85 is stored as 850). No floats in DB.
2. **Provenance**: All AI output is Level 5 (LLM Hypothesis). Human required for promotion.
3. **Hypothesis Confidence**: Conflict hypotheses are ALWAYS "low". Never auto-promote.
4. **Expert Override**: Once set, system cannot overwrite. Only human can change.
5. **Schema Version**: Every record carries `schema_version` tag.
6. **Companion Agent Isolation**: Companions cannot modify claims directly — they propose, humans approve.
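
Invariants 1 and 4 can be sketched in a few lines. Everything named here is hypothetical (`to_fixed`, `from_fixed`, `multiply_fixed`, `system_update`, and the `confidence` and `override_locked` fields); the real column names and helpers in `db.py` may differ.

```python
SCALE = 1000  # invariant 1: probabilities are stored as INTEGER x 1000


def to_fixed(p: float) -> int:
    """Convert a probability in [0, 1] to its integer fixed-point form."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return round(p * SCALE)


def from_fixed(value: int) -> float:
    """Convert a stored integer back to a float, for display only."""
    return value / SCALE


def multiply_fixed(a: int, b: int) -> int:
    """Multiply two fixed-point probabilities using integer arithmetic only."""
    return (a * b + SCALE // 2) // SCALE  # round half up


def system_update(claim: dict, new_confidence: int) -> dict:
    """Apply a system-generated confidence unless an expert override locks it.

    Invariant 4: once an override is set, only a human may change the value.
    The dict keys used here are assumptions, not the actual schema.
    """
    if isinstance(new_confidence, float):
        raise TypeError("confidence must be an integer (fixed-point x 1000)")
    if claim.get("override_locked"):
        return claim  # leave the expert's value untouched
    return {**claim, "confidence": new_confidence}
```

For example, `to_fixed(0.85)` gives `850`, and `multiply_fixed(850, 500)` gives `425`, so intermediate results never need to pass through a float before being written back.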