nkshirsa committed on
Commit 88f66d8 · verified · 1 Parent(s): 85dacf8

Update README with ECC Harness companion AI documentation

Files changed (1)
  1. README.md +169 -221
README.md CHANGED
@@ -8,6 +8,8 @@ tags:
   - structured-output
   - research-assistant
   - phd-tools
  language:
  - en
  base_model: Qwen/Qwen2.5-3B-Instruct
@@ -18,289 +20,235 @@ pipeline_tag: text-generation

  # PhD Research OS Brain 🧠

- **An AI model and complete software system for PhD-level STEM research**, implementing the [Research OS v11.0 (The Grounded OS)](https://github.com/nkshirsa/phd-research-os) specification.

  ## What This Is

- A **multi-task fine-tuned language model** (Qwen2.5-3B-Instruct + QLoRA) that serves as the intelligent core of a PhD Research OS — a system that ingests scientific papers, extracts structured claims, classifies evidence, detects contradictions, and generates research decisions.

- The model + Python package together implement **Phases 0–6** of the construction schedule:
- - ✅ Phase 0: Core data layer with full CRUD (22 unit tests passing)
- - ✅ Phase 1: Paper ingestion pipeline (PDF → structured claims)
- - ✅ Phase 2: Evaluation harness with golden dataset support
- - ✅ Phase 3: Semantic search infrastructure (ChromaDB ready)
- - ✅ Phase 4: Obsidian export (one-directional, wiki-linked vault)
- - ✅ Phase 5: Conflict detection + Verifier Agent
- - ✅ Phase 6: Backup, batch processing, cost tracking, documentation

  ## Architecture

  ```
- ┌────────────────────────────────────────────────────┐
- │                  PhD Research OS                   │
- ├────────────────────────────────────────────────────┤
- │ Pipeline Orchestrator (pipeline.py)                │
- │   PDF → Text → Claims → Conflicts → Obsidian       │
- ├───────────┬────────────────────────────────────────┤
- │ AI Brain  │ 6 Agent Roles:                         │
- │(agents.py)│  1. Researcher (claim extraction)      │
- │           │  2. Epistemic Classifier               │
- │           │  3. Confidence Scorer                  │
- │           │  4. Verifier (conflict detection)      │
- │           │  5. Query Planner                      │
- │           │  6. Decision Generator                 │
- ├───────────┴────────────────────────────────────────┤
- │ Data Layer (db.py) — SQLite + Fixed-Point Math     │
- │   Claims | Sources | Goals | Conflicts | Decisions │
- │   Overrides | Experiments | API Usage | Calibration│
- ├────────────────────────────────────────────────────┤
- │ Outputs:                                           │
- │   Obsidian Vault | Streamlit Dashboard | JSON API  │
- └────────────────────────────────────────────────────┘
  ```

- ## Research OS v11.0 Compliance
-
- This system adheres to the Grounded OS specification:

- | Rule | Implementation |
- |------|---------------|
- | **Provenance Hierarchy** | All AI outputs tagged as Level 5 (LLM Hypothesis). Human verification required to promote. |
- | **Anchor Divergence** | Agent output never overrides human-verified observations. Expert overrides lock confidence. |
- | **Shadow Archive** | Rejected claims stored in database with rejection reason. Resurrection requires 3+ independent observations. |
- | **Fixed-Point Math** | All probabilities stored as scaled integers (×1000). No floating-point in DB. |
- | **Causal Lineage** | Every claim traces back to source DOI and extraction event. |
- | **Skeptic Thread** | Conflict detector finds contradictions in existing data only — no simulation or what-if. |

- ## 6 Core Tasks

- ### Task 1: Scientific Claim Extraction
  ```python
- from phd_research_os.agents import ResearchOSBrain
- brain = ResearchOSBrain(backend="api")  # or "local" with fine-tuned model
- result = brain.extract_claims("Paper text here...")
- # Returns: {"claims": [{"text": "...", "epistemic_tag": "Fact", "confidence": 0.87, ...}]}
- ```

- ### Task 2: Epistemic Classification
- ```python
- result = brain.classify_epistemic("The measured ionic conductivity was 4.2 × 10⁻⁴ S/cm at 25°C.")
- # Returns: {"epistemic_tag": "Fact", "reasoning": "...", "confidence_in_classification": 0.95}
- ```

- ### Task 3: Confidence Scoring
- ```python
- result = brain.score_confidence(
-     "Graphene FET shows 45mV Dirac shift",
-     journal="ACS Nano", study_type="primary_experimental", journal_tier=1)
- # Returns: {"confidence": 0.855, "evidence_strength": 0.9, "study_quality_weight": 1.0, ...}
- ```

- ### Task 4: Contradiction Detection
- ```python
- result = brain.detect_conflicts(
-     "Sensitivity increases with ionic strength",
-     "Sensitivity decreases with ionic strength")
- # Returns: {"conflict_detected": true, "conflict_type": "value_mismatch",
- #           "hypothesis_confidence": "low", ...}  # ALWAYS low
- ```

- ### Task 5: Query Decomposition
- ```python
- result = brain.decompose_query("What determines graphene biosensor sensitivity?")
- # Returns: {"sub_queries": ["...", "...", "..."], "reasoning": "..."}
- ```

- ### Task 6: Decision Object Generation
- ```python
- result = brain.generate_decision(
-     goal="Achieve sub-fM detection limit",
-     gaps=["Optimal aptamer not determined", "Debye screening unresolved"],
-     low_confidence_claims=["CLM_0042: PEG length optimal (conf: 0.35)"])
- # Returns: {"decision_id": "DEC_0001", "recommended_action": "experiment",
- #           "expected_information_gain": 0.72, ...}
  ```

- ## Quick Start

- ### 1. Install

- ```bash
- git clone https://huggingface.co/nkshirsa/phd-research-os-brain
- cd phd-research-os-brain
- pip install -r requirements.txt  # or: pip install datasets httpx
- ```
-
- ### 2. Initialize Database

  ```python
- from phd_research_os.db import init_db, get_db, create_claim, create_goal
-
- init_db("data/research_os.db")
- conn = get_db("data/research_os.db")
-
- # Create your first research goal
- goal_id = create_goal(conn, "Achieve sub-femtomolar LOD for cardiac troponin", "high")
-
- # Add a claim manually
- claim_id = create_claim(conn,
-     text="GFET sensitivity to cTnI was 45 mV/decade in 10mM PBS",
-     epistemic_tag="Fact",
-     confidence=0.85,
-     evidence_strength=0.9,
-     study_quality_weight=1.0,
-     journal_tier_weight=1.0,
-     completeness_penalty=1.0,
-     source_doi="10.1234/example",
-     parameters={"ionic_strength_mM": 10, "sensitivity_mV_dec": 45})
  ```

- ### 3. Process a Paper (with API brain)

- ```python
- from phd_research_os.agents import ResearchOSBrain
- from phd_research_os.pipeline import Pipeline
-
- # Use Claude/GPT as brain (or load fine-tuned model locally)
- brain = ResearchOSBrain(backend="api")  # needs ANTHROPIC_API_KEY or OPENAI_API_KEY
- pipeline = Pipeline(brain=brain)
-
- result = pipeline.process_paper("path/to/paper.pdf", journal_tier=1, is_canonical=True)
- print(f"Extracted {result['claims_extracted']} claims")
- ```

- ### 4. Export to Obsidian

- ```python
- from phd_research_os.obsidian_export import ObsidianExporter
- exporter = ObsidianExporter(vault_path="my_vault")
- exporter.export_all()
- # Opens as linked Obsidian vault with Claims/, Sources/, Goals/, Dashboard.md
- ```

- ### 5. Detect Conflicts

  ```python
- from phd_research_os.conflict_detector import ConflictDetector
- detector = ConflictDetector(brain=brain)
- conflicts = detector.detect_conflicts()
- # All conflict hypotheses are tagged confidence="low" — human review required
  ```

- ## Training the Model

- The training dataset ([nkshirsa/phd-research-os-sft-data](https://huggingface.co/datasets/nkshirsa/phd-research-os-sft-data)) contains 1,900 multi-task examples across all 6 tasks.

- ```bash
- # On a GPU (T4 minimum, A10G recommended):
- pip install torch transformers trl peft datasets bitsandbytes accelerate trackio
- python train.py
- ```

- **Training Recipe:**
- - Base: `Qwen/Qwen2.5-3B-Instruct`
- - Method: QLoRA (4-bit NF4, r=64, all-linear targets)
- - LR: 2e-4 (cosine schedule, 5% warmup)
- - Batch: effective 16 (2 × 8 gradient accumulation)
- - Epochs: 3
- - Loss: Assistant-only (masks system/user tokens)
- - References: [LoRA Without Regret (2025)](https://huggingface.co/docs/trl/lora_without_regret), [Multi-task Biomedical SFT (arxiv:2401.00579)](https://arxiv.org/abs/2401.00579)

- ## Confidence Formula

- All confidence scores follow this fixed-point formula:

  ```
- confidence = evidence_strength × study_quality_weight × journal_tier_weight × completeness_penalty
-
- study_quality_weight:
-   primary_experimental:  1.000
-   meta_analysis:         1.000
-   in_vitro:              0.800
-   simulation:            0.600
-   review_non_systematic: 0.400
-   case_study:            0.300
-
- journal_tier_weight:
-   tier 1:   1.000
-   tier 2:   0.850
-   tier 3:   0.700
-   preprint: 0.500
-
- completeness_penalty:
-   all fields present: 1.000
-   missing key fields: 0.700
  ```

- **Stored as scaled integers** (×1000) per Research OS Rule 5. No floating-point in the database.
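As a concrete illustration, the formula can be evaluated entirely in ×1000 scaled-integer space. This is a minimal sketch using the weights listed above; `confidence_scaled` and its lookup tables are hypothetical helpers, not the package's actual API.

```python
# Weights from the formula above, stored as integers scaled by 1000
# (per Research OS Rule 5: no floating-point in the database).
STUDY_QUALITY = {
    "primary_experimental": 1000, "meta_analysis": 1000,
    "in_vitro": 800, "simulation": 600,
    "review_non_systematic": 400, "case_study": 300,
}
JOURNAL_TIER = {1: 1000, 2: 850, 3: 700, "preprint": 500}

def confidence_scaled(evidence_strength, study_type, journal_tier, complete=True):
    """Hypothetical helper: multiply scaled factors, renormalizing by 1000
    after each step so the running value stays a x1000 integer."""
    c = evidence_strength  # e.g. 900 means 0.900
    c = c * STUDY_QUALITY[study_type] // 1000
    c = c * JOURNAL_TIER[journal_tier] // 1000
    c = c * (1000 if complete else 700) // 1000
    return c

print(confidence_scaled(900, "primary_experimental", 1))      # 900 (= 0.900)
print(confidence_scaled(900, "in_vitro", 2, complete=False))  # 428 (= 0.428)
```

Note that floor division truncates anything below 0.001 at each step, which is the cost of staying in integer space.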

- ## File Structure

  ```
- phd-research-os-brain/
- ├── README.md                 # This file
- ├── train.py                  # SFT training script
- ├── generate_dataset.py       # Synthetic dataset generator
- ├── phd_research_os/
- │   ├── __init__.py
- │   ├── db.py                 # Core data layer (Phase 0)
- │   ├── agents.py             # AI brain with 6 agent roles
- │   ├── pipeline.py           # Paper ingestion pipeline (Phase 1+6)
- │   ├── obsidian_export.py    # Obsidian vault export (Phase 4)
- │   ├── evaluation.py         # Golden dataset eval harness (Phase 2)
- │   ├── conflict_detector.py  # Contradiction detection (Phase 5)
- │   └── backup.py             # Backup & recovery (Phase 6)
- └── tests/
-     └── test_db.py            # 22 unit tests (all passing)
  ```

- ## Evaluation Harness

  ```python
- from phd_research_os.evaluation import run_regression_gate, create_golden_paper
-
- # Create golden dataset (you annotate 5 papers from your field)
- create_golden_paper("paper_001", "Graphene FET Biosensors", [
-     {"text": "LOD was 1 fM in 10mM PBS", "epistemic_tag": "Fact", "confidence": 0.9},
-     {"text": "Surface defects enhance sensitivity", "epistemic_tag": "Interpretation", "confidence": 0.6},
-     # ... annotate all extractable claims
- ])
-
- # Run regression gate (must pass before ANY config change)
- result = run_regression_gate()
- print(f"Passed: {result.passed}")
- print(f"Failures: {result.failures}")
- # Thresholds: recall ≥70%, hallucination ≤10%, epistemic accuracy ≥60%
  ```

- ## Technology Stack

- | Layer | Choice | Rationale |
- |-------|--------|-----------|
- | Language | Python 3.11+ | Entire ML/NLP ecosystem native |
- | Database | SQLite + JSON | Single user, zero ops burden |
- | Vector Store | ChromaDB (planned) | Embedded, Python-native |
- | AI Brain | Qwen2.5-3B + QLoRA | Best structured output at size |
- | API Fallback | Claude / GPT-4o-mini | For immediate use before training |
- | Obsidian Sync | Direct filesystem | No plugin dependency |
- | Config | Git-versioned prompts | Every change regression-tested |

- ## What's NOT Built Yet (By Design)

- Per the construction schedule, these are **not yet implemented** — they require real PhD research data:

- - ❌ Bidirectional Obsidian sync (Phase 8 — frontmatter only)
- - ❌ Temporal decay on confidence (Phase 8 — needs calibration data)
- - ❌ Cognitive modes (Phase 9 — needs workflow data)
- - ❌ Causal graph (Phase 9 — manual entry only)
- - ❌ VLM figure extraction (Phase 9 — pilot)
- - ❌ Multi-agent peer review (Phase 10 — triggered by error rate)
- - ❌ Event sourcing (Phase 10 — triggered by need)

  ## Citation

- If you use this system in your research:
-
  ```bibtex
  @software{phd_research_os_2026,
    title={PhD Research OS Brain: Multi-Task AI for Scientific Research Management},
   - structured-output
   - research-assistant
   - phd-tools
+  - multi-agent
+  - ecc-harness
  language:
  - en
  base_model: Qwen/Qwen2.5-3B-Instruct

  # PhD Research OS Brain 🧠

+ **An AI model, companion agent framework, and complete software system for PhD-level STEM research**, implementing the [Research OS v11.0 (The Grounded OS)](https://github.com/nkshirsa/phd-research-os) specification with the ECC Harness (V-SINGULARITY) for continuous improvement.

  ## What This Is

+ Two systems in one:

+ 1. **The Brain** — A multi-task fine-tuned language model (Qwen2.5-3B-Instruct + QLoRA) that serves as the intelligent core: extracting claims from papers, classifying evidence, detecting contradictions, scoring confidence, and generating research decisions.
+
+ 2. **The Agent OS** — An ECC Harness orchestrator (`agent_os.py`) that lets you spawn **companion AI agents** to continuously improve the Brain. Each companion follows a strict lifecycle (preflight → plan → execute → postflight), produces proposals that require human approval, and maintains an immutable audit trail.
 
33
  ## Architecture
34
 
35
  ```
36
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
37
+ β”‚ PhD Research OS β”‚
38
+ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
39
+ β”‚ Core Brain β”‚ Agent OS (ECC Harness) β”‚
40
+ β”‚ (agents.py) β”‚ (agent_os.py) β”‚
41
+ β”‚ β”‚ β”‚
42
+ β”‚ 6 Agent Roles: β”‚ Companion Agents: β”‚
43
+ β”‚ 1. Researcher β”‚ β€’ DataQualityAuditor β”‚
44
+ β”‚ 2. Epistemic β”‚ β€’ PromptOptimizer β”‚
45
+ β”‚ 3. Confidence β”‚ β€’ DomainExpander β”‚
46
+ β”‚ 4. Verifier β”‚ β€’ CalibrationAnalyst β”‚
47
+ β”‚ 5. Query Planner β”‚ β€’ CitationChaser β”‚
48
+ β”‚ 6. Decision Gen β”‚ β€’ [Custom agents] β”‚
49
+ β”‚ β”‚ β”‚
50
+ β”‚ Provenance: Lv5 β”‚ Output: Proposals (human approval) β”‚
51
+ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
52
+ β”‚ Data Layer (db.py) β€” SQLite + Fixed-Point Math β”‚
53
+ β”‚ Claims | Sources | Goals | Conflicts | Decisions β”‚
54
+ β”‚ Companions | Tasks | Proposals | Audit Log | Memory β”‚
55
+ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
56
+ β”‚ Pipeline (pipeline.py) β†’ Obsidian (obsidian_export.py) β”‚
57
+ β”‚ Evaluation (evaluation.py) β†’ Backup (backup.py) β”‚
58
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
59
  ```

+ ## ECC Harness — Companion AI System

+ The Agent OS implements the ECC Harness: Principal Architect Edition (V-SINGULARITY). Whenever you want to improve the Research OS, you spawn a governed companion agent:

+ ### Quick Start: Spawn a Companion

  ```python
+ from phd_research_os.agent_os import AgentOS
+
+ # Initialize the Agent OS
+ aos = AgentOS()
+
+ # Spawn a companion to audit data quality
+ agent_id = aos.spawn_companion("DataQualityAuditor")
+
+ # Assign it a task
+ task_id = aos.assign_task(agent_id, "Audit the last 50 claims for hallucination patterns")
+
+ # Run the full ECC lifecycle (preflight → plan → execute → postflight)
+ result = aos.run_task(task_id)
+ print(f"Status: {result['status']}")
+ print(f"Proposals: {len(result['proposals'])}")
+
+ # Review proposals (human-in-the-loop)
+ for proposal in aos.get_proposals(agent_id):
+     print(f"  [{proposal['proposal_type']}] {proposal['description']}")
+     # Approve or reject
+     aos.approve_proposal(proposal['proposal_id'], reviewed_by="Dr. Smith")
+     # OR: aos.reject_proposal(proposal['proposal_id'], "Not relevant", "Dr. Smith")
  ```

+ ### 5 Built-in Companion Types

+ | Agent Type | Purpose | How It Improves the Brain |
+ |-----------|---------|--------------------------|
+ | **DataQualityAuditor** | Audits claim extraction for drift and hallucination | Catches quality degradation over time |
+ | **PromptOptimizer** | A/B tests system prompts against the golden dataset | Improves recall, precision, accuracy |
+ | **DomainExpander** | Generates training data for new STEM fields | Expands the model to new research areas |
+ | **CalibrationAnalyst** | Analyzes Brier scores, fixes miscalibration | Reduces over/under-confidence |
+ | **CitationChaser** | Finds papers citing/contradicting current claims | Enriches the knowledge base |

+ ### Custom Companions

  ```python
+ agent_id = aos.spawn_companion(
+     "custom",
+     purpose="Identify claims that need replication studies",
+     system_prompt="You are a Replication Analyst. Find claims with high confidence but few supporting sources...",
+ )
  ```

+ ### ECC Lifecycle (Every Task)

+ ```
+ §1 PRE-FLIGHT   → Load ARCHITECTURE.md + AGENTS.md, validate DB, check agent state
+ §2 PLANNING     → Obviousness test, build step list, classify reversibility
+ §3 EXECUTION    → Bounded iterations (max 3), time budget with kill heuristic (50% over = HALT)
+ §4 POST-FLIGHT  → Validate proposals, check invariants, log meta-learning
+ ```
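The §3 execution budget can be sketched as a bounded loop. Everything in this sketch (the function name, arguments, and return shape) is an illustrative assumption, not the `agent_os.py` API; it only demonstrates the two limits named above: a max-iteration cap and a halt at 50% over the time budget.

```python
import time

def run_with_budget(step_fn, steps, time_budget_s, max_iterations=3):
    """Illustrative ECC section-3 loop: iterations are capped, and between
    iterations a kill heuristic halts the task once elapsed time exceeds
    the budget by 50%."""
    start = time.monotonic()
    done = 0
    for step in steps[:max_iterations]:
        if time.monotonic() - start > 1.5 * time_budget_s:  # 50% over = HALT
            return {"status": "HALTED", "completed": done}
        step_fn(step)
        done += 1
    return {"status": "COMPLETED", "completed": done}

result = run_with_budget(lambda s: None, ["plan", "execute", "postflight"],
                         time_budget_s=5.0)
print(result)  # {'status': 'COMPLETED', 'completed': 3}
```

Because the check runs between iterations, a single long-running step is finished before the halt takes effect, which keeps each iteration atomic.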

+ ### Key Safety Properties

+ - **Proposals, Not Actions**: Companions NEVER modify claims, sources, or goals directly. They produce Proposals that require human approval.
+ - **Immutable Audit Trail**: Every action is logged to the `agent_audit_log` table and cannot be modified.
+ - **Kill Heuristic**: If a task exceeds its time budget by 50%, it auto-halts.
+ - **Iteration Budget**: Max 1 retry for patches, max 3 for architecture changes.
+ - **Harness Evolution**: The rules themselves can be amended via `propose_harness_evolution()`, but amendments require human approval.
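The immutability property can be enforced at the SQLite layer itself. A minimal sketch, assuming a simplified `agent_audit_log` schema (the real table presumably has more columns): triggers make the table append-only by aborting any UPDATE or DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_audit_log (
    id       INTEGER PRIMARY KEY,
    agent_id TEXT NOT NULL,
    action   TEXT NOT NULL,
    ts       TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Append-only: any UPDATE or DELETE aborts the statement
CREATE TRIGGER audit_no_update BEFORE UPDATE ON agent_audit_log
BEGIN SELECT RAISE(ABORT, 'audit log is immutable'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON agent_audit_log
BEGIN SELECT RAISE(ABORT, 'audit log is immutable'); END;
""")

conn.execute("INSERT INTO agent_audit_log (agent_id, action) VALUES (?, ?)",
             ("AGT_0001", "spawn"))
try:
    conn.execute("DELETE FROM agent_audit_log")
except sqlite3.IntegrityError as exc:
    print(f"rejected: {exc}")
```

The trigger approach guards against bugs in application code as well, since the rejection happens inside the database engine.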

+ ### State Files

+ | File | Purpose |
+ |------|---------|
+ | `ARCHITECTURE.md` | Project map — file locations, API config, invariants (read FIRST) |
+ | `AGENTS.md` | Agent registry — contracts, boundaries, proposal schema |
+ | `MEMORY.md` | Persistent assumptions with "Last Validated" markers |
+ | `plan.md` | Current task plan (mutable) |
+ | `HARNESS_EVOLUTION.md` | Rule amendment log (append-only) |
+

+ ## 6 Core Brain Tasks

+ *(Same as before — the Brain powers the core pipeline)*

+ ### Task 1: Scientific Claim Extraction
  ```python
+ from phd_research_os.agents import ResearchOSBrain
+ brain = ResearchOSBrain(backend="api")
+ result = brain.extract_claims("Paper text here...")
  ```

+ ### Task 2–6: Epistemic Classification, Confidence Scoring, Contradiction Detection, Query Decomposition, Decision Generation

+ See the [Core Brain section](#6-core-tasks-detail) below.
 
154
+ ## Research OS v11.0 Compliance
 
 
 
 
155
 
156
+ | Rule | Implementation |
157
+ |------|---------------|
158
+ | **Provenance Hierarchy** | All AI outputs = Level 5 (LLM Hypothesis). Human verification required. |
159
+ | **Anchor Divergence** | Agent output never overrides human-verified observations. |
160
+ | **Shadow Archive** | Rejected proposals stored with reason. Can be resurrected with quorum. |
161
+ | **Fixed-Point Math** | All probabilities stored as INTEGER Γ— 1000. No floats in DB. |
162
+ | **Causal Lineage** | Every claim traces to source DOI. Every proposal traces to agent_id + task_id. |
163
+ | **Skeptic Thread** | Conflict detector examines existing data only β€” no simulation. |
164
+
+ ## File Structure

+ ```
+ phd-research-os-brain/
+ ├── README.md                 # This file
+ ├── train.py                  # SFT training script
+ ├── generate_dataset.py       # Synthetic dataset generator
+ ├── phd_research_os/
+ │   ├── __init__.py           # v1.0.0
+ │   ├── db.py                 # Core data layer (Phase 0)
+ │   ├── agents.py             # AI brain — 6 agent roles
+ │   ├── agent_os.py           # ECC Harness — companion AI factory
+ │   ├── pipeline.py           # Paper ingestion (Phase 1+6)
+ │   ├── obsidian_export.py    # Obsidian vault export (Phase 4)
+ │   ├── evaluation.py         # Golden dataset eval (Phase 2)
+ │   ├── conflict_detector.py  # Contradiction detection (Phase 5)
+ │   ├── backup.py             # Backup & recovery (Phase 6)
+ │   ├── ARCHITECTURE.md       # Project map (Wake-Up doc)
+ │   ├── AGENTS.md             # Agent registry & contracts
+ │   ├── MEMORY.md             # Persistent state
+ │   ├── plan.md               # Current task plan
+ │   └── HARNESS_EVOLUTION.md  # Rule amendments
+ └── tests/
+     ├── test_db.py            # 22 unit tests (data layer)
+     └── test_agent_os.py      # 21 integration tests (ECC harness)
+ ```

+ ## Test Results

  ```
+ tests/test_db.py        — 22 passed ✅  (data layer, fixed-point math, CRUD, search)
+ tests/test_agent_os.py  — 21 passed ✅  (spawn, lifecycle, proposals, audit, memory, evolution)
+ ─────────────────────────
+ Total: 43 tests passing
  ```

+ ## 6 Core Tasks (Detail)

+ ### Task 1: Scientific Claim Extraction
+ ```python
+ result = brain.extract_claims("Paper text here...")
+ # → {"claims": [{"text": "...", "epistemic_tag": "Fact", "confidence": 0.87, ...}]}
+ ```

+ ### Task 2: Epistemic Classification
+ ```python
+ result = brain.classify_epistemic("The measured ionic conductivity was 4.2 × 10⁻⁴ S/cm.")
+ # → {"epistemic_tag": "Fact", "reasoning": "...", "confidence_in_classification": 0.95}
+ ```

+ ### Task 3: Confidence Scoring
+ ```python
+ result = brain.score_confidence("Claim text", "ACS Nano", "primary_experimental", 1)
+ # → {"confidence": 0.855, ...formula_breakdown...}
+ ```

+ ### Task 4: Contradiction Detection
+ ```python
+ result = brain.detect_conflicts("Claim A", "Claim B")
+ # → {"conflict_detected": true, "hypothesis_confidence": "low", ...}  # ALWAYS low
+ ```

+ ### Task 5: Query Decomposition
+ ```python
+ result = brain.decompose_query("Broad research question?")
+ # → {"sub_queries": ["specific Q1", "specific Q2", ...]}
+ ```

+ ### Task 6: Decision Generation
+ ```python
+ result = brain.generate_decision("Goal", ["gap1", "gap2"], ["low-conf claim 1"])
+ # → {"recommended_action": "experiment", "expected_information_gain": 0.72, ...}
+ ```

+ ## Training

+ Dataset: [nkshirsa/phd-research-os-sft-data](https://huggingface.co/datasets/nkshirsa/phd-research-os-sft-data) — 1,900 multi-task examples

+ ```bash
+ pip install torch transformers trl peft datasets bitsandbytes accelerate trackio
+ python train.py  # Needs GPU: T4 minimum, A10G recommended
+ ```

+ **Recipe:** Qwen2.5-3B-Instruct + QLoRA (r=64, all-linear) + assistant-only loss, 3 epochs, lr=2e-4
 
250
  ## Citation
251
 
 
 
252
  ```bibtex
253
  @software{phd_research_os_2026,
254
  title={PhD Research OS Brain: Multi-Task AI for Scientific Research Management},