Update README to v2.0 — 7-layer architecture, 143 tests, 87 blindspots
README.md
CHANGED

@@ -10,6 +10,8 @@ tags:
- phd-tools
- multi-agent
- ecc-harness
language:
- en
base_model: Qwen/Qwen2.5-3B-Instruct

@@ -18,245 +20,141 @@ datasets:
pipeline_tag: text-generation
---

# PhD Research OS

```
┌──────────────────────┬──────────────────────────────────────┐
│ Core Brain           │ Agent OS (ECC Harness)               │
│ (agents.py)          │ (agent_os.py)                        │
│                      │                                      │
│ 6 Agent Roles:       │ Companion Agents:                    │
│ 1. Researcher        │ • DataQualityAuditor                 │
│ 2. Epistemic         │ • PromptOptimizer                    │
│ 3. Confidence        │ • DomainExpander                     │
│ 4. Verifier          │ • CalibrationAnalyst                 │
│ 5. Query Planner     │ • CitationChaser                     │
│ 6. Decision Gen      │ • [Custom agents]                    │
│                      │                                      │
│ Provenance: Lv5      │ Output: Proposals (human approval)   │
├──────────────────────┴──────────────────────────────────────┤
│ Data Layer (db.py) — SQLite + Fixed-Point Math              │
│ Claims | Sources | Goals | Conflicts | Decisions            │
│ Companions | Tasks | Proposals | Audit Log | Memory         │
├─────────────────────────────────────────────────────────────┤
│ Pipeline (pipeline.py) → Obsidian (obsidian_export.py)      │
│ Evaluation (evaluation.py) → Backup (backup.py)             │
└─────────────────────────────────────────────────────────────┘
```

### Quick Start: Spawn a Companion

```python
from phd_research_os.agent_os import AgentOS

# Initialize the Agent OS
aos = AgentOS()

# Spawn a companion to audit data quality
agent_id = aos.spawn_companion("DataQualityAuditor")

# Assign it a task
task_id = aos.assign_task(agent_id, "Audit the last 50 claims for hallucination patterns")

# Run the full ECC lifecycle (preflight → plan → execute → postflight)
result = aos.run_task(task_id)
print(f"Status: {result['status']}")
print(f"Proposals: {len(result['proposals'])}")

# Review proposals (human-in-the-loop)
for proposal in aos.get_proposals(agent_id):
    print(f"  [{proposal['proposal_type']}] {proposal['description']}")
    # Approve or reject
    aos.approve_proposal(proposal['proposal_id'], reviewed_by="Dr. Smith")
    # OR: aos.reject_proposal(proposal['proposal_id'], "Not relevant", "Dr. Smith")
```

## Custom Companions

```python
agent_id = aos.spawn_companion(
    "custom",
    purpose="Identify claims that need replication studies",
    system_prompt="You are a Replication Analyst. Find claims with high confidence but few supporting sources..."
)
```

### Safety Rules

- **Immutable Audit Trail**: Every action is logged to the `agent_audit_log` table and cannot be modified.
- **Kill Heuristic**: If a task exceeds its time budget by 50%, it auto-halts.
- **Iteration Budget**: Max 1 retry for patches, max 3 for architecture changes.
- **Harness Evolution**: The rules themselves can be amended via `propose_harness_evolution()`, but amendments require human approval.
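The kill heuristic and iteration budget above can be sketched in a few lines. The `TaskGuard` class, its method names, and the budget values' placement are illustrative assumptions, not the harness's actual interface.

```python
import time

# Illustrative sketch of the kill heuristic and iteration budget; the
# TaskGuard class and its API are assumptions, not the harness's real interface.
class TaskGuard:
    ITERATION_BUDGET = {"patch": 1, "architecture": 3}

    def __init__(self, time_budget_s: float, change_type: str = "patch"):
        self.start = time.monotonic()
        self.time_budget_s = time_budget_s
        self.retries_left = self.ITERATION_BUDGET[change_type]

    def should_halt(self) -> bool:
        # Kill heuristic: auto-halt once elapsed time exceeds the budget by 50%.
        return time.monotonic() - self.start > self.time_budget_s * 1.5

    def consume_retry(self) -> bool:
        # Iteration budget: returns False once retries are exhausted.
        if self.retries_left <= 0:
            return False
        self.retries_left -= 1
        return True

guard = TaskGuard(time_budget_s=60.0, change_type="architecture")
assert not guard.should_halt()
assert guard.consume_retry()  # 2 retries remain afterwards
```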

### Wake-Up Documents

| File | Purpose |
|------|---------|
| `ARCHITECTURE.md` | Project map — file locations, API config, invariants (read FIRST) |
| `AGENTS.md` | Agent registry — contracts, boundaries, proposal schema |
| `MEMORY.md` | Persistent assumptions with "Last Validated" markers |
| `plan.md` | Current task plan (mutable) |
| `HARNESS_EVOLUTION.md` | Rule amendment log (append-only) |
### Task 1: Scientific Claim Extraction

```python
from phd_research_os.
```

See the [Core Brain section](#6-core-tasks-detail) below.

## Design Invariants

| Invariant | Rule |
|------|---------------|
| **Provenance Hierarchy** | All AI outputs = Level 5 (LLM Hypothesis). Human verification required. |
| **Anchor Divergence** | Agent output never overrides human-verified observations. |
| **Shadow Archive** | Rejected proposals stored with reason. Can be resurrected with quorum. |
| **Fixed-Point Math** | All probabilities stored as INTEGER × 1000. No floats in DB. |
| **Causal Lineage** | Every claim traces to source DOI. Every proposal traces to agent_id + task_id. |
| **Skeptic Thread** | Conflict detector examines existing data only — no simulation. |

## Repository Structure

```
phd-research-os-brain/
├── README.md                  # This file
├── train.py                   # SFT training script
├── generate_dataset.py        # Synthetic dataset generator
├── phd_research_os/
│   ├── __init__.py            # v1.0.0
│   ├── db.py                  # Core data layer (Phase 0)
│   ├── agents.py              # AI brain — 6 agent roles
│   ├── agent_os.py            # ECC Harness — companion AI factory
│   ├── pipeline.py            # Paper ingestion (Phase 1+6)
│   ├── obsidian_export.py     # Obsidian vault export (Phase 4)
│   ├── evaluation.py          # Golden dataset eval (Phase 2)
│   ├── conflict_detector.py   # Contradiction detection (Phase 5)
│   ├── backup.py              # Backup & recovery (Phase 6)
│   ├── ARCHITECTURE.md        # Project map (Wake-Up doc)
│   ├── AGENTS.md              # Agent registry & contracts
│   ├── MEMORY.md              # Persistent state
│   ├── plan.md                # Current task plan
│   └── HARNESS_EVOLUTION.md   # Rule amendments
├── tests/
│   ├── test_db.py             # 22 unit tests (data layer)
│   └── test_agent_os.py       # 21 integration tests (ECC harness)
```

## Test Results

```
tests/test_db.py       — 22 passed ✅ (data layer, fixed-point math, CRUD, search)
tests/test_agent_os.py — 21 passed ✅ (spawn, lifecycle, proposals, audit, memory, evolution)
─────────────────────────
Total: 43 tests passing
```

## 6 Core Tasks (Detail)

```python
result = brain.extract_claims("Paper text here...")
# → {"claims": [{"text": "...", "epistemic_tag": "Fact", "confidence": 0.87, ...}]}
```

### Task 3: Confidence Scoring
```python
result = brain.score_confidence("Claim text", "ACS Nano", "primary_experimental", 1)
# → {"confidence": 0.855, ...formula_breakdown...}
```

### Task 4: Conflict Detection
```python
result = brain.detect_conflicts("Claim A", "Claim B")
# → {"conflict_detected": true, "hypothesis_confidence": "low", ...}  # ALWAYS low
```

### Task 5: Query Decomposition
```python
result = brain.decompose_query("Broad research question?")
# → {"sub_queries": ["specific Q1", "specific Q2", ...]}
```
## Training

```bash
pip install torch transformers trl peft datasets bitsandbytes accelerate trackio
python train.py  # Needs GPU: T4 minimum, A10G recommended
```

**Recipe:** Qwen2.5-3B-Instruct + QLoRA (r=64, all-linear) + assistant-only loss, 3 epochs, lr=2e-4

## Citation

```bibtex
@software{phd_research_os_2026,
  title={PhD Research OS Brain: Multi-Task AI for Scientific Research Management},
  author={nkshirsa},
  year={2026},
  url={https://huggingface.co/nkshirsa/phd-research-os-brain}
}
```
## License

---
tags:
- phd-tools
- multi-agent
- ecc-harness
- knowledge-graph
- calibrated-scoring
language:
- en
base_model: Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
---

# PhD Research OS v2.0 — The Epistemic Engine 🧠

A complete, local-first AI system for PhD-level STEM research. It extracts epistemic-tagged claims from scientific papers, builds a knowledge graph with typed edges, detects contradictions, identifies research gaps, and scores confidence using code-computed formulas — not LLM guesses.

**53 files | 545KB | 143 tests passing | 87 blindspots audited and addressed**

## Resources

| Resource | URL | Description |
|----------|-----|-------------|
| **Model + Full Code** | [nkshirsa/phd-research-os-brain](https://huggingface.co/nkshirsa/phd-research-os-brain) | This repo — all code, design docs, tests |
| **Training Dataset** | [nkshirsa/phd-research-os-sft-data](https://huggingface.co/datasets/nkshirsa/phd-research-os-sft-data) | 1,900 multi-task examples across 6 tasks |
| **Taxonomy GUI** | [nkshirsa/phd-research-os-taxonomy](https://huggingface.co/spaces/nkshirsa/phd-research-os-taxonomy) | Live Gradio Space with 6 tabs |
| **Training Space** | [nkshirsa/phd-research-os-train](https://huggingface.co/spaces/nkshirsa/phd-research-os-train) | ZeroGPU micro-batch training |
| **Blindspot Audit** | [BLINDSPOT_AUDIT_COMPLETE.md](https://huggingface.co/nkshirsa/phd-research-os-brain/blob/main/BLINDSPOT_AUDIT_COMPLETE.md) | 87 failure modes across 4 epochs |
| **System Design** | [SYSTEM_DESIGN.md](https://huggingface.co/nkshirsa/phd-research-os-brain/blob/main/SYSTEM_DESIGN.md) | Complete 7-layer architecture spec |

## Quick Start

```bash
git clone https://huggingface.co/nkshirsa/phd-research-os-brain
cd phd-research-os-brain
pip install gradio pymupdf
python -m phd_research_os_v2.app
# Open http://localhost:7860
```

Works immediately with heuristic extraction. Add an API key for AI-powered extraction:

```bash
export ANTHROPIC_API_KEY=sk-...  # or OPENAI_API_KEY
```

## Architecture

```
PDF Bundle → Layer 0 (Structural Parse) → Layer 1 (Entity Resolution)
          → Layer 2 (Qualified Extraction via AI Council)
          → Layer 3 (Claim Canonicalization)
          → Layer 4 (Knowledge Graph + Gap Analysis)
          → Layer 5 (Code-Computed Calibrated Scoring)
          → Layer 6 (Evaluation Harness)
          → Layer 7 (Provenance & Reproducibility)
          → Outputs: Obsidian Vault | Courtroom UI | Decision Objects
```

| Layer | Module | Purpose |
|-------|--------|---------|
| **0** | `layer0/parser.py` | PDF → section-aware regions with bbox, quality scores, cross-refs |
| **2** | `layer2/extractor.py` | AI Council extracts claims; Epistemic Separation Engine penalizes Abstract spin |
| **4** | `layer4/graph.py` | SQLite knowledge graph; typed edges; Gap Analysis finds structural holes |
| **5** | `layer5/scorer.py` | Code-computed 3-score system; parser confidence caps claims |

## The 3-Score System

The LLM **never** sets final confidence. It provides components; the code computes:

| Score | What It Measures |
|-------|-----------------|
| **Evidence Quality** | evidence × study_quality × journal_tier × completeness × section_modifier |
| **Truth Likelihood** | evidence_quality + corroboration - conflict_penalty - null_penalty |
| **Qualifier Strength** | 1.0 - qualifier_count×0.1 - null_penalty - inherited_penalty |

Key gates: parser confidence caps claim confidence; large N with a tiny effect size is capped; Abstract claims carry a 0.7× penalty.

## Epistemic Separation Engine

| Source Section | Confidence Modifier |
|---------------|-------------------|
| Results (with stats) | 1.0× |
| Abstract | 0.7× (forced to Interpretation) |
| Discussion | 0.75× |
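The section-modifier table can be applied with a simple lookup; the dict keys and the conservative fallback are assumptions for illustration:

```python
# Hypothetical lookup applying the section-modifier table above.
SECTION_MODIFIER = {
    "results_with_stats": 1.0,
    "abstract": 0.7,
    "discussion": 0.75,
}

def apply_section_modifier(confidence: float, section: str) -> float:
    # Unknown sections fall back to the most conservative modifier shown.
    return confidence * SECTION_MODIFIER.get(section, 0.7)

print(round(apply_section_modifier(0.9, "abstract"), 2))  # 0.63
```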

## AI Model Council

| Member | Role |
|--------|------|
| **Query Planner** | Breaks questions into search queries |
| **Extractor** | Extracts atomic claims with epistemic tags + qualifiers |
| **Critic** | Reviews claims against source, flags errors |
| **Chairman** | Synthesizes final claims with 0.7 completeness penalty |

## ECC Harness — Companion AI System

```python
from phd_research_os.agent_os import AgentOS

aos = AgentOS()
agent = aos.spawn_companion("DataQualityAuditor")
task = aos.assign_task(agent, "Audit last 50 claims")
result = aos.run_task(task)
```

5 built-in types: DataQualityAuditor, PromptOptimizer, DomainExpander, CalibrationAnalyst, CitationChaser. All output goes through Proposals requiring human approval.

## Superpowers Skill Tree

7 skills enforcing Design → Plan → Execute → Verify: Brainstorming, Writing Plans, Git Worktrees, TDD, Systematic Debugging, Code Review, Security Review.
## Meta-Improver AI

InternalMonitor (7 quality metrics) + ExternalScanner (arXiv, HF Hub, GitHub) + SelfReflector (learns from acceptance/rejection) + ImprovementEngine (ranked proposals).

## Quantum-Bio Taxonomy V2

8-tier study types: in_vivo (1.0) → direct_physical_measurement (1.0) → mathematical_proof (0.95) → in_vitro (0.85) → first_principles_simulation (0.80) → phenomenological_simulation (0.60) → review (0.40) → perspective (0.20). 5 pre-built domains + custom.
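The tier weights above translate directly into a lookup table; the dict and helper names are assumed, not taken from the repo:

```python
# The 8-tier study-type weights above as a lookup table (names assumed).
STUDY_TYPE_WEIGHT = {
    "in_vivo": 1.0,
    "direct_physical_measurement": 1.0,
    "mathematical_proof": 0.95,
    "in_vitro": 0.85,
    "first_principles_simulation": 0.80,
    "phenomenological_simulation": 0.60,
    "review": 0.40,
    "perspective": 0.20,
}

def study_quality(study_type: str) -> float:
    # Unrecognized types fall back to the lowest weight rather than failing open.
    return STUDY_TYPE_WEIGHT.get(study_type, 0.20)

assert study_quality("in_vitro") == 0.85
assert study_quality("blog_post") == 0.20
```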

## Blindspot Audit (87 findings)

| Epoch | Focus | Count |
|-------|-------|-------|
| I: Architectural | Model & Inference | 10 |
| II: Epistemic | Logic & Truth | 27 |
| III: Judgment | Conflict & UI | 19 |
| IV: Systemic | Time & Impact | 25 |

81 addressed. 6 acknowledged as fundamental limitations. Full audit: [BLINDSPOT_AUDIT_COMPLETE.md](https://huggingface.co/nkshirsa/phd-research-os-brain/blob/main/BLINDSPOT_AUDIT_COMPLETE.md)

## Tests

```
test_v2_integration.py  — 24 ✅ (full pipeline)
test_db.py              — 22 ✅ (data layer)
test_agent_os.py        — 21 ✅ (ECC harness)
test_taxonomy.py        — 27 ✅ (taxonomy)
test_skills_and_meta.py — 30 ✅ (skills + meta)
test_council.py         — 19 ✅ (AI council)
Total: 143 passing
```
## Training

**ZeroGPU**: [nkshirsa/phd-research-os-train](https://huggingface.co/spaces/nkshirsa/phd-research-os-train) — set hardware to ZeroGPU, click Train repeatedly.

**Local** (needs GPU): `python train.py`

**Planned**: SFT → DPO → GRPO (epistemic rewards) → ConfTuner calibration.
## License