## Grounded Entirely in the Emergence Transformer Paper + Awesome Open Source AI List

**Written so a high school student can understand every word.**

**Sources**:

1. 📄 [Emergence Transformer: Dynamical Temporal Attention Matters](https://arxiv.org/abs/2604.19816) — A paper that redesigns the Transformer's attention mechanism so components interact with their own past states through time-varying queries, keys, and values. It shows that neighbor-attention promotes coherence while self-attention has an optimal sweet spot, and applies this to social opinion models and Hopfield networks for continual learning without forgetting.
2. 📦 [Awesome Open Source AI](https://github.com/alvinreal/awesome-opensource-ai) — A curated list of 500+ battle-tested, production-proven open-source AI tools across 14 categories (April 2026).

---
## What These Sources Tell Us

### From the Emergence Transformer Paper

The paper introduces **Dynamical Temporal Attention (DTA)** — a version of the Transformer where the query, key, and value matrices change over time. The key insights that apply to our Research OS:

1. **Neighbor-DTA vs Self-DTA**: When components pay attention to their neighbors' history, coherence (agreement) always increases. When they pay attention to their OWN history, there's an optimal attention weight — too much self-attention actually hurts. This directly maps to our AI Council: council members should attend to EACH OTHER'S reasoning (neighbor-DTA), not just their own previous outputs (self-DTA).

2. **Emergent Continual Learning**: The paper shows DTA applied to Hopfield neural networks achieves continual learning WITHOUT catastrophic forgetting. This is exactly what our Research OS needs — the model should learn from new papers without forgetting what it learned from old ones.

3. **Social Coherence Modulation**: DTA can either enhance agreement or preserve plurality in social opinion models. For our system, this means the AI Council should be designable — we can tune it to either push toward consensus (for clear-cut cases) or deliberately preserve disagreement (for genuinely ambiguous cases).

4. **Time-Varying Attention Kernels**: Standard Transformers have fixed attention patterns; DTA makes attention evolve over time. For our system, this means that as the model processes more of a paper, its attention to earlier sections should change: reading the Discussion should update how the model interprets the Abstract.

### From the Awesome Open Source AI List

The list catalogs the production-ready tools that exist today. Here are the specific tools relevant to each part of our system, organized by what they replace or enable:

---
## Requirements by System Layer

### Layer 0: PDF Parsing — Replace Basic Scrapers with ML Parsers

**Current state**: PyMuPDF/pdfplumber (basic text extraction)

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Marker](https://github.com/datalab-to/marker)** | Fast, accurate PDF-to-markdown with table extraction and equation handling | Replaces pdfplumber — preserves document structure, tables, equations |
| **[MinerU](https://github.com/opendatalab/MinerU)** | High-accuracy PDF parsing with a dual VLM+OCR engine | Handles scanned papers and complex layouts that Marker misses |
| **[Docling](https://github.com/docling-project/docling)** | Document processing toolkit for GenAI workflows | Backup parser for non-standard formats (Word, PPT, Excel supplements) |
| **[Unstructured](https://github.com/Unstructured-IO/unstructured)** | Best-in-class document preprocessing | Universal fallback for any document type |
| **[MarkItDown](https://github.com/microsoft/markitdown)** | Microsoft's file-to-Markdown converter | Handles supplementary files (Excel data, PowerPoint presentations) |
| **[OmniParse](https://github.com/adithya-s-k/omniparse)** | Parses documents, tables, images, videos, audio, and web pages | Multi-modal supplement handling (video supplements, audio recordings) |

**Requirement P-REQ-1**: Integrate Marker as the primary parser. Fall back to MinerU for scanned/OCR documents. Use Docling/MarkItDown for non-PDF supplements.

**Requirement P-REQ-2**: Use Chonkie ([chonkie-inc/chonkie](https://github.com/chonkie-inc/chonkie)) for intelligent document chunking. It supports semantic, token, and recursive chunking strategies — replacing the current simple section-merge chunking in `parser.py`.

---
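The P-REQ-1 fallback chain can be sketched as a small driver. The parser functions below are hypothetical stand-ins for real Marker/MinerU wrappers — only the chain logic is the point:

```python
def parse_with_fallback(parsers, path):
    """Try each (name, parse_fn) in order; return the first successful result."""
    errors = []
    for name, parse_fn in parsers:
        try:
            return name, parse_fn(path)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append((name, exc))
    raise RuntimeError(f"all parsers failed for {path}: {errors}")

# Hypothetical usage: Marker first, MinerU for scans that have no text layer.
def marker_parse(path):
    raise ValueError("scanned PDF, no text layer")  # simulate a Marker failure

def mineru_parse(path):
    return "# Parsed markdown via OCR"  # simulate a MinerU success

name, markdown = parse_with_fallback(
    [("marker", marker_parse), ("mineru", mineru_parse)], "paper.pdf"
)
```

In the real pipeline each `parse_fn` would wrap the corresponding library and the Docling/MarkItDown branches would key off the file extension.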
### Layer 1: Entity Resolution — Add Embedding-Based Matching

**Current state**: No embedding model, no entity normalization

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[BGE / FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)** | Best-in-class text embeddings | Convert claims to vectors for semantic matching instead of word overlap |
| **[FastEmbed](https://github.com/qdrant/fastembed)** | Lightweight embedding with ONNX Runtime, no GPU needed | Local-first embedding that runs on CPU for privacy |
| **[sqlite-vec](https://github.com/asg017/sqlite-vec)** | Vector search as a SQLite extension | Adds vector similarity to our existing SQLite database — zero new infrastructure |
| **[MTEB](https://github.com/embeddings-benchmark/mteb)** | Embedding benchmark | Choose the best embedding model for scientific text by testing on MTEB |

**Requirement M-REQ-1**: Replace Jaccard word overlap in `canonicalizer.py` with embedding-based cosine similarity using FastEmbed + sqlite-vec. This keeps the system local-first (SQLite + CPU embeddings) while enabling semantic deduplication.

**Requirement M-REQ-2**: Use MTEB to benchmark which embedding model performs best on scientific claim similarity before committing to one.

---
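To see why M-REQ-1 matters, compare Jaccard overlap with cosine similarity on a paraphrased claim pair. The three-dimensional "embeddings" are made-up stand-ins for FastEmbed output:

```python
import math

def jaccard(a: str, b: str) -> float:
    """The word-overlap similarity that M-REQ-1 replaces."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cosine(u, v) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

# Toy vectors: paraphrases point the same way even with zero shared words,
# which Jaccard misses entirely.
emb = {
    "LOD was 1 nM": [0.9, 0.1, 0.2],
    "detection limit reached one nanomolar": [0.85, 0.15, 0.25],
}
print(jaccard("LOD was 1 nM", "detection limit reached one nanomolar"))  # 0.0
print(round(cosine(*emb.values()), 3))  # 0.996
```

With sqlite-vec, the `cosine` call becomes a `vec0` virtual-table query, so candidate duplicates come back from SQLite itself.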
### Layer 2: Extraction — Better Models and Constrained Output

**Current state**: Qwen2.5-3B with mock fallback, no output guarantees

**Required models from the awesome list**:

| Model | Why It's Better |
|-------|----------------|
| **[Qwen3.6-Plus](https://github.com/QwenLM/Qwen)** | April 2026 flagship, 1M context window, competitive with Claude 4.5 Opus |
| **[Kimi K2.5](https://github.com/MoonshotAI/Kimi-K2.5)** | 256K context, strong reasoning, native tool use for agentic workflows |
| **[Phi-4](https://github.com/microsoft/PhiCookBook)** | Small but highly capable for reasoning and edge/on-device inference |
| **[OLMo 2](https://github.com/allenai/OLMo)** | Fully open source (data + code + logs) — by scientists, for scientists |
| **[GLM-5](https://github.com/zai-org/GLM-5)** | Strong coding, reasoning, and agentic-task performance |

**Requirement B-REQ-1**: Upgrade the primary brain to Qwen3.6-Plus (or its quantized variant) for maximum reasoning quality. Use Phi-4 as the local/edge fallback for 16GB-VRAM deployment.

**Requirement B-REQ-2**: Use the **Instructor** library ([jxnl/instructor](https://github.com/jxnl/instructor)) for structured output extraction with Pydantic validation. This replaces the need for the Guidance library — Instructor handles validation, retries, and error handling for extracting claims as structured JSON from any LLM.

**From the Emergence Transformer paper — Requirement B-REQ-3**: Implement **Dynamical Temporal Attention** in the council architecture.

The paper shows that **neighbor-DTA consistently promotes coherence** while **self-DTA has an optimal weight**. Applied to the AI Council:

- **Neighbor-DTA for council members**: Each council member (Extractor, Critic, Chairman) should attend to OTHER members' reasoning history, not just their own. This promotes convergence on genuinely shared insights.
- **Self-DTA with tunable weight (α)**: Each member also attends to their OWN past outputs, but with a tunable weight. The paper proves there's an optimal α — too much self-attention causes overconfidence; too little means no memory.
- **Practical implementation**: Store each council member's outputs across multiple papers. When processing paper N, the Extractor can attend to its OWN extractions from papers 1…N−1 (self-DTA) and to the Critic's feedback from papers 1…N−1 (neighbor-DTA). The attention weights are tunable per task.

This turns the council from a stateless sequential pipeline into a **stateful, attention-based ensemble** whose members learn from each other's history.

---
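A minimal sketch of the B-REQ-3 blending, assuming (our simplification, not the paper's exact kernel) that self- and neighbor-history are mixed linearly with weight α:

```python
def blend_context(self_history, neighbor_history, alpha):
    """Mix a member's own past outputs (self-DTA) with other members' histories
    (neighbor-DTA); alpha is the tunable self-attention weight from B-REQ-3."""
    def mean(vectors):
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]
    own, others = mean(self_history), mean(neighbor_history)
    return [alpha * s + (1 - alpha) * o for s, o in zip(own, others)]

# Extractor at paper N: its own extraction summaries from papers 1..N-1 (self)
# and the Critic's feedback from papers 1..N-1 (neighbor), as toy 2-d vectors.
ctx = blend_context(
    self_history=[[1.0, 0.0], [0.8, 0.2]],
    neighbor_history=[[0.0, 1.0]],
    alpha=0.3,  # the paper argues the optimum is interior, not 0 or 1
)
```

A real implementation would replace the plain means with softmax attention over the stored histories, with time-varying weights as in the paper.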
### Layer 3: Deduplication — Semantic Matching at Scale

**Current state**: Jaccard word overlap

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[sqlite-vec](https://github.com/asg017/sqlite-vec)** | Vector search in SQLite | Find duplicate claims by meaning, not word overlap, inside our existing DB |
| **[Chroma](https://github.com/chroma-core/chroma)** | Embedding database | If claim count exceeds SQLite performance, scale to a dedicated vector DB |
| **[rerankers](https://github.com/AnswerDotAI/rerankers)** | Unified reranking API | After finding candidate duplicates by embedding, use cross-encoder reranking for precision |

**Requirement M-REQ-3**: Implement two-stage deduplication: (1) fast approximate matching via sqlite-vec embeddings (recall-optimized), then (2) precise reranking of the candidate pairs via a cross-encoder (precision-optimized).

---
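M-REQ-3's two stages in miniature; `rerank` is a hypothetical stand-in for a cross-encoder from the `rerankers` library:

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

def two_stage_dedup(query_vec, candidates, rerank, recall_cut=0.7, precision_cut=0.9):
    """Stage 1 keeps anything plausibly duplicate (recall-optimized);
    stage 2 reranks the survivors with an expensive scorer (precision-optimized)."""
    survivors = [text for text, vec in candidates if cosine(query_vec, vec) >= recall_cut]
    return [text for text in survivors if rerank(text) >= precision_cut]

candidates = [
    ("LOD of 1 nM reported", [0.99, 0.10]),
    ("sensor fabricated via CVD", [0.05, 0.99]),
]
# The lambda fakes a cross-encoder score for the sketch.
dups = two_stage_dedup([1.0, 0.0], candidates, rerank=lambda t: 0.95 if "LOD" in t else 0.1)
```

The split matters because cross-encoders are accurate but too slow to run against every stored claim; stage 1 shrinks the candidate set first.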
### Layer 4: Knowledge Graph — Add Temporal Reasoning and Graph RAG

**Current state**: SQLite adjacency list, word-overlap conflict detection

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Graphiti](https://github.com/getzep/graphiti)** | Real-time temporal knowledge graphs with provenance tracking | Tracks how facts change over time — exactly what we need for claim versioning |
| **[GraphRAG](https://github.com/microsoft/graphrag)** | Knowledge-graph-based retrieval | Enables multi-hop reasoning over the claim graph |
| **[LightRAG](https://github.com/HKUDS/LightRAG)** | Graph-based RAG with dual-level retrieval | Simpler alternative to GraphRAG at our scale |
| **[KAG (OpenSPG)](https://github.com/OpenSPG/KAG)** | Knowledge Augmented Generation for logical reasoning | Schema-constrained knowledge construction for professional domains |

**From the Emergence Transformer paper — Requirement G-REQ-1**: Apply DTA to the knowledge graph for **emergent conflict detection**.

The paper models N coupled oscillators whose coherence emerges from attention-mediated interactions. Claims in the knowledge graph are analogous to oscillators:

- Each claim has a "phase" (its epistemic state: Fact/Interpretation/Hypothesis, plus confidence)
- Claims interact through graph edges (supports/refutes/extends)
- **Neighbor-DTA on the graph**: When scoring a claim, attend to the HISTORY of its graph neighbors. A claim that was tagged "Interpretation" but whose supporting claims have all been upgraded to "Fact" over time should be reconsidered.
- **Conflict detection as coherence breakdown**: The paper's order parameter (r_t) measures global coherence. In our graph, sudden drops in local coherence — a cluster of claims that were previously consistent suddenly becoming contradictory because of a new paper — are analogous to desynchronization events and should trigger alerts.

**Requirement G-REQ-2**: Use Graphiti for temporal provenance. Every claim stores when it was first extracted, when it was last confirmed by a new source, and when it was contradicted. The graph should answer queries like "What changed about LOD claims for GFET sensors between 2022 and 2025?"

---
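The order parameter r_t is the standard Kuramoto one, r = |(1/N) Σ_j e^(iθ_j)|; mapping claim states to phases is our illustrative assumption, not something the paper specifies for knowledge graphs:

```python
import cmath
import math

def order_parameter(phases):
    """Kuramoto order parameter: 1 = fully coherent cluster, near 0 = desynchronized."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

# Illustrative encoding: claims in an agreeing cluster sit near the same angle;
# a new contradicting paper pushes some claims toward the opposite phase.
coherent = [0.10, 0.15, 0.05, 0.12]
conflicted = [0.10, math.pi, 0.20, 0.9 * math.pi]

assert order_parameter(coherent) > 0.95   # cluster is consistent
assert order_parameter(conflicted) < 0.5  # sudden drop → desynchronization alert
```

A nightly job could compute r over each claim cluster and alert whenever the value falls sharply between runs.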
### Layer 5: Scoring — Add Calibration Infrastructure

**Current state**: Fixed-point formula works; calibration only planned

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[DeepEval](https://github.com/confident-ai/deepeval)** | LLM evaluation with hallucination detection, bias detection | Automated checking of extraction quality and confidence calibration |
| **[RAGAs](https://github.com/explodinggradients/ragas)** | RAG evaluation (faithfulness, relevance, context recall) | Evaluate whether extracted claims are faithful to the source text |

**Requirement S-REQ-1**: Use DeepEval's faithfulness metric to automatically check: "Does this extracted claim actually appear in the source text?" This replaces manual gold-standard checking for high-volume papers.

**Requirement S-REQ-2**: Use RAGAs for end-to-end pipeline evaluation — measure whether the system retrieves the right evidence and generates faithful extractions.

---
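DeepEval's faithfulness metric is LLM-judged; a crude lexical stand-in (our heuristic, not DeepEval's method) still shows the shape of the S-REQ-1 check:

```python
def faithfulness(claim: str, source: str) -> float:
    """Fraction of claim words that appear in the source passage.
    A lexical proxy only — the real metric judges semantic support."""
    claim_words = claim.lower().split()
    source_words = set(source.lower().split())
    if not claim_words:
        return 0.0
    return sum(w in source_words for w in claim_words) / len(claim_words)

source = "the sensor achieved a detection limit of 1 nM in buffer"
assert faithfulness("detection limit of 1 nM", source) == 1.0
assert faithfulness("detection limit of 5 nM", source) < 1.0  # "5" is unsupported
```

A score well below 1.0 flags the claim for the slower LLM-judged check rather than rejecting it outright.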
### Layer 6: Evaluation — Build a Real Test Suite

**Current state**: Counts distributions, no ground-truth comparison

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)** | De facto standard for model evaluation | Standardized evaluation across model versions |
| **[DeepEval](https://github.com/confident-ai/deepeval)** | "Pytest for LLMs" | Unit-test each extraction task with pass/fail criteria |
| **[Promptfoo](https://github.com/promptfoo/promptfoo)** | LLM testing and red-teaming | Systematic prompt testing, side-by-side model comparison |
| **[Inspect AI](https://github.com/UKGovernmentBEIS/inspect_ai)** | UK AI Safety Institute's evaluation framework | Multi-turn dialog evaluation with tool use |
| **[Lighteval](https://github.com/huggingface/lighteval)** | Lightweight model evaluation | Quick evaluation during training |

**Requirement T-REQ-1**: Implement DeepEval-based unit tests for each extraction task. Each test has:

- Input: a paper excerpt
- Expected output: the correct claims with the correct tags
- Pass criteria: extraction recall ≥ 70%, epistemic accuracy ≥ 60%, qualifier preservation ≥ 80%

**Requirement T-REQ-2**: Use Promptfoo for prompt regression testing. Every time a system prompt changes, automatically compare outputs before and after.

---
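A T-REQ-1 case written as a plain unit test with the recall threshold from the list above (DeepEval would wrap the same logic in its metric API; the claim strings are illustrative):

```python
def extraction_recall(expected: set, predicted: set) -> float:
    """Fraction of gold-standard claims the model recovered."""
    return len(expected & predicted) / len(expected) if expected else 1.0

def test_extraction_case():
    # A toy gold-standard case; real tests load annotated paper excerpts.
    expected = {"lod 1 nM", "linear range 1-100 nM", "selective vs dopamine", "stable 30 days"}
    predicted = {"lod 1 nM", "linear range 1-100 nM", "selective vs dopamine", "response 5 s"}
    recall = extraction_recall(expected, predicted)
    assert recall >= 0.70, f"recall {recall:.2f} below the T-REQ-1 threshold"

test_extraction_case()
```

Epistemic accuracy and qualifier preservation get the same treatment: one metric function, one threshold, one test per excerpt.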
### Training Pipeline — Use Production Frameworks

**Current state**: Custom `train.py` with TRL SFTTrainer, ZeroGPU micro-batching

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[TRL](https://github.com/huggingface/trl)** | Official SFT, DPO, GRPO, PPO | Already used for SFT — extend to the DPO and GRPO stages |
| **[Axolotl](https://github.com/axolotl-ai-cloud/axolotl)** | YAML-driven SFT, DPO, GRPO pipeline | Simpler configuration than custom scripts |
| **[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)** | One-stop SFT, DPO, ORPO with web UI | GUI for non-programmers to run training |
| **[Unsloth](https://github.com/unslothai/unsloth)** | 2× faster, 70% less memory fine-tuning | Makes training feasible on consumer GPUs |
| **[OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)** | Scalable RLHF with PPO, GRPO, REINFORCE++ | For the GRPO stage with custom epistemic reward functions |
| **[verl](https://github.com/volcengine/verl)** | ByteDance's RL for LLMs with PPO, GRPO | Alternative GRPO implementation |
| **[PEFT](https://github.com/huggingface/peft)** | Parameter-efficient fine-tuning (LoRA, etc.) | Already used — continue with LoRA |

**Requirement TR-REQ-1**: Replace ZeroGPU micro-batching with a single continuous training job using Unsloth (for a 2× speedup on a consumer GPU) or Axolotl (for a YAML-driven pipeline).

**Requirement TR-REQ-2**: Implement the 4-stage pipeline:

1. **SFT** via TRL/Unsloth (already partially built)
2. **DPO** via TRL's DPOTrainer on preference pairs
3. **GRPO** via OpenRLHF or verl with the 3 custom reward functions (JSON validity, schema compliance, qualifier preservation)
4. **ConfTuner** via a custom training loop with tokenized Brier-score loss

**From the Emergence Transformer paper — Requirement TR-REQ-3**: Implement **DTA-inspired continual learning** for domain adaptation.

The paper demonstrates that DTA applied to Hopfield networks achieves continual learning without catastrophic forgetting. For our model:

- When fine-tuning on a new scientific domain (e.g., adding ecology to a biosensors-trained model), use the DTA principle: the model should attend to its OWN past activations (self-DTA) to remember old domains while learning new ones.
- Practically, this maps to **O-LoRA** (orthogonal LoRA) — training new LoRA adapters in orthogonal subspaces so they don't interfere with existing adapters. The DTA paper provides the theoretical foundation for WHY this works: self-attention on past states preserves memory while neighbor-attention on new data drives learning.

---
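The three GRPO reward functions from stage 3 can be sketched directly; the tag set and JSON field names are assumptions about the claim schema, not fixed by the sources:

```python
import json

ALLOWED_TAGS = {"Fact", "Interpretation", "Hypothesis"}  # assumed tag set
QUALIFIERS = ("may", "might", "suggests", "appears", "likely")

def reward_json_validity(output: str) -> float:
    """Reward 1: the model emitted parseable JSON at all."""
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def reward_schema(output: str) -> float:
    """Reward 2: the claim matches the schema (tag, confidence range, quote)."""
    try:
        claim = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    conf = claim.get("confidence")
    ok = (claim.get("tag") in ALLOWED_TAGS
          and isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0
          and bool(claim.get("quote")))
    return 1.0 if ok else 0.0

def reward_qualifiers(output: str, source: str) -> float:
    """Reward 3: hedging words from the source survive into the extraction."""
    hedges = [q for q in QUALIFIERS if q in source.lower()]
    if not hedges:
        return 1.0
    return sum(q in output.lower() for q in hedges) / len(hedges)

good = '{"tag": "Hypothesis", "confidence": 0.6, "quote": "may improve LOD"}'
assert reward_json_validity(good) == 1.0
assert reward_schema(good) == 1.0
assert reward_qualifiers(good, "This may improve LOD.") == 1.0
```

OpenRLHF and verl both accept plain callables like these as reward functions for a GRPO run, combined as a weighted sum.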
### Synthetic Data Generation

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Distilabel](https://github.com/argilla-io/distilabel)** | Synthetic data generation and distillation | Generate 10K+ training examples using teacher models |
| **[Argilla](https://github.com/argilla-io/argilla)** | Data annotation and human-in-the-loop workflows | Label real paper extractions with human experts |

**Requirement D-REQ-1**: Use Distilabel to generate the teacher-ensemble outputs. Run 3–5 teacher models (Qwen3.6-Plus, Kimi K2.5, GLM-5) on 100 real papers and store ALL outputs with disagreement signals.

**Requirement D-REQ-2**: Use Argilla for human expert labeling of the gold-standard test set (10 papers, every claim manually annotated).

---
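One way to compute D-REQ-1's disagreement signal (our choice of metric; Distilabel orchestrates the generation but does not mandate a score): mean pairwise Jaccard distance between the teachers' claim sets:

```python
from itertools import combinations

def disagreement(claim_sets):
    """Mean pairwise Jaccard distance between teachers' claim sets:
    0 = teachers fully agree, 1 = no shared claims at all."""
    dists = []
    for a, b in combinations(claim_sets, 2):
        union = a | b
        dists.append(1 - len(a & b) / len(union) if union else 0.0)
    return sum(dists) / len(dists)

teachers = [
    {"lod 1 nM", "linear to 100 nM"},  # e.g. Qwen3.6-Plus output
    {"lod 1 nM", "linear to 100 nM"},  # e.g. Kimi K2.5 output
    {"lod 1 nM", "selective"},         # e.g. GLM-5 output
]
score = disagreement(teachers)  # a high score routes the paper to human review
```

Storing this score alongside every generated example lets the SFT stage upweight easy (low-disagreement) papers and send hard ones to Argilla.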
### Data Quality and Labeling

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Cleanlab](https://github.com/cleanlab/cleanlab)** | Find and fix label errors in datasets | Detect mislabeled training examples automatically |
| **[Great Expectations](https://github.com/great-expectations/great_expectations)** | Data validation for pipelines | Validate that every training example has required fields, valid JSON, and the correct tag |
| **[Label Studio](https://github.com/HumanSignal/label-studio)** | Multi-type data labeling | Interface for human annotators to label paper excerpts |

**Requirement D-REQ-3**: Run Cleanlab on the existing 1,900 training examples to detect mislabeled examples (wrong epistemic tags, missing qualifiers).

**Requirement D-REQ-4**: Use Great Expectations to validate every training example before it enters the training pipeline: valid JSON, tag in the allowed set, confidence in [0, 1], non-empty source quote.

---
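D-REQ-4's checks as a plain validator. Great Expectations would express the same rules as an expectation suite; the field names here are assumptions about the example schema:

```python
import json

ALLOWED_TAGS = {"Fact", "Interpretation", "Hypothesis"}  # assumed tag set

def validate_example(raw: str) -> list[str]:
    """Return the list of violations; an empty list means the example may enter training."""
    try:
        ex = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    problems = []
    if ex.get("tag") not in ALLOWED_TAGS:
        problems.append(f"tag {ex.get('tag')!r} not in allowed set")
    conf = ex.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence outside [0, 1]")
    if not str(ex.get("quote", "")).strip():
        problems.append("empty source quote")
    return problems

ok = '{"tag": "Fact", "confidence": 0.9, "quote": "LOD of 1 nM was achieved"}'
bad = '{"tag": "Guess", "confidence": 1.7, "quote": ""}'
assert validate_example(ok) == []
assert len(validate_example(bad)) == 3
```

Running this gate before every training job means a single malformed example fails loudly instead of silently degrading the fine-tune.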
### Inference Serving — Replace Mock with Real AI

**Current state**: No model serving; everything runs through optional API calls

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Ollama](https://github.com/ollama/ollama)** | Simplest local LLM serving | One-command model serving on consumer hardware |
| **[vLLM](https://github.com/vllm-project/vllm)** | High-throughput LLM serving | Fast batch processing of many paper sections |
| **[llama.cpp](https://github.com/ggml-org/llama.cpp)** | CPU/GPU inference for quantized models | Run on laptops without a dedicated GPU |
| **[SGLang](https://github.com/sgl-project/sglang)** | Fast structured generation | Guaranteed valid JSON output via grammar-constrained decoding |

**Requirement I-REQ-1**: Integrate Ollama as the default local model server. One command to start: `ollama pull qwen3.6-plus:q4` → model available at `http://localhost:11434`.

**Requirement I-REQ-2**: Use SGLang for constrained decoding — it guarantees valid JSON output with valid enum values. This eliminates broken JSON, invalid tags, and mixed text/JSON output.

---
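Talking to the I-REQ-1 server goes through Ollama's HTTP API. This sketch only builds the request (the model tag is the one pulled above; sending it requires the server to be running):

```python
import json
import urllib.request

def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,   # one JSON response instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("qwen3.6-plus:q4", "Extract claims from: ...")
# urllib.request.urlopen(req) would return a JSON body with a "response"
# field once the server from I-REQ-1 is running.
```

Ollama's `"format": "json"` option covers only JSON well-formedness; enum-valid tags still need the SGLang grammar constraints from I-REQ-2 or Instructor-side validation.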
### AI Safety and Security

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Guardrails AI](https://github.com/guardrails-ai/guardrails)** | Input/output validation for LLMs | Validate that extraction outputs match the expected schema |
| **[LLM Guard](https://github.com/protectai/llm-guard)** | Security toolkit for LLM interactions | Detect prompt injection if the system is exposed as an API |
| **[Garak](https://github.com/NVIDIA/garak)** | LLM vulnerability scanner | Test the model for hallucination patterns specific to scientific claims |
| **[DeepTeam](https://github.com/confident-ai/deepteam)** | Red-teaming framework | Adversarial testing of extraction robustness |

**Requirement SEC-REQ-1**: Use Guardrails AI to validate every LLM output before it enters the database: schema validation, tag validation, confidence-range checking.

**Requirement SEC-REQ-2**: Use Garak to scan the fine-tuned model for scientific hallucination patterns. Test: does the model invent statistics? Does it fabricate citations? Does it claim certainty where the paper was uncertain?

---
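One SEC-REQ-2 probe ("does the model invent statistics?") as a simple heuristic of our own, not Garak's actual probe suite: every number in a claim should also appear in the source:

```python
import re

def invented_numbers(claim: str, source: str) -> list[str]:
    """Numbers present in the claim but absent from the source text."""
    number = re.compile(r"\d+(?:\.\d+)?")
    source_numbers = set(number.findall(source))
    return [n for n in number.findall(claim) if n not in source_numbers]

source = "We observed a limit of detection of 1.2 nM (n = 30)."
assert invented_numbers("LOD of 1.2 nM across 30 trials", source) == []
assert invented_numbers("LOD of 0.5 nM with p < 0.05", source) == ["0.5", "0.05"]
```

Unit conversions (1.2 nM vs 1200 pM) would need normalization before comparison; this catches only the blunt case of a fabricated figure.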
### Interpretability

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[TransformerLens](https://github.com/TransformerLensOrg/TransformerLens)** | Mechanistic interpretability | Understand WHICH attention heads are responsible for qualifier detection vs. claim extraction |
| **[Captum](https://github.com/pytorch/captum)** | PyTorch interpretability | Attribution analysis — which input tokens influenced each output |

**From the Emergence Transformer paper — Requirement INT-REQ-1**: Use the paper's DTA attention-kernel analysis to interpret the fine-tuned model.

The paper derives explicit formulas for how attention weights evolve over time (Equations 9–10). After fine-tuning, we can:

- Visualize which input tokens (qualifier words like "may" and "suggests") carry the highest attention weight when the model outputs epistemic tags
- Track whether attention to the Abstract section decreases once the model has processed the Results section (temporal attention shift)
- Identify attention heads that specialize in specific tasks (one head for qualifier detection, another for statistical parsing) — this tests whether the model has learned task-specific representations, answering the "specialist heads" question empirically

---
### MLOps and Monitoring

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[MLflow](https://github.com/mlflow/mlflow)** | Experiment tracking, model registry | Track every training run, compare model versions |
| **[Weights & Biases (wandb)](https://github.com/wandb/wandb)** | Experiment tracking with visualization | Dashboard for training metrics across all 4 stages |
| **[DVC](https://github.com/iterative/dvc)** | Data and model versioning | Version the training dataset and the gold standard |
| **[Evidently](https://github.com/evidentlyai/evidently)** | ML monitoring and observability | Detect model drift in production |
| **[Phoenix](https://github.com/Arize-ai/phoenix)** | AI observability | Monitor extraction quality in real time |

**Requirement OPS-REQ-1**: Use MLflow or W&B to track all training experiments. Every training run logs loss curves, evaluation metrics, model checkpoints, hyperparameters, and the dataset version.

**Requirement OPS-REQ-2**: Use Evidently for drift detection. Weekly check: run the model on the gold-standard test set; if any metric drops by more than 5%, alert.

---
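OPS-REQ-2's weekly check as a stand-in for an Evidently report (the metric names are illustrative; the 5% threshold is the one stated above):

```python
def drift_alerts(baseline: dict, current: dict, tolerance: float = 0.05) -> list[str]:
    """Names of gold-standard metrics that dropped more than `tolerance`
    relative to their baseline values."""
    return [
        name
        for name, base in baseline.items()
        if (base - current.get(name, 0.0)) / base > tolerance
    ]

baseline = {"recall": 0.80, "epistemic_accuracy": 0.70, "qualifier_preservation": 0.90}
current = {"recall": 0.79, "epistemic_accuracy": 0.62, "qualifier_preservation": 0.91}
assert drift_alerts(baseline, current) == ["epistemic_accuracy"]
```

Using a relative drop (rather than absolute) keeps the alert meaningful for metrics that sit at very different baseline levels.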
### Agent Framework — Connect Real Brains to Agent Bodies

**Current state**: Full agent lifecycle works, but agents have no AI model connected

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[smolagents](https://github.com/huggingface/smolagents)** | Lightweight agent framework from Hugging Face | Simpler than the current custom AgentOS for basic tasks |
| **[LangGraph](https://github.com/langchain-ai/langgraph)** | Stateful, multi-actor agent orchestration | For the multi-agent council with memory |
| **[CrewAI](https://github.com/crewAIInc/crewAI)** | Multi-agent collaboration framework | Define roles (Extractor, Critic, Chairman) with collaboration protocols |
| **[Letta (MemGPT)](https://github.com/letta-ai/letta)** | Stateful agents with persistent memory | Agents that remember across sessions |
| **[Mem0](https://github.com/mem0ai/mem0)** | Universal memory layer for agents | Persistent memory for the MetaImprover and CitationChaser agents |

**From the Emergence Transformer paper — Requirement A-REQ-1**: Implement the AI Council as a **DTA-coupled multi-agent system**.

The paper's model has N oscillators coupled through an adjacency matrix A_ij; the AI Council has N=4 members coupled through information sharing. The DTA framework tells us:

- **Coupling topology matters**: The paper shows that different network structures (fully connected, small-world, scale-free) produce different coherence patterns. For 4 council members, a fully connected topology (everyone sees everyone) promotes maximum consensus, while a star topology (the Chairman sees all, the others see only the Chairman) preserves more diversity.
- **The α parameter tunes consensus vs diversity**: At α=0 there is no temporal attention — members are stateless, giving pure diversity. At α=1, full temporal attention makes members converge, giving pure consensus. The paper proves there is an optimal α between 0 and 1 for maximum USEFUL coherence. Tune this per task: high α for clear-cut Fact/Interpretation decisions, low α for ambiguous Conflict_Hypothesis cases.
- **The β parameter controls memory decay**: In the paper, β determines how fast old attention information decays. For the council, β controls how much members remember from previous papers: high β means short memory (each paper is fresh); low β means long memory (patterns from 100 papers ago still influence decisions).

---
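One plausible reading of β is exponential decay over paper age (the functional form e^(−β·age) is our assumption, not the paper's exact kernel):

```python
import math

def memory_weights(num_papers: int, beta: float) -> list[float]:
    """Normalized attention over past papers, newest last. High beta ≈ short
    memory; low beta lets old papers keep influencing current decisions."""
    raw = [math.exp(-beta * age) for age in reversed(range(num_papers))]
    total = sum(raw)
    return [w / total for w in raw]

short = memory_weights(5, beta=2.0)  # mass concentrates on the latest paper
long_ = memory_weights(5, beta=0.1)  # spread across all five papers
assert short[-1] > 0.8 > long_[-1]
```

Stored per council member, these weights decide how strongly each remembered output from papers 1…N−1 enters the member's context for paper N.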
### UI and User Experience

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Gradio](https://github.com/gradio-app/gradio)** | Already used | Continue using for the main UI |
| **[Kotaemon](https://github.com/Cinnamon/kotaemon)** | RAG-based document chat with a Gradio UI | Reference implementation for a document Q&A interface |
| **[Open Notebook](https://github.com/lfnovo/open-notebook)** | AI-powered notebook with multi-modal support | Model for how to build the Obsidian-like research interface |

**Requirement UI-REQ-1**: Study Kotaemon's architecture for the "chat with your papers" interface. It has hybrid RAG, reranking, and multi-modal support — exactly what the Research OS courtroom UI needs.

---
|
| 359 |
-
|
| 360 |
-
## Summary: The Complete Requirements Stack
|
| 361 |
-
|
| 362 |
-
| Layer | Current Tool | Required Tool(s) | Source |
|-------|-------------|------------------|--------|
| PDF Parsing | pdfplumber/PyMuPDF | **Marker** + MinerU + Docling | Awesome List §5 |
| Chunking | Custom section merger | **Chonkie** | Awesome List §5 |
| Embeddings | None | **FastEmbed** + sqlite-vec | Awesome List §5 |
| Deduplication | Jaccard overlap | FastEmbed + **rerankers** | Awesome List §5 |
| Base Model | Qwen2.5-3B | **Qwen3.6-Plus** / Phi-4 | Awesome List §2 |
| Structured Output | Hope-for-JSON | **Instructor** / SGLang | Awesome List §13 |
| Model Serving | None (mock) | **Ollama** / vLLM | Awesome List §3 |
| Council Architecture | Sequential pipeline | **DTA-coupled agents** (LangGraph) | Emergence Paper |
| Knowledge Graph | SQLite adjacency | SQLite + **Graphiti** temporal layer | Awesome List §5 |
| Graph Reasoning | Word-overlap conflicts | **LightRAG** / GraphRAG | Awesome List §5 |
| Continual Learning | Retrain from scratch | **O-LoRA** (DTA-inspired) | Emergence Paper |
| Training Framework | Custom train.py | **Unsloth** / Axolotl / TRL | Awesome List §7 |
| GRPO Training | Not built | **OpenRLHF** / verl | Awesome List §7 |
| Synthetic Data | Template generator | **Distilabel** | Awesome List §7 |
| Human Labeling | Not built | **Argilla** / Label Studio | Awesome List §7 |
| Data Validation | Not built | **Cleanlab** + Great Expectations | Awesome List §9 |
| LLM Evaluation | Count-based metrics | **DeepEval** + Promptfoo | Awesome List §9 |
| RAG Evaluation | Not built | **RAGAs** | Awesome List §9 |
| Safety Scanning | Not built | **Garak** + LLM Guard | Awesome List §10 |
| Output Validation | Not built | **Guardrails AI** | Awesome List §10 |
| Interpretability | Not built | **TransformerLens** + Captum | Awesome List §10 |
| Experiment Tracking | Tensorboard only | **MLflow** / W&B | Awesome List §8 |
| Drift Detection | Not built | **Evidently** | Awesome List §8 |
| Data Versioning | Not built | **DVC** | Awesome List §8 |
| Agent Memory | Custom memory_store | **Letta** / Mem0 | Awesome List §4 |
| Agent Orchestration | Custom AgentOS | Keep AgentOS + add **LangGraph** | Awesome List §4 |
| Document Chat UI | Gradio tabs | Study **Kotaemon** architecture | Awesome List §5 |
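To make one row of the stack concrete: moving from "Hope-for-JSON" to Instructor-style structured output means wrapping every model response in a validate-or-reject step. A dependency-free sketch of that idea follows (Instructor does this with Pydantic models and automatic retries; `Claim` and `parse_claim` are hypothetical names used only for illustration).

```python
import json
from dataclasses import dataclass

@dataclass
class Claim:
    statement: str
    confidence: float  # must lie in [0.0, 1.0]

def parse_claim(raw: str) -> Claim:
    """Validate-or-raise: malformed output fails loudly instead of slipping through."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    claim = Claim(statement=str(data["statement"]),
                  confidence=float(data["confidence"]))
    if not 0.0 <= claim.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {claim.confidence}")
    return claim

good = parse_claim('{"statement": "neighbor-DTA promotes coherence", "confidence": 0.9}')
print(good.statement)
```

A library like Instructor adds the retry loop (re-prompting the model with the validation error), which is the part that hand-rolled JSON hoping never gets right.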
---

## The Emergence Transformer's 3 Key Contributions to This System

### 1. Council as Coupled Oscillators (Neighbor-DTA)

Instead of a sequential pipeline, council members interact through attention-mediated coupling. The Emergence Transformer proves that neighbor-attention promotes coherence — members converge on genuinely shared insights while preserving dissent where it matters.
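A toy numerical sketch of that coupling, under the simplifying assumptions that each member's state is a single scalar "opinion" and attention scores are negative squared distances; this is an illustration of neighbor coupling, not the paper's exact update rule:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_dta_step(x, temp=1.0):
    """Each member moves toward an attention-weighted average of the OTHER
    members' states (neighbor-DTA), not just its own history."""
    new_x = []
    for i, xi in enumerate(x):
        others = [xj for j, xj in enumerate(x) if j != i]
        # Attention score: negative squared distance, so closer views weigh more.
        weights = softmax([-((xi - xj) ** 2) / temp for xj in others])
        new_x.append(xi + 0.5 * sum(w * (xj - xi) for w, xj in zip(weights, others)))
    return new_x

def spread(x):
    """Variance of the members' opinions; low spread means coherence."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x) / len(x)

start = [0.1, 0.4, 0.6, 0.9]  # four council members' initial opinions
x = list(start)
for _ in range(10):
    x = neighbor_dta_step(x)
# After neighbor coupling, the spread has shrunk: coherence increased.
```

Because every update is a convex combination of the members' states, opinions stay within the initial range while the spread contracts, which is the qualitative behavior the paper attributes to neighbor-attention.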

### 2. Continual Learning Without Forgetting (Self-DTA + Hopfield)

When adding new scientific domains, the model maintains its existing knowledge through self-attention on past states. The paper provides the theoretical proof that DTA-modified Hopfield networks achieve continual memory storage — directly applicable to our O-LoRA domain adaptation strategy.
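The substrate this result builds on is the classic Hopfield network, where Hebbian outer-product storage lets a new memory be written without revisiting the old ones. A minimal sketch of that substrate (the paper's DTA modification is more involved than this):

```python
def add_pattern(W, p):
    """Hebbian outer-product update: a new memory is written into W without
    touching (or even seeing) the patterns stored before it."""
    n = len(p)
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] += p[i] * p[j] / n

def recall(W, probe, steps=5):
    """Iterate the sign dynamics until the state settles on a stored attractor."""
    s = list(probe)
    for _ in range(steps):
        s = [1 if sum(W[i][j] * s[j] for j in range(len(s))) >= 0 else -1
             for i in range(len(s))]
    return s

n = 8
W = [[0.0] * n for _ in range(n)]
p1 = [1, 1, 1, 1, -1, -1, -1, -1]   # memory from the "old" domain
p2 = [1, -1, 1, -1, 1, -1, 1, -1]   # memory from a "new" domain, added later
add_pattern(W, p1)
add_pattern(W, p2)
noisy = list(p1)
noisy[0] = -1                        # corrupt one bit of the old memory
# recall(W, noisy) still recovers p1: the old memory survives the new one.
```

Note the continual-learning property lives in `add_pattern`: storage is a running sum, so adding `p2` never requires access to `p1`, unlike gradient retraining.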

### 3. Tunable Consensus vs Diversity (α Parameter)

The system can be configured to either push toward agreement (for clear-cut cases) or deliberately preserve plurality (for genuinely ambiguous epistemic classifications). The paper proves that the optimal α depends on network structure — for our 4-member council, this is a tunable hyperparameter.
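An illustrative linear sketch of the α trade-off, assuming scalar opinions and a plain mean over the other members; this shows the knob's effect, not the paper's update rule:

```python
def council_step(x, alpha):
    """Blend self-attention (weight alpha: stick with your own state) against
    neighbor coupling (weight 1 - alpha: move toward the others' mean)."""
    n = len(x)
    new_x = []
    for i, xi in enumerate(x):
        neighbor_mean = (sum(x) - xi) / (n - 1)
        new_x.append(alpha * xi + (1 - alpha) * neighbor_mean)
    return new_x

def spread(x):
    """Variance of the opinions: low means consensus, high means plurality."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x) / len(x)

start = [0.1, 0.4, 0.6, 0.9]            # four council members' initial opinions
consensus, plural = list(start), list(start)
for _ in range(10):
    consensus = council_step(consensus, alpha=0.2)  # low alpha: push to agreement
    plural = council_step(plural, alpha=0.9)        # high alpha: keep diversity longer
# Low alpha collapses the spread quickly; high alpha preserves dissent far longer.
```

For a clear-cut epistemic classification the council would run with low α; for genuinely ambiguous cases a high α keeps minority positions alive across more deliberation rounds.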

---

*Every requirement in this document traces to a specific tool in the Awesome Open Source AI list or a specific result in the Emergence Transformer paper. No requirements were invented outside these two sources.*
replace_with_file:/app/REQUIREMENTS_FROM_SOURCES.md