nkshirsa committed on
Commit
2e5325c
·
verified ·
1 Parent(s): 3f62132

Add REQUIREMENTS_FROM_SOURCES.md — requirements grounded in Emergence Transformer paper + Awesome Open Source AI list, highschool-readable

Files changed (1): REQUIREMENTS_FROM_SOURCES.md (+407 −0)
# PhD Research OS — Requirements Derived from Sources
## Grounded Entirely in the Emergence Transformer Paper + Awesome Open Source AI List

**Written so a high school student can understand every word.**

**Sources**:
1. 📄 [Emergence Transformer: Dynamical Temporal Attention Matters](https://arxiv.org/abs/2604.19816) — A paper that redesigns the Transformer's attention mechanism so components interact with their own past states through time-varying queries, keys, and values. It shows that neighbor-attention promotes coherence while self-attention has an optimal sweet spot, and it applies this to social opinion models and Hopfield networks for continual learning without forgetting.
2. 📦 [Awesome Open Source AI](https://github.com/alvinreal/awesome-opensource-ai) — A curated list of 500+ production-proven open-source AI tools across 14 categories (April 2026).

---

## What These Sources Tell Us

### From the Emergence Transformer Paper

The paper introduces **Dynamical Temporal Attention (DTA)** — a version of the Transformer in which the query, key, and value matrices change over time. The key insights that apply to our Research OS:

1. **Neighbor-DTA vs. Self-DTA**: When components pay attention to their neighbors' history, coherence (agreement) always increases. When they pay attention to their OWN history, there is an optimal attention weight — too much self-attention actually hurts. This maps directly onto our AI Council: council members should attend to EACH OTHER'S reasoning (neighbor-DTA), not just their own previous outputs (self-DTA).

2. **Emergent Continual Learning**: The paper shows DTA applied to Hopfield neural networks achieves continual learning WITHOUT catastrophic forgetting. This is exactly what our Research OS needs — the model should learn from new papers without forgetting what it learned from old ones.

3. **Social Coherence Modulation**: DTA can either enhance agreement or preserve plurality in social opinion models. For our system, this means the AI Council should be tunable: we can either push it toward consensus (for clear-cut cases) or deliberately preserve disagreement (for genuinely ambiguous cases).

4. **Time-Varying Attention Kernels**: Standard Transformers have fixed attention patterns. DTA makes attention evolve over time. For our system, this means: as the model processes more of a paper, its attention to earlier sections should change. Reading the Discussion should update how the model interprets the Abstract.

### From the Awesome Open Source AI List

The list catalogs the production-ready tools that exist today. Here are the specific tools relevant to each part of our system, organized by what they replace or enable:

---

## Requirements by System Layer

### Layer 0: PDF Parsing — Replace Basic Scrapers with ML Parsers

**Current state**: PyMuPDF/pdfplumber (basic text extraction)

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Marker](https://github.com/datalab-to/marker)** | Fast, accurate PDF-to-markdown with table extraction and equation handling | Replaces pdfplumber — preserves document structure, tables, equations |
| **[MinerU](https://github.com/opendatalab/MinerU)** | High-accuracy PDF parsing with VLM+OCR dual engine | Handles scanned papers and complex layouts that Marker misses |
| **[Docling](https://github.com/docling-project/docling)** | Document processing toolkit for GenAI workflows | Backup parser for non-standard formats (Word, PPT, Excel supplements) |
| **[Unstructured](https://github.com/Unstructured-IO/unstructured)** | Best-in-class document preprocessing | Universal fallback for any document type |
| **[MarkItDown](https://github.com/microsoft/markitdown)** | Microsoft's file-to-Markdown converter | Handles supplementary files (Excel data, PowerPoint presentations) |
| **[OmniParse](https://github.com/adithya-s-k/omniparse)** | Parses documents, tables, images, videos, audio, web pages | Multi-modal supplement handling (video supplements, audio recordings) |

**Requirement P-REQ-1**: Integrate Marker as the primary parser. Fall back to MinerU for scanned/OCR documents. Use Docling/MarkItDown for non-PDF supplements.

**Requirement P-REQ-2**: Use Chonkie ([chonkie-inc/chonkie](https://github.com/chonkie-inc/chonkie)) for intelligent document chunking. It supports semantic, token, and recursive chunking strategies, replacing the current simple section-merge chunking in parser.py.
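The P-REQ-1 fallback chain can be sketched as a priority-ordered loop. This is a minimal illustration under assumptions, not Marker's or MinerU's actual API: each entry in `parsers` is a hypothetical wrapper callable that returns markdown text on success or raises on failure.

```python
def parse_document(path, parsers):
    """Try each (name, parser) pair in priority order; return the first success.

    `parsers` might look like [("marker", parse_with_marker),
    ("mineru", parse_with_mineru), ("docling", parse_with_docling)],
    where each wrapper is a thin adapter around the real library.
    """
    errors = {}
    for name, parser in parsers:
        try:
            text = parser(path)
            if text and text.strip():  # reject empty extractions too
                return name, text
        except Exception as exc:  # a real pipeline would narrow this per library
            errors[name] = str(exc)
    raise RuntimeError(f"all parsers failed: {errors}")
```

In practice each wrapper would call the real library and the `except` clause would be narrowed to that library's error types.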

---

### Layer 1: Entity Resolution — Add Embedding-Based Matching

**Current state**: No embedding model, no entity normalization

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[BGE / FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)** | Best-in-class text embeddings | Convert claims to vectors for semantic matching instead of word overlap |
| **[FastEmbed](https://github.com/qdrant/fastembed)** | Lightweight embedding with ONNX Runtime, no GPU needed | Local-first embedding that runs on CPU for privacy |
| **[sqlite-vec](https://github.com/asg017/sqlite-vec)** | Vector search as a SQLite extension | Adds vector similarity to our existing SQLite database — zero new infrastructure |
| **[MTEB](https://github.com/embeddings-benchmark/mteb)** | Embedding benchmark | Choose the best embedding model for scientific text by testing on MTEB |

**Requirement M-REQ-1**: Replace Jaccard word overlap in `canonicalizer.py` with embedding-based cosine similarity using FastEmbed + sqlite-vec. This keeps the system local-first (SQLite + CPU embeddings) while enabling semantic deduplication.
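A minimal sketch of what M-REQ-1 replaces Jaccard with: cosine similarity over claim vectors. The vectors here are plain Python lists standing in for FastEmbed output (in production the comparison would run inside sqlite-vec), and the 0.9 threshold is an illustrative assumption, not a tuned value.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(claim_vec, stored_vecs, threshold=0.9):
    """Flag a new claim as a duplicate if any stored vector is close enough."""
    return any(cosine(claim_vec, v) >= threshold for v in stored_vecs)
```

Unlike word overlap, this matches paraphrases ("LOD of 1 nM" vs. "detection limit of 1 nanomolar") because the embedding model places them near each other.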

**Requirement M-REQ-2**: Use MTEB to benchmark which embedding model performs best on scientific claim similarity before committing to one.

---

### Layer 2: Extraction — Better Models and Constrained Output

**Current state**: Qwen2.5-3B with mock fallback, no output guarantees

**Required models from the awesome list**:

| Model | Why It's Better |
|-------|----------------|
| **[Qwen3.6-Plus](https://github.com/QwenLM/Qwen)** | April 2026 flagship, 1M context window, competitive with Claude 4.5 Opus |
| **[Kimi K2.5](https://github.com/MoonshotAI/Kimi-K2.5)** | 256K context, strong reasoning, native tool-use for agentic workflows |
| **[Phi-4](https://github.com/microsoft/PhiCookBook)** | Small but highly capable for reasoning and edge/on-device inference |
| **[OLMo 2](https://github.com/allenai/OLMo)** | Fully open-source (data + code + logs) — by scientists, for scientists |
| **[GLM-5](https://github.com/zai-org/GLM-5)** | Strong coding, reasoning, and agentic-task performance |

**Requirement B-REQ-1**: Upgrade the primary brain to Qwen3.6-Plus (or its quantized variant) for maximum reasoning quality. Use Phi-4 as the local/edge fallback for 16GB VRAM deployment.

**Requirement B-REQ-2**: Use the **Instructor** library ([jxnl/instructor](https://github.com/jxnl/instructor)) for structured output extraction with Pydantic validation. This replaces the need for the Guidance library — Instructor handles validation, retries, and error handling for extracting claims as structured JSON from any LLM.

**From the Emergence Transformer paper — Requirement B-REQ-3**: Implement **Dynamical Temporal Attention** in the council architecture:

The paper shows that **neighbor-DTA consistently promotes coherence** while **self-DTA has an optimal weight**. Applied to the AI Council:

- **Neighbor-DTA for council members**: Each council member (Extractor, Critic, Chairman) should attend to OTHER members' reasoning history, not just their own. This promotes convergence on genuinely shared insights.
- **Self-DTA with tunable weight (α)**: Each member also attends to their OWN past outputs, but with a tunable weight. The paper proves there's an optimal α — too much self-attention causes overconfidence. Too little means no memory.
- **Practical implementation**: Store each council member's outputs across multiple papers. When processing paper N, the Extractor can attend to its OWN extractions from papers 1…N-1 (self-DTA) and to the Critic's feedback from papers 1…N-1 (neighbor-DTA). The attention weights are tunable per task.

This turns the council from a stateless sequential pipeline into a **stateful attention-based ensemble** where members learn from each other's history.
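The blending above can be sketched as a toy function. This is our own simplification, not the paper's equations: scalar "opinions" stand in for real member outputs, α weights self-history against neighbor-history, and β is an assumed exponential-decay rate echoing the paper's memory kernel.

```python
import math

def dta_mix(self_history, neighbor_history, alpha, beta):
    """Blend a member's own past outputs (self-DTA, weight alpha) with its
    neighbors' past outputs (neighbor-DTA, weight 1 - alpha).

    Histories are ordered oldest-to-newest; older entries decay as
    exp(-beta * age), so high beta = short memory."""
    def decayed_mean(history):
        if not history:
            return 0.0
        # newest entry has age 0, oldest has age len(history) - 1
        ws = [math.exp(-beta * age) for age in range(len(history) - 1, -1, -1)]
        return sum(w * h for w, h in zip(ws, history)) / sum(ws)
    return alpha * decayed_mean(self_history) + (1 - alpha) * decayed_mean(neighbor_history)
```

Setting `alpha=1.0` recovers a purely self-attending (stateful but isolated) member; `alpha=0.0` gives a member driven entirely by its neighbors.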

---

### Layer 3: Deduplication — Semantic Matching at Scale

**Current state**: Jaccard word overlap

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[sqlite-vec](https://github.com/asg017/sqlite-vec)** | Vector search in SQLite | Find duplicate claims by meaning, not word overlap, inside our existing DB |
| **[Chroma](https://github.com/chroma-core/chroma)** | Embedding database | If claim count exceeds SQLite performance, scale to dedicated vector DB |
| **[rerankers](https://github.com/AnswerDotAI/rerankers)** | Unified reranking API | After finding candidate duplicates by embedding, use cross-encoder reranking for precision |

**Requirement M-REQ-3**: Implement two-stage deduplication: (1) fast approximate matching via sqlite-vec embeddings (recall-optimized), (2) precise reranking via cross-encoder for candidate pairs (precision-optimized).
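M-REQ-3's retrieve-then-rerank staging in miniature. The scoring callables are injected so the sketch stays library-agnostic: `coarse_score` stands in for a cheap sqlite-vec embedding score and `fine_score` for an expensive rerankers cross-encoder; both names, `coarse_k`, and the threshold are illustrative assumptions.

```python
def two_stage_dedup(query, candidates, coarse_score, fine_score,
                    coarse_k=10, fine_threshold=0.8):
    """Stage 1: keep the top-k candidates by a cheap score (recall-optimized).
    Stage 2: rescore only the survivors with a precise scorer (precision)."""
    shortlist = sorted(candidates,
                       key=lambda c: coarse_score(query, c),
                       reverse=True)[:coarse_k]
    return [c for c in shortlist if fine_score(query, c) >= fine_threshold]
```

The point of the split is cost: the cross-encoder runs on at most `coarse_k` pairs per query instead of on every stored claim.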

---

### Layer 4: Knowledge Graph — Add Temporal Reasoning and Graph RAG

**Current state**: SQLite adjacency list, word-overlap conflict detection

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Graphiti](https://github.com/getzep/graphiti)** | Real-time temporal knowledge graphs with provenance tracking | Tracks how facts change over time — exactly what we need for claim versioning |
| **[GraphRAG](https://github.com/microsoft/graphrag)** | Knowledge-graph-based retrieval | Enables multi-hop reasoning over the claim graph |
| **[LightRAG](https://github.com/HKUDS/LightRAG)** | Graph-based RAG with dual-level retrieval | Simpler alternative to GraphRAG for our scale |
| **[KAG (OpenSPG)](https://github.com/OpenSPG/KAG)** | Knowledge Augmented Generation for logical reasoning | Schema-constrained knowledge construction for professional domains |

**From the Emergence Transformer paper — Requirement G-REQ-1**: Apply DTA to the knowledge graph for **emergent conflict detection**:

The paper models N coupled oscillators where coherence emerges from attention-mediated interactions. Claims in the knowledge graph are analogous to oscillators:

- Each claim has a "phase" (its epistemic state: Fact/Interpretation/Hypothesis and confidence)
- Claims interact through graph edges (supports/refutes/extends)
- **Neighbor-DTA** on the graph: When scoring a claim, attend to the HISTORY of its graph neighbors. A claim that was "Interpretation" but whose supporting claims have all been upgraded to "Fact" over time should be reconsidered.
- **Conflict detection as coherence breakdown**: The paper's order parameter (r_t) measures global coherence. In our graph, sudden drops in local coherence (a cluster of claims that were previously consistent suddenly becoming contradictory because of a new paper) are analogous to desynchronization events. These should trigger alerts.
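The order parameter is cheap to monitor per graph neighborhood. The Kuramoto-style form below, r = |N⁻¹ Σⱼ exp(iθⱼ)|, is the standard coherence measure for coupled oscillators; mapping epistemic states to phase angles, and the 0.3 drop threshold, are our assumptions, not the paper's.

```python
import cmath

def order_parameter(phases):
    """Coherence r in [0, 1]: 1 = all claims in phase (consistent),
    near 0 = desynchronized (contradictory cluster)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

def coherence_drop_alert(prev_r, new_r, drop=0.3):
    """Flag a desynchronization event: local coherence fell sharply
    after a new paper's claims were merged into the neighborhood."""
    return prev_r - new_r > drop
```

Run `order_parameter` over a claim's neighborhood before and after ingesting a new paper; a large drop is the alert condition G-REQ-1 describes.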

**Requirement G-REQ-2**: Use Graphiti for temporal provenance. Every claim stores when it was first extracted, when it was last confirmed by a new source, and when it was contradicted. The graph should answer queries like "What changed about LOD claims for GFET sensors between 2022 and 2025?"

---

### Layer 5: Scoring — Add Calibration Infrastructure

**Current state**: Fixed-point formula works, calibration only planned

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[DeepEval](https://github.com/confident-ai/deepeval)** | LLM evaluation with hallucination detection, bias detection | Automated checking of extraction quality and confidence calibration |
| **[RAGAs](https://github.com/explodinggradients/ragas)** | RAG evaluation (faithfulness, relevance, context recall) | Evaluate whether extracted claims are faithful to source text |

**Requirement S-REQ-1**: Use DeepEval's faithfulness metric to automatically check: "Does this extracted claim actually appear in the source text?" This replaces manual gold-standard checking for high-volume papers.

**Requirement S-REQ-2**: Use RAGAs for end-to-end pipeline evaluation — measure whether the system retrieves the right evidence and generates faithful extractions.

---

### Layer 6: Evaluation — Build a Real Test Suite

**Current state**: Counts distributions, no ground-truth comparison

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)** | De-facto standard for model evaluation | Standardized evaluation across model versions |
| **[DeepEval](https://github.com/confident-ai/deepeval)** | "Pytest for LLMs" | Unit-test each extraction task with pass/fail criteria |
| **[Promptfoo](https://github.com/promptfoo/promptfoo)** | LLM testing and red-teaming | Systematic prompt testing, side-by-side model comparison |
| **[Inspect AI](https://github.com/UKGovernmentBEIS/inspect_ai)** | UK AI Safety Institute's evaluation framework | Multi-turn dialog evaluation with tool use |
| **[Lighteval](https://github.com/huggingface/lighteval)** | Lightweight model evaluation | Quick evaluation during training |

**Requirement T-REQ-1**: Implement DeepEval-based unit tests for each extraction task. Each test has:
- Input: paper excerpt
- Expected output: correct claims with correct tags
- Pass criteria: extraction recall ≥ 70%, epistemic accuracy ≥ 60%, qualifier preservation ≥ 80%
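The pass criteria above reduce to a small gate function; a DeepEval test would wrap the same arithmetic in its metric API rather than hand-rolling it. The count arguments are hypothetical outputs of a scoring step that compares predictions against the gold annotations.

```python
def extraction_gate(expected_claims, found_claims, correct_tags,
                    kept_qualifiers, total_qualifiers):
    """Apply the T-REQ-1 pass criteria to one test case.

    recall   = found / expected            (must be >= 0.70)
    tag_acc  = correctly tagged / found    (must be >= 0.60)
    qual     = qualifiers kept / present   (must be >= 0.80)
    """
    recall = found_claims / expected_claims
    tag_acc = correct_tags / found_claims if found_claims else 0.0
    qual = kept_qualifiers / total_qualifiers if total_qualifiers else 1.0
    return recall >= 0.70 and tag_acc >= 0.60 and qual >= 0.80
```

A test case with no qualifiers in the source trivially passes the qualifier criterion, which is why the function defaults `qual` to 1.0.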

**Requirement T-REQ-2**: Use Promptfoo for prompt regression testing. Every time a system prompt changes, automatically compare outputs before and after.

---

### Training Pipeline — Use Production Frameworks

**Current state**: Custom train.py with TRL SFTTrainer, ZeroGPU micro-batching

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[TRL](https://github.com/huggingface/trl)** | Official SFT, DPO, GRPO, PPO | Already used for SFT — extend to DPO and GRPO stages |
| **[Axolotl](https://github.com/axolotl-ai-cloud/axolotl)** | YAML-driven SFT, DPO, GRPO pipeline | Simpler configuration than custom scripts |
| **[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)** | One-stop SFT, DPO, ORPO with web UI | GUI for non-programmers to run training |
| **[Unsloth](https://github.com/unslothai/unsloth)** | 2× faster, 70% less memory fine-tuning | Makes training feasible on consumer GPUs |
| **[OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)** | Scalable RLHF with PPO, GRPO, REINFORCE++ | For the GRPO stage with custom epistemic reward functions |
| **[verl](https://github.com/volcengine/verl)** | ByteDance's RL for LLMs with PPO, GRPO | Alternative GRPO implementation |
| **[PEFT](https://github.com/huggingface/peft)** | Parameter-efficient fine-tuning (LoRA, etc.) | Already used — continue with LoRA |

**Requirement TR-REQ-1**: Replace ZeroGPU micro-batching with a single continuous training job using Unsloth (for 2× speedup on consumer GPU) or Axolotl (for a YAML-driven pipeline).

**Requirement TR-REQ-2**: Implement the 4-stage pipeline:
1. **SFT** via TRL/Unsloth (already partially built)
2. **DPO** via TRL DPOTrainer on preference pairs
3. **GRPO** via OpenRLHF or verl with the 3 custom reward functions (JSON validity, schema compliance, qualifier preservation)
4. **ConfTuner** via custom training loop with tokenized Brier score loss
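The three GRPO reward functions in step 3 can be prototyped as plain scorers before wiring them into OpenRLHF or verl. A sketch under assumptions: the tag vocabulary, the field names (`claim`, `tag`, `confidence`), and the qualifier word list are illustrative, not the project's actual schema.

```python
import json

ALLOWED_TAGS = {"Fact", "Interpretation", "Hypothesis"}   # assumed tag set
QUALIFIERS = ("may", "might", "suggests", "appears", "likely")  # assumed list

def reward_json_validity(output):
    """1.0 if the model's output parses as JSON, else 0.0."""
    try:
        json.loads(output)
        return 1.0
    except (ValueError, TypeError):
        return 0.0

def reward_schema(output):
    """1.0 if the parsed claim has required fields, a legal tag,
    and a confidence inside [0, 1]."""
    try:
        obj = json.loads(output)
    except (ValueError, TypeError):
        return 0.0
    ok = (isinstance(obj, dict)
          and obj.get("tag") in ALLOWED_TAGS
          and isinstance(obj.get("confidence"), (int, float))
          and 0.0 <= obj["confidence"] <= 1.0
          and bool(obj.get("claim")))
    return 1.0 if ok else 0.0

def reward_qualifiers(source, output):
    """Fraction of hedging words present in the source that survive
    into the model's output (1.0 when the source has none)."""
    present = [q for q in QUALIFIERS if q in source.lower()]
    if not present:
        return 1.0
    return sum(q in output.lower() for q in present) / len(present)
```

Each function maps an output to a scalar in [0, 1], which is the shape GRPO-style trainers expect for a reward signal.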

**From the Emergence Transformer paper — Requirement TR-REQ-3**: Implement **DTA-inspired continual learning** for domain adaptation:

The paper demonstrates that DTA applied to Hopfield networks achieves continual learning without catastrophic forgetting. For our model:

- When fine-tuning on a new scientific domain (e.g., adding ecology to a biosensors-trained model), use the DTA principle: the model should attend to its OWN past activations (self-DTA) to remember old domains while learning new ones.
- Practically: this maps to **O-LoRA** (orthogonal LoRA) — training new LoRA adapters in orthogonal subspaces so they don't interfere with existing adapters. The DTA paper provides the theoretical foundation for WHY this works: self-attention on past states preserves memory while neighbor-attention on new data drives learning.
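The orthogonality idea can be illustrated with a projection: remove from a candidate update any component lying in the subspace the old adapters already occupy, so the new adapter cannot overwrite them. Real O-LoRA operates on LoRA weight matrices with a regularization loss; this vector version is only a sketch and assumes `old_basis` is orthonormal.

```python
def project_out(update, old_basis):
    """Return `update` with its components along each old-adapter
    direction removed (Gram-Schmidt-style rejection).

    Assumes every vector in `old_basis` is unit-length and the basis
    vectors are mutually orthogonal."""
    result = list(update)
    for b in old_basis:
        coeff = sum(u * v for u, v in zip(result, b))      # projection length
        result = [u - coeff * v for u, v in zip(result, b)]  # subtract it
    return result
```

After the projection, the update is orthogonal to every old direction, which is the geometric reason new-domain training leaves old-domain behavior intact.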

---

### Synthetic Data Generation

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Distilabel](https://github.com/argilla-io/distilabel)** | Synthetic data generation and distillation | Generate 10K+ training examples using teacher models |
| **[Argilla](https://github.com/argilla-io/argilla)** | Data annotation and human-in-the-loop | Label real paper extractions with human experts |

**Requirement D-REQ-1**: Use Distilabel to generate the teacher ensemble outputs. Run 3-5 teacher models (Qwen3.6-Plus, Kimi K2.5, GLM-5) on 100 real papers and store ALL outputs with disagreement signals.

**Requirement D-REQ-2**: Use Argilla for human expert labeling of the gold standard test set (10 papers, every claim manually annotated).

---

### Data Quality and Labeling

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Cleanlab](https://github.com/cleanlab/cleanlab)** | Find and fix label errors in datasets | Detect mislabeled training examples automatically |
| **[Great Expectations](https://github.com/great-expectations/great_expectations)** | Data validation for pipelines | Validate that every training example has required fields, valid JSON, correct tag |
| **[Label Studio](https://github.com/HumanSignal/label-studio)** | Multi-type data labeling | Interface for human annotators to label paper excerpts |

**Requirement D-REQ-3**: Run Cleanlab on the existing 1,900 training examples to detect any mislabeled examples (wrong epistemic tags, missing qualifiers).

**Requirement D-REQ-4**: Use Great Expectations to validate every training example before it enters the training pipeline: valid JSON, tag in allowed set, confidence in [0,1], non-empty source quote.
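D-REQ-4's rules as a standalone validator (an empty list means the example passes). In production these would be declared as Great Expectations expectations rather than hand-rolled; the field names `tag`, `confidence`, and `source_quote` are assumed schema names for illustration.

```python
import json

def validate_example(raw):
    """Return a list of D-REQ-4 rule violations for one training example."""
    errors = []
    try:
        ex = json.loads(raw)
    except (ValueError, TypeError):
        return ["not valid JSON"]           # nothing else is checkable
    if ex.get("tag") not in {"Fact", "Interpretation", "Hypothesis"}:
        errors.append("tag not in allowed set")
    conf = ex.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence outside [0, 1]")
    if not str(ex.get("source_quote") or "").strip():
        errors.append("empty source quote")
    return errors
```

Running this as a gate before the training pipeline means a malformed example is rejected with a named reason instead of silently degrading a training run.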

---

### Inference Serving — Replace Mock with Real AI

**Current state**: No model serving, everything runs through optional API calls

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Ollama](https://github.com/ollama/ollama)** | Simplest local LLM serving | One-command model serving on consumer hardware |
| **[vLLM](https://github.com/vllm-project/vllm)** | High-throughput LLM serving | Fast batch processing of many paper sections |
| **[llama.cpp](https://github.com/ggml-org/llama.cpp)** | CPU/GPU inference for quantized models | Run on laptops without dedicated GPU |
| **[SGLang](https://github.com/sgl-project/sglang)** | Fast structured generation | Guaranteed valid JSON output via grammar-constrained decoding |

**Requirement I-REQ-1**: Integrate Ollama as the default local model server. One command to start: `ollama pull qwen3.6-plus:q4` → model available at `http://localhost:11434`.

**Requirement I-REQ-2**: Use SGLang for constrained decoding — it guarantees valid JSON output with valid enum values. This eliminates broken JSON, invalid tags, and mixed text/JSON output.

---

### AI Safety and Security

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Guardrails AI](https://github.com/guardrails-ai/guardrails)** | Input/output validation for LLMs | Validate extraction outputs match expected schema |
| **[LLM Guard](https://github.com/protectai/llm-guard)** | Security toolkit for LLM interactions | Detect prompt injection if system is exposed as API |
| **[Garak](https://github.com/NVIDIA/garak)** | LLM vulnerability scanner | Test model for hallucination patterns specific to scientific claims |
| **[DeepTeam](https://github.com/confident-ai/deepteam)** | Red teaming framework | Adversarial testing of extraction robustness |

**Requirement SEC-REQ-1**: Use Guardrails AI to validate every LLM output before it enters the database. Schema validation, tag validation, confidence range checking.

**Requirement SEC-REQ-2**: Use Garak to scan the fine-tuned model for scientific hallucination patterns. Test: does the model invent statistics? Does it fabricate citations? Does it claim certainty where the paper was uncertain?

---

### Interpretability

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[TransformerLens](https://github.com/TransformerLensOrg/TransformerLens)** | Mechanistic interpretability | Understand WHICH attention heads are responsible for qualifier detection vs claim extraction |
| **[Captum](https://github.com/pytorch/captum)** | PyTorch interpretability | Attribution analysis — which input tokens influenced each output |

**From the Emergence Transformer paper — Requirement INT-REQ-1**: Use the paper's DTA attention kernel analysis to interpret the fine-tuned model:

The paper derives explicit formulas for how attention weights evolve over time (Equations 9-10). After fine-tuning, we can:
- Visualize which tokens in the input (qualifier words like "may," "suggests") have the highest attention weight when the model outputs epistemic tags
- Track whether attention to the Abstract section decreases when the model has already processed the Results section (temporal attention shift)
- Identify attention heads that specialize in specific tasks (one head for qualifier detection, another for statistical parsing) — this validates whether the model has learned task-specific representations, answering the "specialist heads" question empirically

---

### MLOps and Monitoring

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[MLflow](https://github.com/mlflow/mlflow)** | Experiment tracking, model registry | Track every training run, compare model versions |
| **[Weights & Biases (wandb)](https://github.com/wandb/wandb)** | Experiment tracking with visualization | Dashboard for training metrics across all 4 stages |
| **[DVC](https://github.com/iterative/dvc)** | Data and model versioning | Version the training dataset and gold standard |
| **[Evidently](https://github.com/evidentlyai/evidently)** | ML monitoring and observability | Detect model drift in production |
| **[Phoenix](https://github.com/Arize-ai/phoenix)** | AI observability | Monitor extraction quality in real-time |

**Requirement OPS-REQ-1**: Use MLflow or W&B to track all training experiments. Every training run logs: loss curves, evaluation metrics, model checkpoints, hyperparameters, dataset version.

**Requirement OPS-REQ-2**: Use Evidently for drift detection. Weekly check: run the model on the gold standard test set. If any metric drops >5%, alert.

---

### Agent Framework — Connect Real Brains to Agent Bodies

**Current state**: Full agent lifecycle works, but agents have no AI model connected

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[smolagents](https://github.com/huggingface/smolagents)** | Lightweight agent framework from HuggingFace | Simpler than current custom AgentOS for basic tasks |
| **[LangGraph](https://github.com/langchain-ai/langgraph)** | Stateful, multi-actor agent orchestration | For the multi-agent council with memory |
| **[CrewAI](https://github.com/crewAIInc/crewAI)** | Multi-agent collaboration framework | Define roles (Extractor, Critic, Chairman) with collaboration protocols |
| **[Letta (MemGPT)](https://github.com/letta-ai/letta)** | Stateful agents with persistent memory | Agents that remember across sessions |
| **[Mem0](https://github.com/mem0ai/mem0)** | Universal memory layer for agents | Persistent memory for the MetaImprover and CitationChaser agents |

**From the Emergence Transformer paper — Requirement A-REQ-1**: Implement the AI Council as a **DTA-coupled multi-agent system**:

The paper's model has N oscillators coupled through an adjacency matrix A_ij. The AI Council has N=4 members coupled through information sharing. The DTA framework tells us:

- **Coupling topology matters**: The paper shows that different network structures (fully connected, small-world, scale-free) produce different coherence patterns. For 4 council members, fully connected (everyone sees everyone) promotes maximum consensus. Star topology (Chairman sees all, others only see Chairman) preserves more diversity.
- **α parameter tunes consensus vs diversity**: At α=0, no temporal attention → members are stateless → pure diversity. At α=1, full temporal attention → members converge → pure consensus. The paper proves there's an optimal α between 0 and 1 for maximum USEFUL coherence. Tune this per task: high α for clear-cut Fact/Interpretation decisions, low α for ambiguous Conflict_Hypothesis cases.
- **β parameter controls memory decay**: In the paper, β determines how fast old attention information decays. For the council, β controls how much members remember from previous papers. High β = short memory (each paper is fresh). Low β = long memory (patterns from 100 papers ago still influence decisions).

---

### UI and User Experience

**Required tools from the awesome list**:

| Tool | What It Does | Why We Need It |
|------|-------------|----------------|
| **[Gradio](https://github.com/gradio-app/gradio)** | Already used | Continue using for main UI |
| **[Kotaemon](https://github.com/Cinnamon/kotaemon)** | RAG-based document chat with Gradio UI | Reference implementation for document Q&A interface |
| **[Open Notebook](https://github.com/lfnovo/open-notebook)** | AI-powered notebook with multi-modal support | Model for how to build the Obsidian-like research interface |

**Requirement UI-REQ-1**: Study Kotaemon's architecture for the "chat with your papers" interface. It has hybrid RAG, re-ranking, and multi-modal support — exactly what the Research OS courtroom UI needs.

---

## Summary: The Complete Requirements Stack

| Layer | Current Tool | Required Tool(s) | Source |
|-------|-------------|------------------|--------|
| PDF Parsing | pdfplumber/PyMuPDF | **Marker** + MinerU + Docling | Awesome List §5 |
| Chunking | Custom section merger | **Chonkie** | Awesome List §5 |
| Embeddings | None | **FastEmbed** + sqlite-vec | Awesome List §5 |
| Deduplication | Jaccard overlap | FastEmbed + **rerankers** | Awesome List §5 |
| Base Model | Qwen2.5-3B | **Qwen3.6-Plus** / Phi-4 | Awesome List §2 |
| Structured Output | Hope-for-JSON | **Instructor** / SGLang | Awesome List §13 |
| Model Serving | None (mock) | **Ollama** / vLLM | Awesome List §3 |
| Council Architecture | Sequential pipeline | **DTA-coupled agents** (LangGraph) | Emergence Paper |
| Knowledge Graph | SQLite adjacency | SQLite + **Graphiti** temporal layer | Awesome List §5 |
| Graph Reasoning | Word-overlap conflicts | **LightRAG** / GraphRAG | Awesome List §5 |
| Continual Learning | Retrain from scratch | **O-LoRA** (DTA-inspired) | Emergence Paper |
| Training Framework | Custom train.py | **Unsloth** / Axolotl / TRL | Awesome List §7 |
| GRPO Training | Not built | **OpenRLHF** / verl | Awesome List §7 |
| Synthetic Data | Template generator | **Distilabel** | Awesome List §7 |
| Human Labeling | Not built | **Argilla** / Label Studio | Awesome List §7 |
| Data Validation | Not built | **Cleanlab** + Great Expectations | Awesome List §9 |
| LLM Evaluation | Count-based metrics | **DeepEval** + Promptfoo | Awesome List §9 |
| RAG Evaluation | Not built | **RAGAs** | Awesome List §9 |
| Safety Scanning | Not built | **Garak** + LLM Guard | Awesome List §10 |
| Output Validation | Not built | **Guardrails AI** | Awesome List §10 |
| Interpretability | Not built | **TransformerLens** + Captum | Awesome List §10 |
| Experiment Tracking | Tensorboard only | **MLflow** / W&B | Awesome List §8 |
| Drift Detection | Not built | **Evidently** | Awesome List §8 |
| Data Versioning | Not built | **DVC** | Awesome List §8 |
| Agent Memory | Custom memory_store | **Letta** / Mem0 | Awesome List §4 |
| Agent Orchestration | Custom AgentOS | Keep AgentOS + add **LangGraph** | Awesome List §4 |
| Document Chat UI | Gradio tabs | Study **Kotaemon** architecture | Awesome List §5 |

---

## The Emergence Transformer's 3 Key Contributions to This System

### 1. Council as Coupled Oscillators (Neighbor-DTA)
Instead of a sequential pipeline, council members interact through attention-mediated coupling. The Emergence Transformer proves that neighbor-attention promotes coherence — members converge on genuinely shared insights while preserving dissent where it matters.

### 2. Continual Learning Without Forgetting (Self-DTA + Hopfield)
When adding new scientific domains, the model maintains its existing knowledge through self-attention on past states. The paper provides the theoretical proof that DTA-modified Hopfield networks achieve continual memory storage — directly applicable to our O-LoRA domain adaptation strategy.

### 3. Tunable Consensus vs Diversity (α Parameter)
The system can be configured to either push toward agreement (for clear-cut cases) or deliberately preserve plurality (for genuinely ambiguous epistemic classifications). The paper proves that the optimal α depends on network structure — for our 4-member council, this is a tunable hyperparameter.

---

*Every requirement in this document traces to a specific tool in the Awesome Open Source AI list or a specific result in the Emergence Transformer paper. No requirements were invented outside these two sources.*