# Codette v2.0 — Multi-Perspective AI Reasoning System ## Overview Codette v2.0 is a production-ready multi-agent reasoning system that combines analytical depth with controlled debate. It routes queries to specialized reasoning adapters, orchestrates multi-perspective discussion, detects and manages epistemic tension, and synthesizes nuanced conclusions. **Version**: 2.0 (Phase 6 + Stability Patches) **Model**: Llama 3.1 8B quantized with LoRA adapters **Memory**: Cocoon-backed persistent session state (encrypted) **Deployment**: Zero-dependency local web server (Python stdlib) --- ## Core Capabilities ### 1. Domain-Aware Agent Routing (Phase 6, Patch 5) - **Automatic domain detection** from query keywords - **Selective agent activation** — only relevant perspectives participate - **Domain-to-agent mapping**: - **Physics** → Newton, Quantum - **Ethics** → Philosophy, Empathy - **Consciousness** → Philosophy, Quantum - **Creativity** → DaVinci, Quantum - **Systems** → Quantum, Philosophy **Why it matters**: Reduces noise, improves reasoning quality, prevents irrelevant agents from cluttering debate. ### 2. Semantic Conflict Detection & Analysis (Phase 6) - **Embedding-based tension scoring** (1.0 - cosine_similarity of Llama embeddings) - **Hybrid opposition scoring** = 60% semantic + 40% heuristic pattern matching - **Conflict types classified**: - **Contradiction** (direct negation) - **Emphasis** (different framing, same core) - **Framework** (operating from different models) - **Depth** (shallow vs. detailed treatment) **Key metric**: ξ (Xi) — Epistemic Tension (0-1, continuous, not discrete) **Why it matters**: Real semantic disagreement vs. surface-level differences — enables productive debate. ### 3. Controlled Multi-Round Debate (Phase 6, Patch 2, Patch 4) - **Round 0**: All agents analyze query independently - **Rounds 1-3**: Debate between selected pairs, seeing peer responses - **Conflict capping** (Patch 2): Hard limit of top 10 conflicts per round - Prevents combinatorial explosion (214-860 conflicts → capped at 10) - **Gamma authority** (Patch 4): Hard stop if system coherence drops below 0.3 - Allows healthy debate while preventing runaway - Previously: 0.5 threshold was too aggressive - Now: 0.3 threshold balances stability with reasoning depth **Why it matters**: Debate amplifies reasoning quality without spiraling into infinite disagreement. ### 4. Real-Time Coherence Monitoring (Phase 5A) - **Γ (Gamma) metric** = system health score (0-1) - 0.3-0.7: Healthy debate (tension + diversity) - >0.8: Groupthink (approaching false consensus) - <0.3: Collapse (emergency stop triggered) - **Components measured**: - Average conflict strength - Perspective diversity - Adapter weight variance - Resolution rate (conflict closure over rounds) **Why it matters**: Detects emergent pathologies before they corrupt reasoning. ### 5. Multi-Phase Conflict Evolution Tracking (Phase 3) - Tracks conflicts across debate rounds - Measures resolution effectiveness - **Resolution types**: - Hard victory (one perspective wins) - Soft consensus (integrated understanding) - Stalled (unresolved) - Worsened (debate amplified conflict) - **Metrics**: trajectory slope, resolution rate, time-to-resolution **Why it matters**: Understands whether debate actually improves reasoning or creates noise. ### 6. Experience-Weighted Adapter Selection (Phase 2, Phase 4) - **Memory-based learning**: Tracks adapter performance historically - **Dynamic weight adjustment** (0-2.0 scale): - High-performing adapters get boosted - Low-performers get suppressed - Soft boost: modulates router confidence ±50% - **Learning signals**: - Resolution rate > 40% → boost +0.08 - Soft consensus → boost +0.03 - Conflicts worsened → penalize -0.08 - **Recency decay**: 7-day half-life (recent performance weighted higher) **Why it matters**: System improves over time; learns which adapters work for which questions. ### 7. Specialization Tracking (Phase 6) - Per-adapter, per-domain performance monitoring - **Specialization score** = domain_accuracy / usage_frequency - **Convergence detection**: Alerts if adapter outputs >0.85 similar - Prevents semantic monoculture (adapters doing same work) **Why it matters**: Ensures adapters maintain functional specialization despite weight drift. ### 8. Ethical Governance & Safety (AEGIS, Nexus) - **AEGIS module**: Evaluates outputs for: - Factual accuracy (known unknowns flagged) - Harmful content detection - Bias detection - Alignment with user intent - **Nexus signal intelligence**: Cross-checks for contradictions between adapters - **Guardian input check**: Sanitizes input before routing **Why it matters**: AI that reasons deeply also reasons responsibly. ### 9. Living Memory with Cocoon Storage (Phase 2) - **Persistent session state** across conversations - **Cocoon storage**: Encrypts, deduplicates, and compresses memories - **Conflict replay**: Top 5 conflicts per debate stored for learning - **Memory footprint**: ~5KB per conflict (highly efficient) **Why it matters**: Conversation context persists; system builds understanding within and across sessions. ### 10. Pre-Flight Conflict Prediction (Phase 6) - **Spiderweb injection** before debate starts - **5D state encoding** of queries: - ψ (psi): concept magnitude - τ (tau): temporal progression - χ (chi): processing velocity - φ (phi): emotional valence - λ (lambda): semantic diversity - **Conflict profiling**: Predicts which adapter pairs will clash and along which dimensions - **Router recommendations**: Pre-select stabilizing adapters **Why it matters**: Reduces wasted debate cycles by predicting conflicts before they happen. --- ## Phase 6 Stability Patches Three critical patches address the "thinking but not stopping" pathology: ### Patch 1: Conflict Filtering (Framework Differences) ``` if conflict_type == "framework" and semantic_overlap > 0.6: discard_conflict() ``` High-overlap framework disagreements aren't worth debating. ### Patch 2: Top-K Conflict Selection (Hard Cap) ``` conflicts = sorted(conflicts, key=lambda x: x.strength)[:10] ``` Prevents combinatorial explosion. Alone fixes ~80% of the explosion problem. ### Patch 3: Gamma Authority with Tuned Threshold ``` if gamma < 0.3: # Changed from 0.5 to allow more debate stop_debate = True ``` Hard stop only when truly collapsing. Allows healthy multi-round debate. **Result**: Conflicts down to 10-30 per round (from 1500+), gamma stable at 0.7-0.9, reasoning depth preserved. --- ## Example Queries & Expected Behavior ### Physics Question **Query**: "What is the speed of light and why does it matter?" - **Domain detected**: physics - **Agents activated**: Newton (analytical), Quantum (relativistic) - **Debate**: Newton discusses classical mechanics; Quantum discusses relativistic invariance - **Coherence**: High (0.75+) — complementary perspectives - **Synthesis**: Unified explanation covering both scales ### Ethics Question **Query**: "How should we balance accuracy and explainability in AI systems?" - **Domain detected**: ethics - **Agents activated**: Philosophy (frameworks), Empathy (stakeholder impact) - **Debate**: Philosophy discusses deontological vs. consequentialist trade-offs; Empathy discusses user understanding needs - **Coherence**: Medium (0.65-0.75) — genuine tension between values - **Synthesis**: Nuanced trade-off analysis acknowledging incommensurable values ### Consciousness Question **Query**: "What would it mean for a machine to genuinely understand?" - **Domain detected**: consciousness - **Agents activated**: Philosophy (conceptual), Quantum (probabilistic modeling) - **Debate**: Philosophy questions definitions of understanding; Quantum discusses computational capacity - **Coherence**: May trend low (0.5-0.65) — hard problem, genuine disagreement - **Synthesis**: Honest assessment of philosophical limits and empirical gaps --- ## Architecture Diagram ``` Query Input ↓ [Domain Detection] → Classify physics/ethics/consciousness/creativity/systems ↓ [Agent Gating] (Patch 5) → Activate 2-3 relevant agents only ↓ Round 0: Independent Analysis ↓ [Conflict Detection] → Semantic tension + heuristic opposition ↓ [Conflict Capping] (Patch 2) → Top 10 by strength ↓ Debate Rounds (1-3): ├─ Agent pairs respond to peer perspectives ├─ [Conflict Evolution Tracking] → measure resolution ├─ [Experience-Weighted Routing] → boost high-performers ├─ [Gamma Monitoring] → coherence health check └─ [Gamma Authority] (Patch 4) → stop if γ < 0.3 ↓ [Synthesis Engine] → Integrate debate + memory ↓ [AEGIS Evaluation] → Safety/alignment check ↓ Response Stream (SSE) ↓ [Cocoon Storage] → Remember conflict + resolution ``` --- ## Performance Characteristics | Metric | Value | Notes | |--------|-------|-------| | Model size | 8.5GB (quantized) | Llama 3.1 8B F16 | | Load time | ~60s | First inference takes longer | | Query latency | 10-30s | Includes 1-3 debate rounds | | Max debate rounds | 3 | Configurable per query | | Conflicts per round | ~10 (capped) | From 200-800 raw | | Memory per session | 1-5MB | Cocoon-compressed | | Adapter count | 8 (expandable) | Newton, DaVinci, Empathy, Philosophy, Quantum, Consciousness, Systems, Multi-Perspective | --- ## Deployment ### Local Web UI ```bash # Double-click to launch codette_web.bat # Or command line python inference/codette_server.py [--port 8080] [--no-browser] ``` **URL**: http://localhost:7860 **Features**: - Streaming responses (SSE) - Session persistence - Export/import conversations - Cocoon dashboard - Spiderweb visualization ### Programmatic API ```python from reasoning_forge.forge_engine import ForgeEngine forge = ForgeEngine(enable_memory_weighting=True) result = forge.forge_with_debate( concept="Is consciousness computational?", debate_rounds=2 ) print(result['synthesis']) print(f"Coherence: {result['metadata']['gamma']}") ``` --- ## Known Limitations & Future Work ### Current Limitations - **Debate can be noisy on hard problems**: Consciousness, abstract philosophy still generate high tension (expected) - **Pre-flight predictor not yet suppressing agents**: Predicts conflicts but doesn't yet prevent them (Phase 7) - **No knowledge cutoff management**: Doesn't distinguish between known unknowns and hallucinations ### Phase 7 (Research Direction) - Semantic drift prevention (adapter convergence < 0.70) - Client-side preference learning (user ratings → memory boost) - Multi-turn question refinement - Confidence calibration (reported ≠ actual correctness) - Cross-domain synthesis (combining insights from different domains) --- ## Citation & Attribution **Creator**: Jonathan Harrison **Framework**: RC+ξ (Reasoning & Conflict + Epistemic Tension) **Version**: Codette v2.0, Session 2026-03-19 **Components**: 6 years of multi-agent reasoning research, formalized in 2026 --- ## Getting Started 1. **Launch the UI**: ```bash double-click codette_web.bat ``` 2. **Ask a Question**: - Type in the chat box or select a suggested question - Codette automatically routes to relevant adapters - Watch the Cocoon dashboard for real-time metrics 3. **Save & Resume**: - Conversations auto-save with Cocoon storage - Sessions persist across browser closures - Export for sharing or analysis 4. **Dive Deeper**: - Read `PHASE6_CONTROL_PATHOLOGY.md` for system design insights - Check `evaluation_results.json` for empirical validation data - Explore memory with the "Cocoon" panel --- **Welcome to Codette v2.0. What would you like to think through today?**