
Morpheus — Architecture Guide

Ask anything. Your documents answer.

This file is the source of truth for how Morpheus works today, what is already live in the codebase, and what the project is prioritizing next.


What Morpheus Is Today

Morpheus is a multi-tenant RAG platform for user-uploaded PDFs.

Users upload PDF documents. They ask questions in natural language. Morpheus retrieves evidence and streams grounded answers with citations and diagnostics.

What is already live:

  • Each user sees only their own documents, enforced through tenant-scoped writes and RLS-backed reads
  • Retrieval combines BM25 keyword search + pgvector semantic search + reranking
  • Routing is multi-path, not single-path: exact/page-scoped lookup, structural tree search, dense retrieval, graph-assisted retrieval, and memory-aware follow-ups
  • Repeated questions can short-circuit through a semantic cache
  • Conversations are remembered across sessions via episodic memory
  • Query traces capture route selection, expert weights, retrieval diagnostics, and answer quality metadata
  • If one AI provider fails, the system tries the next provider/model in the chain

Important framing:

  • Morpheus is a RAG engine
  • Morpheus does have a mixture-of-experts-style orchestration layer at retrieval time
  • Morpheus is not a trained neural MoE model
  • Morpheus is not yet a true agentic retrieval system with an iterative planner/executor loop
  • Morpheus currently supports PDF upload ingestion as the primary production path, plus a new local Python code graph indexing path for graph-first exploration

Current Engineering Priority Order

This is the active operating order for the project. New source kinds should not leapfrog retrieval quality work.

Phase 0 — Understand What We Have

  • Implemented: per-query traces with route selection, selected experts, expert weights, rerank audit, diagnostics, and quality metrics
  • Implemented: router mechanism and retrieval branch selection in code
  • Required next operating discipline: maintain a manually scored baseline of 50-100 real queries

Phase 1 — Fix Quality on the Existing Path

  • Highest priority: improve retrieval and generation quality on the current PDF path
  • Focus areas: chunking, hybrid weighting, reranker thresholds, grounding instructions, hallucination guardrails

Phase 2 — Add Source-Kind Architecture

  • Only after Phase 1 is stable
  • Add source_kind, data_shape, and parser_kind
  • Add code ingestion as a separate structured pipeline
  • Add URL ingestion behind strict security controls

Status note:

  • Initial Phase 2 substrate is now live: source_kind, data_shape, parser_kind, DB-backed graph_runs, graph-first API endpoints, and deterministic Python code graph indexing via local script
  • Remaining Phase 2 work is broader source support (markdown, url, richer code languages) and deeper graph-first answer orchestration

Phase 3 — Scale and Cost Optimization

  • Model tiering by route/query complexity
  • Embedding cache layer
  • Reranker gating based on retrieval confidence

Supported Sources Right Now

| Source | Status | Notes |
| --- | --- | --- |
| PDF upload | Live | Only production ingestion path today |
| URL ingestion | Not live | Planned for a later phase |
| Markdown files | Not live | Planned via source-kind architecture |
| Python code graph indexing | Live (local/scripted) | Deterministic AST-first graph indexing for graph-first exploration; not a browser upload path |
| Code/config/API files (beyond Python graph indexing) | Partial | Broader structured ingestion still planned as a separate pipeline, not an extension of generic document chunking |
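The Python code graph row in the table above refers to deterministic AST-first indexing. A minimal sketch of that idea using only the standard library ast module (the node/edge shapes are illustrative, not the exact code_graph.py schema):

```python
import ast

def index_python_file(path: str) -> tuple[list[dict], list[dict]]:
    """Walk a Python module and emit graph nodes (defs) and edges (calls)."""
    tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
    nodes, edges = [], []
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            nodes.append({"id": f"{path}::{item.name}", "kind": type(item).__name__})
            for call in ast.walk(item):
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                    edges.append({"src": f"{path}::{item.name}",
                                  "dst": call.func.id,
                                  "kind": "calls"})
    return nodes, edges
```

Because the walk is pure AST analysis, repeated runs over the same file produce the same graph, which is what makes the indexing path deterministic.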

Project Structure

morpheus/
├── backend/
│   ├── main.py                  FastAPI app, startup wiring, rate limiter
│   ├── api/
│   │   ├── auth.py              /api/v1/auth/*
│   │   ├── query.py             /api/v1/query — SSE streaming query path
│   │   ├── corpus.py            /api/v1/corpus/*
│   │   ├── ingest.py            /api/v1/ingest/*
│   │   ├── graph.py             /api/v1/graph — graph search, path, export
│   │   ├── frontend_config.py   /api/v1/config
│   │   └── admin.py             traces + feedback admin endpoints
│   └── core/
│       ├── pipeline.py              Main orchestration layer
│       ├── pipeline_routing.py      Route classes + expert weighting
│       ├── pipeline_retrieval.py    Retrieval helpers
│       ├── pipeline_generation.py   Generation helpers
│       ├── pipeline_ingestion.py    PDF ingestion workflow
│       ├── pipeline_pageindex.py    Structural tree retrieval
│       ├── pipeline_memory.py       Episodic memory helpers
│       ├── pipeline_ambiguity.py    Scope and ambiguity handling
│       ├── pipeline_types.py        Shared pipeline metadata types
│       ├── graph_hybrid.py          Source metadata + graph-first search/path/export helpers
│       ├── code_graph.py            Deterministic Python AST graph indexing
│       ├── providers.py             LLM + embedding provider fallback
│       ├── classifier.py            3-stage document classifier
│       ├── intent_classifier.py     sklearn intent model, online retraining
│       ├── cache_manager.py         Semantic Redis cache with version invalidation
│       ├── auth_utils.py            JWT helpers + require_auth_token Depends()
│       ├── config.py                All constants and tuneable settings
│       ├── rate_limit.py            Shared limiter setup
│       └── tasks.py                 Celery task for background PDF ingestion
├── frontend/
│   ├── index.html
│   └── js/
│       ├── config.js            Runtime config
│       ├── api.js               All fetch() calls — single source of truth
│       ├── state.js             Global STATE object
│       ├── chat.js              Streaming chat UI
│       ├── corpus.js            Upload + document management
│       ├── graph.js             D3 force-directed knowledge graph
│       ├── inspect.js           Node detail panel
│       ├── ui.js                Shared UI helpers
│       └── main.js              Boot sequence + auth gate
├── shared/
│   └── types.py                 Pydantic models shared by API and pipeline
└── supabase/
    ├── migrations/
    └── rls/

The Database (Supabase / PostgreSQL)

Tables

documents — The vector store. Every chunk and summary node from every PDF lives here.

| Column | Type | Purpose |
| --- | --- | --- |
| id | uuid | Deterministic ID: uuid5(file_hash + chunk_index) |
| content | text | Chunk text that gets searched |
| metadata | jsonb | source, file_hash, document_type, page numbers, retrieval metadata |
| embedding | vector(2048) | nvidia-nemotron embedding for pgvector search |
| user_id | uuid | RLS tenant isolation |
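The deterministic ID scheme means re-ingesting the same PDF yields the same chunk IDs. A minimal sketch of how such an ID can be derived (the exact uuid5 namespace used by the pipeline is an assumption here):

```python
import uuid

def chunk_id(file_hash: str, chunk_index: int) -> uuid.UUID:
    """Derive a stable chunk ID from the file hash and chunk position.

    The namespace choice below is illustrative, not necessarily the one
    Morpheus uses; the point is that the same inputs always give the same ID,
    which makes re-ingestion idempotent.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{file_hash}:{chunk_index}")
```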

ingested_files — Dedup registry.

| Column | Type | Purpose |
| --- | --- | --- |
| file_hash | text | SHA-256 of the PDF — the dedup key |
| filename | text | Display name in the UI |
| document_type | text | Category e.g. academic_syllabus |
| source_kind | text | pdf, code, markdown, url, etc. |
| data_shape | text | structured, unstructured, or hybrid |
| parser_kind | text | Parser used e.g. pdf_partition, python_ast |
| chunk_count | int | Includes RAPTOR tree nodes |
| user_id | uuid | Tenant isolation |
| user_overridden | bool | True if user manually changed category — classifier skips |

chat_memory — Episodic memory, searchable by semantic similarity.

| Column | Type | Purpose |
| --- | --- | --- |
| session_id | text | Groups messages from the same conversation |
| role | text | user or assistant |
| content | text | The message text |
| embedding | vector | For semantic search via match_memory RPC |
| user_id | uuid | Tenant isolation |

document_trees — Hierarchical tree index for structural queries.

| Column | Type | Purpose |
| --- | --- | --- |
| file_hash | text | Links tree to the source document |
| tree_json | jsonb | Recursive node structure: {title, content, children} |
| user_id | uuid | Tenant isolation |

category_centroids — The document classifier's learned memory.

| Column | Type | Purpose |
| --- | --- | --- |
| document_type | text | Category label |
| centroid_vector | array | Running average embedding of all docs of this type |
| document_count | int | Number of documents that contributed |
| user_id | uuid | Per-tenant centroids |

evaluation_logs — RAGAS quality metrics written after every query.

rerank_feedback — Every Cohere rerank decision, stored for future CrossEncoder distillation.

intent_feedback — Online training data for the intent classifier.

query_traces — Per-query trace record with route mode, selected experts, expert weights, candidate counts, diagnostics, and quality metrics.

graph_nodes / graph_edges — Graph foundation built during ingestion and enriched by query/feedback workflows.

graph_runs — Auditable record of each graph extraction/indexing pass, including parser/source metadata and content hash.

Category list — Derived directly from each tenant's ingested_files.document_type values.

Supabase RPC Functions

| Function | Purpose |
| --- | --- |
| hybrid_search(query_text, query_embedding, match_count, filter, semantic_weight, keyword_weight, p_user_id) | Combined BM25 + pgvector search (tenant-scoped overload) |
| match_memory(query_embedding, match_session_id, match_count) | Semantic search over chat history |
| insert_document_chunk(p_id, p_content, p_metadata, p_embedding, p_user_id) | Secure insert with explicit user_id |
| get_document_types() | Returns distinct categories for this tenant |
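As an illustration, the tenant-scoped hybrid search can be invoked from Python roughly like this (a sketch assuming the supabase-py client; all parameter values are placeholders):

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

query_embedding = [0.0] * 2048   # placeholder; must match documents.embedding dims

result = supabase.rpc(
    "hybrid_search",
    {
        "query_text": "core courses",
        "query_embedding": query_embedding,
        "match_count": 12,
        "filter": {"document_type": "academic_syllabus"},
        "semantic_weight": 0.5,   # alpha from the UI slider
        "keyword_weight": 0.5,
        "p_user_id": "00000000-0000-0000-0000-000000000000",  # tenant scoping
    },
).execute()

chunks = result.data   # rows from the documents table, ranked by the combined score
```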

Row Level Security

Every table has RLS policies. Core rule: user_id = auth.uid() for reads.

Writes from Celery workers use the service-role key (Celery has no browser session so auth.uid() is NULL), but always inject user_id explicitly via the insert_document_chunk RPC — extracted from the JWT at the API boundary before the task is queued.
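A sketch of the worker-side write under these rules, with the tenant id injected explicitly (assuming the supabase-py client; the surrounding task code is omitted and argument names are illustrative):

```python
def upload_chunk(supabase, user_id: str, chunk_id: str, content: str,
                 metadata: dict, embedding: list[float]) -> None:
    """Insert one chunk via the RPC, always passing the tenant id explicitly."""
    supabase.rpc(
        "insert_document_chunk",
        {
            "p_id": chunk_id,
            "p_content": content,
            "p_metadata": metadata,      # source, file_hash, page numbers, ...
            "p_embedding": embedding,
            "p_user_id": user_id,        # from the JWT at the API boundary, not auth.uid()
        },
    ).execute()
```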


The Ingestion Pipeline

Browser
  POST /api/v1/ingest/upload (X-Auth-Token header)
  FastAPI: JWT validated, upload limits checked, MIME type checked
  Per-user document count checked (50 max)
  PDF saved to temp file
  process_pdf_task.delay() → Redis queue
  Returns {task_id} immediately
  Browser polls /api/v1/ingest/status/{task_id} every 2 seconds

Celery worker (background):

Step 1: Dedup check
  SHA-256 fingerprint of PDF
  Check ingested_files table (O(1) indexed lookup)
  Already ingested → return "already_ingested"
  user_overridden=True → skip classifier, use forced_category

Step 2: PDF partitioning (unstructured library)
  partition_pdf() — OCR + layout detection
  extract_images_from_pdf() — PyMuPDF, filters tiny/skewed images
  Returns Element objects (Title, NarrativeText, Table, Image...)

Step 3: Classification (classifier.py)
  Three-stage cascade:
    Stage 1: Centroid nearest-neighbour (no API call, cosine similarity)
             Confidence >= 0.72 → done
    Stage 2: Ensemble vote (centroid + label-embed + TF-IDF)
             Score >= 0.38 → done
    Stage 3: LLM chain-of-thought (novel document types only)
  Sparse/tabular pre-check: routes to visual classification if word count < 200
  After classification: centroid updated with this document's vector

Step 4: Chunking + AI summaries
  chunk_by_title() groups elements into logical sections
  Chunks with tables or images: parallel AI vision summarisation (5 workers)
  Each chunk becomes a LangChain Document with rich metadata

Step 5: RAPTOR tree indexing
  Groups leaf chunks into clusters of 5
  LLM generates parent summary for each cluster
  Repeats up the tree until single root node
  Root node answers "what is this document about?"
  Leaf nodes answer specific detail questions
  All nodes (leaves + summaries) uploaded to documents table

Step 6: Embedding + upload
  Batch embed all nodes via nvidia-nemotron (2048 dims)
  Insert each via insert_document_chunk RPC (explicit user_id)
  Register in ingested_files
  Invalidate semantic cache for this user (kb_version++)

Step 7: Graph foundation persistence
  Document/entity/topic graph rows written to graph_nodes / graph_edges
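To make Step 5 concrete, here is a compressed sketch of the RAPTOR-style bottom-up pass (cluster size follows the description above; the helper names are illustrative, the real logic lives in the ingestion pipeline):

```python
def build_raptor_tree(leaf_texts: list[str], summarize, cluster_size: int = 5) -> list[str]:
    """Build summary layers bottom-up until a single root remains.

    `summarize` is any callable that turns a list of texts into one summary
    (in Morpheus this would be an LLM call). Returns every node, leaves
    included, so all levels can be embedded and uploaded together.
    """
    all_nodes = list(leaf_texts)
    level = leaf_texts
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), cluster_size):
            cluster = level[i:i + cluster_size]
            parents.append(summarize(cluster))   # one parent summary per cluster
        all_nodes.extend(parents)
        level = parents                          # repeat on the new layer
    return all_nodes
```

The root of the final layer answers "what is this document about?" while the untouched leaves keep answering specific detail questions.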

Current ingestion boundary

  • Live source model: PDF upload remains the only production ingestion path; Python code graph indexing runs via a local script
  • Live semantic category model: document_type
  • Source metadata now recorded at ingestion: source_kind, data_shape, parser_kind (see the Phase 2 status note)
  • Design rule: future code/config/API ingestion should not be bolted onto this PDF chunking path

The Retrieval Pipeline

Browser
  POST /api/v1/query {query, category, history, session_id, alpha}
  X-Auth-Token header

FastAPI validates JWT, starts SSE streaming response

Step 1: Intent analysis (analyse_intent)
  Local sklearn classifier β€” under 5ms, no API call
  Inputs: query text, has_category, has_history
  Output: {is_clear, enriched_query, clarification_question}
  If needs clarification → stream question back, stop
  Clarification limit: after 2 consecutive turns, proceed regardless
  Reference queries ("summarise it"): replaced with previous query
  Every query logged to intent_feedback for online retraining

Step 1.5: Ambiguity / scope safety (check_query_ambiguity)
  If the user has NOT pinned a document:
  - If multiple docs are in scope and the query is identity/page-scoped (owner/title/publisher/cover/first page), Morpheus asks the user to pick a document (never guesses).
  - Otherwise, Morpheus may ask a clarification question for generic queries when multiple docs match.
  Implementation detail: ambiguity scoring uses hybrid_search(..., p_user_id=...) to avoid PostgREST overload ambiguity.

Step 2: Query routing
  Route class chosen first:
    exact_fact / page_scoped / summary / follow_up / compare / multi_part / relational / factoid / no_retrieval
  Expert weights assigned and persisted:
    dense_chunk / raptor_summary / graph_traversal / episodic_memory / hybrid_compare
  Special deterministic branch:
    identity_store
  Structural queries?
    → tree_search(): recursive traversal of document_trees for this user
    → If tree search returns 0 results: falls back to hybrid search
  Exact/page-scoped questions with curated evidence?
    → identity_store path
  Relational / graph-supported questions?
    → graph retrieval can join the candidate pool
  Everything else → retrieve_chunks() hybrid path

Step 3: retrieve_chunks() — hybrid retrieval path
  a) Follow-up detection
     Query ≤8 words with pronouns (it/this/that/they)?
     Reuse _last_chunks[session_key] — no re-search
     Safety guard: ordinal follow-ups like "the second one" must have an explicit referent (a list);
     otherwise the API asks for clarification instead of guessing.

  b) Semantic cache check
     Embed query (256-entry in-memory LRU cache)
     Scan Redis for cosine similarity ≥ 0.92
     Cache hit → return __CACHE_HIT__ sentinel document

  c) Query rewriting
     LLM breaks query into 1-3 targeted sub-queries
     Short queries (≤3 words) skip this step

  d) Hybrid search (per sub-query)
     hybrid_search RPC: BM25 + pgvector combined
     alpha=0.5 = equal weight (adjustable via UI slider)
     Deduplicates across sub-queries by chunk ID
     Category filter active if user selected one
     Graph pinning can restrict search to pinned files

  e) Reranking (3-tier fallback)
     Tier 1: Cohere rerank-multilingual-v3.0 (cloud, best quality)
     Tier 2: CrossEncoder ms-marco-MiniLM-L-6-v2 (local, free)
     Tier 3: Lexical Jaccard similarity (pure Python, always works)
     Relevance threshold: 0.35 (relaxed to 0.05 for small corpus)
     Diversity filter: max 2 chunks per source, cross-category seeding

  f) Log rerank feedback (fire-and-forget thread)

Step 4: generate_answer_stream()
  __CACHE_HIT__ sentinel → stream cached answer directly, skip LLM
  match_memory RPC → retrieve past relevant Q&A pairs (episodic memory)
  identity_store-only exact/page-scoped route can bypass normal generative answering
  Build prompt: system + retrieved chunks + memories + history + query
  Stream tokens: Groq → Gemini → OpenRouter fallback chain
  After streaming: save Q&A pair to chat_memory (background thread)
  Store answer in semantic cache (versioned key, TTL by document type)

Step 5: Trace + quality persistence
  Persist query_traces row:
    trace_id, route_mode, selected_experts, expert_weights,
    candidate_counts, doc_diagnostics, quality_metrics, latency
  Persist evaluation_logs
  Optionally enrich trace graph links

Step 6: Emit sources
  {type: "done", sources: [...], images: [...]} SSE event

Deterministic vs Probabilistic Paths

This distinction should stay explicit in the architecture and in future roadmap work.

More deterministic today

  • JWT validation and tenant scoping
  • File dedup
  • MIME gating for ingestion
  • Identity-store retrieval for exact/page-scoped questions
  • PageIndex tree traversal
  • Explicit ambiguity gating and document pinning
  • Query trace persistence

More probabilistic today

  • Query rewriting
  • Embedding retrieval
  • BM25/vector score balancing
  • Reranking
  • LLM generation
  • LLM fallback document classification
  • RAPTOR summary generation

Design rule

For high-precision questions, Morpheus should prefer deterministic branches first and only use probabilistic retrieval/generation when necessary.


The Provider System

ProviderFactory.build_chat_llm(purpose=...)

  purpose="text"       Groq → Gemini → OpenRouter
  purpose="ingestion"  Gemini (1M context) → OpenRouter
  purpose="vision"     Gemini (native multimodal) → OpenRouter vision
  purpose="rewriter"   OpenRouter → Groq
  purpose="classifier" OpenRouter classifier models only

Embeddings: nvidia/llama-nemotron-embed-vl-1b-v2:free (2048 dims) → text-embedding-3-small

Current model lists (all configurable in config.py):

| Provider | Models (in fallback order) |
| --- | --- |
| Groq | llama-4-scout-17b → llama-3.3-70b-versatile → qwen3-32b → llama-3.1-8b-instant |
| Gemini | gemini-2.5-flash → gemini-2.5-flash-lite |
| OpenRouter | stepfun/step-3.5-flash:free → nvidia/nemotron-3-super-120b:free → arcee-ai/trinity-large-preview:free → meta-llama/llama-3.3-70b-instruct:free → more |

Retry logic in each provider wrapper: 404, 429, 503 are retryable — the wrapper moves to the next model in the list.
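A condensed sketch of that fallback pattern (the exception type and function names are illustrative, not the actual providers.py API):

```python
RETRYABLE_STATUS = {404, 429, 503}

class ProviderHTTPError(Exception):
    """Illustrative error type carrying an HTTP status code."""
    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message)
        self.status_code = status_code

def call_with_fallback(models: list[str], call_model, prompt: str) -> str:
    """Try each model in order; move on only for retryable failures."""
    last_error: Exception | None = None
    for model in models:
        try:
            return call_model(model, prompt)
        except ProviderHTTPError as err:
            if err.status_code not in RETRYABLE_STATUS:
                raise                          # real errors surface immediately
            last_error = err                   # 404/429/503: try the next model
    raise RuntimeError("all models in the fallback chain failed") from last_error
```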


The Semantic Cache

cache_manager.py — version-invalidated, similarity-based lookup.

  1. Each user has a kb_version integer in Redis: nexus:kb_version:{user_id}
  2. Cache entries keyed by version: nexus:qcache:{user_id}:v{version}:...
  3. Lookup: scan all entries for this user+version, find best cosine similarity
  4. Hit threshold: 0.92 (strict — wrong answers are worse than cache misses)
  5. Corpus change (ingest or delete): increment_kb_version() → version N → N+1
  6. All v1 entries become invisible under v2 — no explicit deletion needed

TTL by document type: academic_syllabus and reference_chart cache for 7 days; technical_manual and research_paper for 3 days; financial_report and hr_policy for 1 day; general_document for 1 hour.
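A minimal sketch of the versioned-key lookup pattern (Redis key shapes follow the ones listed above; the scan and similarity helpers are simplified stand-ins for cache_manager.py):

```python
import json
import numpy as np
import redis

r = redis.Redis()

def cache_lookup(user_id: str, query_vec: np.ndarray, threshold: float = 0.92):
    """Return the best cached answer for this user's current kb_version, if any."""
    version = int(r.get(f"nexus:kb_version:{user_id}") or 0)
    best_answer, best_sim = None, 0.0
    for key in r.scan_iter(f"nexus:qcache:{user_id}:v{version}:*"):
        raw = r.get(key)
        if raw is None:
            continue
        entry = json.loads(raw)
        vec = np.array(entry["embedding"])
        sim = float(query_vec @ vec / (np.linalg.norm(query_vec) * np.linalg.norm(vec)))
        if sim > best_sim:
            best_answer, best_sim = entry["answer"], sim
    return best_answer if best_sim >= threshold else None

def increment_kb_version(user_id: str) -> None:
    """Corpus changed: bump the version so every old cache entry becomes invisible."""
    r.incr(f"nexus:kb_version:{user_id}")
```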


Trace-First Quality Workflow

Phase 0 is partly implemented in code and should now be treated as normal workflow, not optional ops work.

Already live

  • trace_id emitted through the query path
  • query_traces persistence
  • selected_experts and expert_weights
  • rerank audit data
  • route class and route reason
  • candidate counts and document diagnostics
  • answer preview and latency logging
  • evaluation_logs writes

What still needs to become routine

  • Manual review set of 50-100 real queries
  • Explicit tagging of recurring failure modes
  • Regular tuning cycles driven by query traces instead of anecdotal examples

This phase remains the entry point for major retrieval changes.


The Intent Classifier

intent_classifier.py — sklearn, runs locally, under 5ms per query.

What it classifies: Is this query clear enough to search, or should we ask for clarification?

Features: has_category flag, has_history flag, query text embedded via all-MiniLM-L6-v2.

Online learning: Every query is logged to intent_feedback. Every 25 rows, the model retrains on accumulated examples and saves to intent_model.pkl. Active learning targets the uncertain region (entropy 0.40–0.60) to maximize training efficiency.

Clarification limit: After 2 consecutive clarification turns in the same session, the system proceeds regardless. This prevents the system from getting stuck in a loop with genuinely ambiguous users.
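To make the feature layout concrete, a sketch of how such a classifier can be fed (the exact feature ordering and decision threshold are assumptions; the real code is in intent_classifier.py):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def build_features(query: str, has_category: bool, has_history: bool) -> np.ndarray:
    """Concatenate the two context flags with the query embedding."""
    text_vec = encoder.encode(query)                      # 384-dim MiniLM embedding
    flags = np.array([float(has_category), float(has_history)])
    return np.concatenate([flags, text_vec])

# At query time, with clf a fitted sklearn classifier loaded from intent_model.pkl:
#   proba = clf.predict_proba([build_features(query, has_cat, has_hist)])[0]
#   is_clear = proba[clear_class_index] >= decision_threshold
```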


The Document Classifier

classifier.py — three-stage cascade that learns with every ingestion.

Incoming document
        |
        v
Sparse/tabular pre-check (word count < 200 OR unique word ratio > 0.85)
  YES → visual classification (structural fingerprint to LLM)
  NO  → continue
        |
        v
Stage 1: Centroid nearest-neighbour
  Cosine similarity to stored category centroids
  Confidence ≥ 0.72 → done (no API call)
        |
        v
Stage 2: Ensemble vote
  Signal A: cosine to centroids        (weight 0.45)
  Signal B: cosine to label embeddings (weight 0.30)
  Signal C: TF-IDF keyword matching    (weight 0.25)
  Score ≥ 0.38 → done
        |
        v
Stage 3: LLM chain-of-thought
  Sends excerpt to classifier LLM
  Classifies FORMAT + STRUCTURE, not just topic
  Fallback: "general_document"

After classification, the winning category's centroid is updated with this document's embedding — the classifier improves with every ingestion.
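The running-average update in that last step is simple enough to show inline (a sketch; the values mirror the category_centroids columns above):

```python
import numpy as np

def update_centroid(centroid: np.ndarray, count: int, new_doc_vec: np.ndarray):
    """Fold one more document embedding into the category's running average."""
    new_centroid = (centroid * count + new_doc_vec) / (count + 1)
    return new_centroid, count + 1

# centroid_vector, document_count = update_centroid(centroid_vector, document_count, doc_vec)
```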

User override lock: If ingested_files.user_overridden=True, the entire cascade is skipped. Returns synthetic result with stage_used="user_override", confidence=1.0.


The Three Self-Improvement Loops

Morpheus has three feedback loops that make it more accurate over time:

Loop 1 — Intent classifier (every 25 queries) User queries logged to intent_feedback. Every 25 new rows, the model retrains automatically and saves to disk. Learns the specific query patterns of your users.

Loop 2 — Document classifier (every ingestion) Each ingested document updates its category centroid. The next similar document gets classified at Stage 1 (no API call) instead of needing the LLM fallback. Classification gets faster and more accurate as the corpus grows.

Loop 3 — Reranker distillation (pipeline in place) Every query logs Cohere rerank scores to rerank_feedback. These accumulated labels will be used to train a local CrossEncoder to match Cohere quality without the API cost.


What Morpheus Does Not Yet Have

These should not be documented or discussed as shipped capabilities:

  • URL ingestion
  • Markdown ingestion
  • Code/config/API source ingestion beyond the local Python code graph indexing script
  • First-class ingestion pipelines keyed on source_kind, data_shape, and parser_kind (the schema columns exist; dedicated pipelines do not)
  • True agentic retrieval with an iterative planner/executor loop

Future additions should be documented here only after the code path, tests, and rollout controls exist.


The Frontend

Authentication Flow

Page load
  initSupabase() fetches Supabase keys from /api/v1/config
  supabaseClient.auth.getSession()
  Session exists → showApp() + bootApp()
  No session → showLogin()

Login
  supabaseClient.auth.signInWithPassword(email, password)
  Supabase-js stores JWT in localStorage automatically
  Every API call: getSupabaseToken() reads it from localStorage
  Sent as X-Auth-Token header
  Backend require_auth_token Depends() validates JWT and returns user_id
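On the backend side, the dependency that consumes this header can be sketched as follows (assuming PyJWT and HS256 verification against the Supabase JWT secret; the exact require_auth_token implementation may differ):

```python
import os
import jwt                                  # PyJWT
from fastapi import Header, HTTPException

SUPABASE_JWT_SECRET = os.environ["SUPABASE_JWT_SECRET"]

def require_auth_token(x_auth_token: str = Header(...)) -> str:
    """Validate the Supabase JWT from X-Auth-Token and return the user id."""
    try:
        payload = jwt.decode(
            x_auth_token,
            SUPABASE_JWT_SECRET,
            algorithms=["HS256"],
            audience="authenticated",       # Supabase's default audience claim
        )
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")
    return payload["sub"]                   # the Supabase user id
```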

Global State

state.js — single source of truth:

| Key | Contents |
| --- | --- |
| STATE.files | Ingested document list from /api/v1/corpus/files |
| STATE.categories | Category strings |
| STATE.catColors | Color mapping for graph |
| STATE.chatHistory | Current conversation turns |
| STATE.sessionId | UUID per browser tab |
| STATE.simulation | D3 force simulation reference |
| STATE.alpha | Retrieval weight slider (0=keyword, 1=semantic) |
| STATE.isThinking | Double-submit guard |

Upload + Progress

corpus.js — processUpload() calls apiIngestFile(), then enters pollIngestStatus() which polls every 2 seconds and exits only on COMPLETED or FAILED. Shows cycling heartbeat messages through pipeline stages while waiting.

Chat Streaming

chat.js — sendChat() has a 500ms debounce guard. The assistant bubble is created immediately with thinking dots. async onToken() yields to the browser with await new Promise(r => setTimeout(r, 0)) after each token update so the DOM repaints during streaming rather than all at once at the end.

Graph

graph.js — Obsidian-style D3 force simulation. graphReheat() uses alpha(0.3) not alphaTarget(0.2). The alpha() method sets current energy directly — it forces a restart even when the simulation has fully stopped. alphaTarget only sets where energy wants to decay toward, which does nothing if the simulation is already below alphaMin. onGraphTabVisible() is called from main.js with a 50ms delay for the CSS display change to propagate before D3 reads panel dimensions.


Key Design Decisions

Why Celery + Redis? Ingestion takes 60–120 seconds (OCR, AI summaries, RAPTOR tree building). FastAPI requests time out well before that. Celery lets the task run in the background while the browser polls for status.
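A sketch of the queue-and-poll pattern this implies (task body trimmed to a comment; the real task lives in backend/core/tasks.py and the status route in api/ingest.py, so names here are illustrative):

```python
from celery import Celery
from celery.result import AsyncResult

celery_app = Celery("morpheus",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task(bind=True)
def process_pdf_task(self, file_path: str, user_id: str) -> dict:
    # ... partition, classify, chunk, RAPTOR, embed, upload ...
    return {"status": "COMPLETED"}

# API side: enqueue and hand the browser a task_id to poll.
#   task = process_pdf_task.delay(file_path=tmp_path, user_id=user_id)
#   return {"task_id": task.id}
#
# Status endpoint: look the task up by id.
#   AsyncResult(task_id, app=celery_app).state   # PENDING / STARTED / SUCCESS / FAILURE
```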

Why service-role key for writes? Celery workers have no browser session, so auth.uid() is NULL in the database. The security boundary is enforced at the API level — the JWT is validated before the task is queued and the user_id is passed explicitly to the insert_document_chunk RPC.

Why RAPTOR tree indexing? Flat chunking misses questions that span multiple sections ("total credits across all categories"). RAPTOR builds parent summaries that aggregate child content, enabling retrieval at multiple granularities. Root nodes answer overview questions; leaf nodes answer specific details.

Why tree search for structural queries? Vector similarity is calibrated to semantic meaning, not document structure. A query for "Capstone Project credits" fails vector search because the chunk summary emphasises overall credit structure, not individual line items. Tree search traverses the document hierarchy and finds the exact node.
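A sketch of what recursive traversal over the stored tree_json can look like (the keyword-overlap scoring here is a simple stand-in; the real pipeline_pageindex.py logic is richer):

```python
def tree_search(node: dict, query_terms: set[str], path: str = "") -> list[tuple[str, str]]:
    """Walk a {title, content, children} tree and collect nodes matching the query."""
    hits = []
    title = node.get("title", "")
    text = f"{title} {node.get('content', '')}".lower()
    if any(term in text for term in query_terms):
        hits.append((f"{path}/{title}", node.get("content", "")))
    for child in node.get("children", []):
        hits.extend(tree_search(child, query_terms, path=f"{path}/{title}"))
    return hits

# hits = tree_search(tree_json, {"capstone", "credits"})
```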

Why semantic cache with version invalidation? Repeated questions should not cost API calls. But cached answers must go stale when the corpus changes. Version-based invalidation solves the second problem without tracking which cache entry references which document β€” increment the version, all old entries become invisible.

Why 3-tier reranker? Cohere costs money and has rate limits. CrossEncoder is free but needs local GPU. Lexical always works. This order maximises quality while guaranteeing retrieval never fails completely.

Why alpha(0.3) not alphaTarget(0.2) in D3 graph reheat? alphaTarget sets where the simulation wants to decay toward. If the simulation has already stopped (alpha < alphaMin = 0.001), alphaTarget does nothing. The alpha() method sets current energy directly and always forces a restart.

Why quality work before new source kinds? The current system already has enough retrieval and generation complexity that quality regressions can hide behind "more features." The project workflow now explicitly prioritizes trace review, failure-mode reduction, and retrieval tuning before adding URL, Markdown, or code ingestion.

Why code ingestion should be separate later? Code/config/API material is structurally different from prose documents. When Morpheus gains code support, it should use a dedicated exact/AST-first path rather than treating code as generic document chunks.


Environment Variables

| Variable | Purpose | Required |
| --- | --- | --- |
| SUPABASE_URL | Supabase project URL | Yes |
| SUPABASE_ANON_KEY | Frontend-safe key (user-scoped reads) | Yes |
| SUPABASE_SERVICE_KEY | Server-only key (bypasses RLS for writes) | Yes |
| SUPABASE_JWT_SECRET | JWT signature verification | Yes |
| OPENROUTER_API_KEY | OpenRouter API access | Yes |
| GROQ_API_KEY | Groq API access (primary generation) | Yes |
| GEMINI_API_KEY | Google Gemini access (ingestion + vision) | Yes |
| COHERE_API_KEY | Cohere reranking | Yes |
| REDIS_URL | Redis connection string | Yes |
| CELERY_WORKER_CANCEL_ON_CONNECTION_LOSS | Whether broker disconnect cancels active ingestion tasks; keep false for long uploads | No |
| CELERY_TASK_ACKS_LATE | Whether ingestion tasks acknowledge only after completion; keep false to avoid redelivery loops on flaky Redis | No |
| CELERY_TASK_REJECT_ON_WORKER_LOST | Whether lost workers requeue tasks; keep false unless duplicate-safe retries are required | No |
| MASTER_ADMIN_KEY | Admin endpoint access | Yes |
| ALLOWED_ORIGINS | CORS allowed origins (use * for dev only) | Yes |
| DOCS_ENABLED | Enable /docs and /redoc (set false in prod) | No |
| LOG_LEVEL | Logging verbosity (INFO or DEBUG) | No |
| AUTO_START_CELERY | Auto-spawn Celery subprocess on startup | No |
| HF_HUB_DISABLE_XET | Disable Xet-backed model downloads during build/runtime | No |

Common Debugging

Ingestion crashes at embedding step
  Look for: ValueError: Model X returned null embeddings
  Cause: OpenRouter returns HTTP 200 with data=null
  Fix: the FallbackEmbeddings null guard retries the next model automatically — check provider logs for rate limits

Cache not invalidating after delete
  Check Redis for the key nexus:kb_version:{user_id}
  If missing: the version key was never written — run a fresh ingest to initialise it

Graph not reheating on tab switch
  Check: onGraphTabVisible is defined in graph.js
  Check: the _hookGraphTabVisible IIFE in main.js
  Expected: the graph animates within 50ms of the tab click

Classifier ignoring user category
  Check: ingested_files.user_overridden = true for that file hash
  Logs should show: User override active — forcing category 'X', skipping classifier

__CACHE_HIT__ showing as a source chip in the UI
  Fix: hard-refresh (Ctrl+Shift+R) to load the latest chat.js
  The visibleSources filter in onDone() strips sentinel entries

Gemini 404 errors during ingestion
  Check GEMINI_TEXT_MODELS and GEMINI_VISION_MODELS in config.py
  Must be gemini-2.5-flash and gemini-2.5-flash-lite
  gemini-1.5-flash and gemini-2.0-flash are deprecated


Update Rule for This File

When the architecture changes, update this document in this order:

  1. Current live behavior
  2. Trace and observability impact
  3. Deterministic vs probabilistic impact
  4. Roadmap/status movement between phases

Do not document planned capabilities as shipped.


Complete Request Flow Example

User asks: "What are the core courses?"
Category filter: academic_syllabus

1. Browser POST /api/v1/query
   Headers: X-Auth-Token: eyJ...
   Body: {query, category="academic_syllabus", history, session_id, alpha=0.5}

2. require_auth_token: decodes JWT → user_id="ee903934..."

3. analyse_intent()
   sklearn: needs_clarification=False, confidence=1.00
   Category active: enriched query = "query academic_syllabus"
   Logs to intent_feedback

4. Route selection
   route_class = factoid
   selected_experts = ["dense_chunk", ...]
   expert_weights persisted into trace metadata
   _should_use_tree_path() → False (not a structural keyword query)
   retrieve_chunks() hybrid path

5. Semantic cache check: MISS (first time this query)
   generate_sub_queries → ["B.Tech CSE core courses", "program core credits", ...]
   hybrid_search RPC × 3 sub-queries → 12 raw candidates
   Cohere rerank → ranked by relevance score
   Threshold + diversity filter → 3 final chunks
   Store in _last_chunks[session_key]
   Log rerank feedback (background thread)

6. generate_answer_stream()
   match_memory RPC → 2 past relevant Q&A pairs
   Build prompt: system + 3 chunks + 2 memories + history + query
   Groq astream() → tokens arrive one by one
   Yield {type:"token", content:"The"}, {type:"token", content:" core"}, ...
   After streaming: save Q&A to chat_memory (background thread)
   Store in semantic cache (version v4, TTL 7 days for academic_syllabus)
   Persist query trace and evaluation logs

7. Yield {type:"done", sources:[...], images:[...], trace_id:"..."}

8. Browser: onToken() fills bubble token by token
   onDone() appends source chips and keeps the trace id available for review

Last updated: April 2026