Morpheus – Architecture Guide
Ask anything. Your documents answer.
This file is the source of truth for how Morpheus works today, what is already live in the codebase, and what the project is prioritizing next.
What Morpheus Is Today
Morpheus is a multi-tenant RAG platform for user-uploaded PDFs.
Users upload PDF documents. They ask questions in natural language. Morpheus retrieves evidence and streams grounded answers with citations and diagnostics.
What is already live:
- Each user sees only their own documents, enforced through tenant-scoped writes and RLS-backed reads
- Retrieval combines BM25 keyword search + pgvector semantic search + reranking
- Routing is multi-path, not single-path: exact/page-scoped lookup, structural tree search, dense retrieval, graph-assisted retrieval, and memory-aware follow-ups
- Repeated questions can short-circuit through a semantic cache
- Conversations are remembered across sessions via episodic memory
- Query traces capture route selection, expert weights, retrieval diagnostics, and answer quality metadata
- If one AI provider fails, the system tries the next provider/model in the chain
Important framing:
- Morpheus is a RAG engine
- Morpheus does have a mixture-of-experts-style orchestration layer at retrieval time
- Morpheus is not a trained neural MoE model
- Morpheus is not yet a true agentic retrieval system with an iterative planner/executor loop
- Morpheus currently supports PDF upload ingestion as the primary production path, plus a new local Python code graph indexing path for graph-first exploration
Current Engineering Priority Order
This is the active operating order for the project. New source kinds should not leapfrog retrieval quality work.
Phase 0 – Understand What We Have
- Implemented: per-query traces with route selection, selected experts, expert weights, rerank audit, diagnostics, and quality metrics
- Implemented: router mechanism and retrieval branch selection in code
- Required next operating discipline: maintain a manually scored baseline of 50-100 real queries
Phase 1 – Fix Quality on the Existing Path
- Highest priority: improve retrieval and generation quality on the current PDF path
- Focus areas: chunking, hybrid weighting, reranker thresholds, grounding instructions, hallucination guardrails
Phase 2 – Add Source-Kind Architecture
- Only after Phase 1 is stable
- Add `source_kind`, `data_shape`, and `parser_kind`
- Add code ingestion as a separate structured pipeline
- Add URL ingestion behind strict security controls
Status note:
- Initial Phase 2 substrate is now live: `source_kind`, `data_shape`, `parser_kind`, DB-backed `graph_runs`, graph-first API endpoints, and deterministic Python code graph indexing via a local script
- Remaining Phase 2 work is broader source support (`markdown`, `url`, richer code languages) and deeper graph-first answer orchestration
Phase 3 – Scale and Cost Optimization
- Model tiering by route/query complexity
- Embedding cache layer
- Reranker gating based on retrieval confidence
Supported Sources Right Now
| Source | Status | Notes |
|---|---|---|
| PDF upload | Live | Only production ingestion path today |
| URL ingestion | Not live | Planned for a later phase |
| Markdown files | Not live | Planned via source-kind architecture |
| Python code graph indexing | Live (local/scripted) | Deterministic AST-first graph indexing for graph-first exploration; not a browser upload path |
| Code/config/API files (beyond Python graph indexing) | Partial | Broader structured ingestion still planned as a separate pipeline, not an extension of generic document chunking |
Project Structure
morpheus/
├── backend/
│   ├── main.py                 FastAPI app, startup wiring, rate limiter
│   ├── api/
│   │   ├── auth.py             /api/v1/auth/*
│   │   ├── query.py            /api/v1/query – SSE streaming query path
│   │   ├── corpus.py           /api/v1/corpus/*
│   │   ├── ingest.py           /api/v1/ingest/*
│   │   ├── graph.py            /api/v1/graph – graph search, path, export
│   │   ├── frontend_config.py  /api/v1/config
│   │   └── admin.py            traces + feedback admin endpoints
│   └── core/
│       ├── pipeline.py             Main orchestration layer
│       ├── pipeline_routing.py     Route classes + expert weighting
│       ├── pipeline_retrieval.py   Retrieval helpers
│       ├── pipeline_generation.py  Generation helpers
│       ├── pipeline_ingestion.py   PDF ingestion workflow
│       ├── pipeline_pageindex.py   Structural tree retrieval
│       ├── pipeline_memory.py      Episodic memory helpers
│       ├── pipeline_ambiguity.py   Scope and ambiguity handling
│       ├── pipeline_types.py       Shared pipeline metadata types
│       ├── graph_hybrid.py         Source metadata + graph-first search/path/export helpers
│       ├── code_graph.py           Deterministic Python AST graph indexing
│       ├── providers.py            LLM + embedding provider fallback
│       ├── classifier.py           3-stage document classifier
│       ├── intent_classifier.py    sklearn intent model, online retraining
│       ├── cache_manager.py        Semantic Redis cache with version invalidation
│       ├── auth_utils.py           JWT helpers + require_auth_token Depends()
│       ├── config.py               All constants and tuneable settings
│       ├── rate_limit.py           Shared limiter setup
│       └── tasks.py                Celery task for background PDF ingestion
├── frontend/
│   ├── index.html
│   └── js/
│       ├── config.js   Runtime config
│       ├── api.js      All fetch() calls – single source of truth
│       ├── state.js    Global STATE object
│       ├── chat.js     Streaming chat UI
│       ├── corpus.js   Upload + document management
│       ├── graph.js    D3 force-directed knowledge graph
│       ├── inspect.js  Node detail panel
│       ├── ui.js       Shared UI helpers
│       └── main.js     Boot sequence + auth gate
├── shared/
│   └── types.py        Pydantic models shared by API and pipeline
└── supabase/
    ├── migrations/
    └── rls/
The Database (Supabase / PostgreSQL)
Tables
documents – The vector store. Every chunk and summary node from every PDF lives here.
| Column | Type | Purpose |
|---|---|---|
| id | uuid | Deterministic ID: uuid5(file_hash + chunk_index) |
| content | text | Chunk text that gets searched |
| metadata | jsonb | source, file_hash, document_type, page numbers, retrieval metadata |
| embedding | vector(2048) | nvidia-nemotron embedding for pgvector search |
| user_id | uuid | RLS tenant isolation |
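The deterministic `id` column can be sketched with the standard-library `uuid5` function. The use of `uuid.NAMESPACE_URL` and the `hash:index` separator are assumptions for illustration; the real namespace constant lives in the ingestion code:

```python
import uuid

def chunk_id(file_hash: str, chunk_index: int) -> uuid.UUID:
    """Deterministic chunk ID: the same file hash and chunk index always
    map to the same UUID, so re-ingesting an identical PDF produces the
    same row IDs instead of duplicates. NAMESPACE_URL is illustrative."""
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{file_hash}:{chunk_index}")
```

Because the ID is a pure function of content hash and position, idempotent upserts fall out for free.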
ingested_files – Dedup registry.
| Column | Type | Purpose |
|---|---|---|
| file_hash | text | SHA-256 of the PDF – the dedup key |
| filename | text | Display name in the UI |
| document_type | text | Category, e.g. academic_syllabus |
| source_kind | text | pdf, code, markdown, url, etc. |
| data_shape | text | structured, unstructured, or hybrid |
| parser_kind | text | Parser used, e.g. pdf_partition, python_ast |
| chunk_count | int | Includes RAPTOR tree nodes |
| user_id | uuid | Tenant isolation |
| user_overridden | bool | True if user manually changed category – classifier skips |
chat_memory – Episodic memory, searchable by semantic similarity.
| Column | Type | Purpose |
|---|---|---|
| session_id | text | Groups messages from the same conversation |
| role | text | user or assistant |
| content | text | The message text |
| embedding | vector | For semantic search via match_memory RPC |
| user_id | uuid | Tenant isolation |
document_trees – Hierarchical tree index for structural queries.
| Column | Type | Purpose |
|---|---|---|
| file_hash | text | Links tree to the source document |
| tree_json | jsonb | Recursive node structure: {title, content, children} |
| user_id | uuid | Tenant isolation |
category_centroids – The document classifier's learned memory.
| Column | Type | Purpose |
|---|---|---|
| document_type | text | Category label |
| centroid_vector | array | Running average embedding of all docs of this type |
| document_count | int | Number of documents that contributed |
| user_id | uuid | Per-tenant centroids |
evaluation_logs – RAGAS quality metrics written after every query.
rerank_feedback – Every Cohere rerank decision, stored for future CrossEncoder distillation.
intent_feedback – Online training data for the intent classifier.
query_traces – Per-query trace record with route mode, selected experts, expert weights, candidate counts, diagnostics, and quality metrics.
graph_nodes / graph_edges – Graph foundation built during ingestion and enriched by query/feedback workflows.
graph_runs – Auditable record of each graph extraction/indexing pass, including parser/source metadata and content hash.
Category list – Derived directly from each tenant's ingested_files.document_type values.
Supabase RPC Functions
| Function | Purpose |
|---|---|
| hybrid_search(query_text, query_embedding, match_count, filter, semantic_weight, keyword_weight, p_user_id) | Combined BM25 + pgvector search (tenant-scoped overload) |
| match_memory(query_embedding, match_session_id, match_count) | Semantic search over chat history |
| insert_document_chunk(p_id, p_content, p_metadata, p_embedding, p_user_id) | Secure insert with explicit user_id |
| get_document_types() | Returns distinct categories for this tenant |
Row Level Security
Every table has RLS policies. Core rule: user_id = auth.uid() for reads.
Writes from Celery workers use the service-role key (Celery has no browser session so auth.uid() is NULL), but always inject user_id explicitly via the insert_document_chunk RPC – extracted from the JWT at the API boundary before the task is queued.
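The explicit injection can be sketched as a small payload builder. The helper name `build_chunk_insert` is illustrative; the parameter names are the documented RPC signature, and the empty-tenant guard is an assumed safety check, not confirmed behavior:

```python
def build_chunk_insert(chunk_id: str, content: str, metadata: dict,
                       embedding: list[float], user_id: str) -> dict:
    """Build the argument dict for the insert_document_chunk RPC.
    The worker runs with the service-role key (auth.uid() is NULL in
    the database), so tenant identity must travel with the payload."""
    if not user_id:
        # Refuse to write untenanted rows: RLS cannot save us here.
        raise ValueError("refusing to insert a chunk without a tenant id")
    return {
        "p_id": chunk_id,
        "p_content": content,
        "p_metadata": metadata,
        "p_embedding": embedding,
        "p_user_id": user_id,  # explicit tenant isolation at write time
    }
```

The dict would then be passed to the Supabase RPC call by the worker.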
The Ingestion Pipeline
Browser
POST /api/v1/ingest/upload (X-Auth-Token header)
FastAPI: JWT validated, upload limits checked, MIME type checked
Per-user document count checked (50 max)
PDF saved to temp file
process_pdf_task.delay() → Redis queue
Returns {task_id} immediately
Browser polls /api/v1/ingest/status/{task_id} every 2 seconds
Celery worker (background):
Step 1: Dedup check
SHA-256 fingerprint of PDF
Check ingested_files table (O(1) indexed lookup)
Already ingested → return "already_ingested"
user_overridden=True → skip classifier, use forced_category
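The dedup key itself is just a SHA-256 over the raw upload, which a short sketch makes concrete (the helper name is illustrative):

```python
import hashlib

def file_fingerprint(pdf_bytes: bytes) -> str:
    """Step 1 dedup key: SHA-256 of the raw PDF bytes. Identical uploads
    hash identically, so one indexed lookup in ingested_files can
    short-circuit the entire 60-120s pipeline."""
    return hashlib.sha256(pdf_bytes).hexdigest()
```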
Step 2: PDF partitioning (unstructured library)
partition_pdf() → OCR + layout detection
extract_images_from_pdf() → PyMuPDF, filters tiny/skewed images
Returns Element objects (Title, NarrativeText, Table, Image...)
Step 3: Classification (classifier.py)
Three-stage cascade:
Stage 1: Centroid nearest-neighbour (no API call, cosine similarity)
Confidence >= 0.72 → done
Stage 2: Ensemble vote (centroid + label-embed + TF-IDF)
Score >= 0.38 → done
Stage 3: LLM chain-of-thought (novel document types only)
Sparse/tabular pre-check: routes to visual classification if word count < 200
After classification: centroid updated with this document's vector
Step 4: Chunking + AI summaries
chunk_by_title() groups elements into logical sections
Chunks with tables or images: parallel AI vision summarisation (5 workers)
Each chunk becomes a LangChain Document with rich metadata
Step 5: RAPTOR tree indexing
Groups leaf chunks into clusters of 5
LLM generates parent summary for each cluster
Repeats up the tree until single root node
Root node answers "what is this document about?"
Leaf nodes answer specific detail questions
All nodes (leaves + summaries) uploaded to documents table
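The RAPTOR loop above can be sketched in a few lines. This is a simplification: the real pipeline clusters chunks semantically and summarises with an LLM, whereas here grouping is positional and `summarise` is a stub standing in for the model call:

```python
def build_raptor_tree(leaves: list[str], group_size: int = 5) -> list[str]:
    """RAPTOR-style tree indexing sketch: repeatedly group nodes into
    clusters of `group_size`, replace each cluster with a parent summary,
    and recurse until a single root remains. Returns every node
    (leaves + all summary levels), mirroring what gets uploaded."""
    def summarise(cluster: list[str]) -> str:
        return "summary(" + " + ".join(cluster) + ")"  # LLM stands in here

    all_nodes = list(leaves)
    level = leaves
    while len(level) > 1:
        parents = [summarise(level[i:i + group_size])
                   for i in range(0, len(level), group_size)]
        all_nodes.extend(parents)
        level = parents
    return all_nodes
```

For 25 leaves this yields 25 leaves, 5 intermediate summaries, and 1 root: retrieval can then match at whichever granularity fits the question.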
Step 6: Embedding + upload
Batch embed all nodes via nvidia-nemotron (2048 dims)
Insert each via insert_document_chunk RPC (explicit user_id)
Register in ingested_files
Invalidate semantic cache for this user (kb_version++)
Step 7: Graph foundation persistence
Document/entity/topic graph rows written to graph_nodes / graph_edges
Current ingestion boundary
- Live browser upload source: PDF only
- Live semantic category model: `document_type`
- Live source metadata columns: `source_kind`, `data_shape`, `parser_kind` (the initial Phase 2 substrate)
- Design rule: future code/config/API ingestion should not be bolted onto this PDF chunking path
The Retrieval Pipeline
Browser
POST /api/v1/query {query, category, history, session_id, alpha}
X-Auth-Token header
FastAPI validates JWT, starts SSE streaming response
Step 1: Intent analysis (analyse_intent)
Local sklearn classifier – under 5ms, no API call
Inputs: query text, has_category, has_history
Output: {is_clear, enriched_query, clarification_question}
If needs clarification → stream question back, stop
Clarification limit: after 2 consecutive turns, proceed regardless
Reference queries ("summarise it"): replaced with previous query
Every query logged to intent_feedback for online retraining
Step 1.5: Ambiguity / scope safety (check_query_ambiguity)
If the user has NOT pinned a document:
- If **multiple docs are in scope** and the query is **identity/page-scoped** (owner/title/publisher/cover/first page), Morpheus **asks the user to pick a document** (never guesses).
- Otherwise, Morpheus may ask a clarification question for generic queries when multiple docs match.
Implementation detail: ambiguity scoring uses `hybrid_search(..., p_user_id=...)` to avoid PostgREST overload ambiguity.
Step 2: Query routing
Route class chosen first:
exact_fact / page_scoped / summary / follow_up / compare / multi_part / relational / factoid / no_retrieval
Expert weights assigned and persisted:
dense_chunk / raptor_summary / graph_traversal / episodic_memory / hybrid_compare
Special deterministic branch:
identity_store
Structural queries?
→ tree_search(): recursive traversal of document_trees for this user
→ If tree search returns 0 results: falls back to hybrid search
Exact/page-scoped questions with curated evidence?
→ identity_store path
Relational / graph-supported questions?
→ graph retrieval can join the candidate pool
Everything else → retrieve_chunks() hybrid path
Step 3: retrieve_chunks() β hybrid retrieval path
a) Follow-up detection
Query ≤ 8 words with pronouns (it/this/that/they)?
Reuse _last_chunks[session_key] – no re-search
Safety guard: ordinal follow-ups like "the second one" must have an explicit referent (a list);
otherwise the API asks for clarification instead of guessing.
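The short-query-plus-pronoun rule can be sketched as a minimal heuristic. The pronoun set beyond it/this/that/they is an assumption, and the real detector also handles the ordinal-referent guard, which this sketch omits:

```python
PRONOUNS = {"it", "this", "that", "they", "them", "those"}

def is_follow_up(query: str, max_words: int = 8) -> bool:
    """Follow-up detection sketch: a short query (<= max_words) containing
    an anaphoric pronoun is treated as a follow-up, so the previous turn's
    retrieved chunks are reused instead of re-searching."""
    words = query.lower().strip("?!. ").split()
    return len(words) <= max_words and any(w in PRONOUNS for w in words)
```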
b) Semantic cache check
Embed query (256-entry in-memory LRU cache)
Scan Redis for cosine similarity ≥ 0.92
Cache hit → return __CACHE_HIT__ sentinel document
c) Query rewriting
LLM breaks query into 1-3 targeted sub-queries
Short queries (≤ 3 words) skip this step
d) Hybrid search (per sub-query)
hybrid_search RPC: BM25 + pgvector combined
alpha=0.5 = equal weight (adjustable via UI slider)
Deduplicates across sub-queries by chunk ID
Category filter active if user selected one
Graph pinning can restrict search to pinned files
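The alpha slider's effect can be illustrated with a linear blend. The actual fusion happens inside the hybrid_search RPC and may differ; this one-liner is only a sketch of the weighting the slider exposes:

```python
def hybrid_score(semantic: float, keyword: float, alpha: float = 0.5) -> float:
    """Score fusion sketch for the UI slider: alpha=1.0 is pure semantic
    (pgvector), alpha=0.0 is pure keyword (BM25), 0.5 weights both
    equally, matching the default shown above."""
    return alpha * semantic + (1.0 - alpha) * keyword
```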
e) Reranking (3-tier fallback)
Tier 1: Cohere rerank-multilingual-v3.0 (cloud, best quality)
Tier 2: CrossEncoder ms-marco-MiniLM-L-6-v2 (local, free)
Tier 3: Lexical Jaccard similarity (pure Python, always works)
Relevance threshold: 0.35 (relaxed to 0.05 for small corpus)
Diversity filter: max 2 chunks per source, cross-category seeding
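Tier 3 of the fallback above is simple enough to show in full. The lowercase whitespace tokenisation is a simplification of whatever the real implementation uses:

```python
def lexical_jaccard(query: str, chunk: str) -> float:
    """Tier-3 reranker fallback sketch: token-set Jaccard similarity.
    Pure Python with no model or API dependency, so ranking still works
    when both Cohere and the local CrossEncoder are unavailable."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)
```

Quality is far below a cross-encoder, but the guarantee that some ordering always exists is what earns it the last slot in the chain.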
f) Log rerank feedback (fire-and-forget thread)
Step 4: generate_answer_stream()
__CACHE_HIT__ sentinel → stream cached answer directly, skip LLM
match_memory RPC → retrieve past relevant Q&A pairs (episodic memory)
identity_store-only exact/page-scoped route can bypass normal generative answering
Build prompt: system + retrieved chunks + memories + history + query
Stream tokens: Groq → Gemini → OpenRouter fallback chain
After streaming: save Q&A pair to chat_memory (background thread)
Store answer in semantic cache (versioned key, TTL by document type)
Step 5: Trace + quality persistence
Persist query_traces row:
trace_id, route_mode, selected_experts, expert_weights,
candidate_counts, doc_diagnostics, quality_metrics, latency
Persist evaluation_logs
Optionally enrich trace graph links
Step 6: Emit sources
{type: "done", sources: [...], images: [...]} SSE event
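The SSE frames emitted through this pipeline follow the standard `data:` line format. A minimal serializer sketch (helper name illustrative, payload fields taken from the events shown above):

```python
import json

def sse_event(payload: dict) -> str:
    """Format one Server-Sent Events frame as a streaming endpoint would
    emit it: a `data:` line carrying a JSON payload, terminated by a
    blank line so the browser's EventSource/reader can split frames."""
    return f"data: {json.dumps(payload)}\n\n"
```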
Deterministic vs Probabilistic Paths
This distinction should stay explicit in the architecture and in future roadmap work.
More deterministic today
- JWT validation and tenant scoping
- File dedup
- MIME gating for ingestion
- Identity-store retrieval for exact/page-scoped questions
- PageIndex tree traversal
- Explicit ambiguity gating and document pinning
- Query trace persistence
More probabilistic today
- Query rewriting
- Embedding retrieval
- BM25/vector score balancing
- Reranking
- LLM generation
- LLM fallback document classification
- RAPTOR summary generation
Design rule
For high-precision questions, Morpheus should prefer deterministic branches first and only use probabilistic retrieval/generation when necessary.
The Provider System
ProviderFactory.build_chat_llm(purpose=...)
purpose="text" Groq → Gemini → OpenRouter
purpose="ingestion" Gemini (1M context) → OpenRouter
purpose="vision" Gemini (native multimodal) → OpenRouter vision
purpose="rewriter" OpenRouter → Groq
purpose="classifier" OpenRouter classifier models only
Embeddings: nvidia/llama-nemotron-embed-vl-1b-v2:free (2048 dims) → text-embedding-3-small
Current model lists (all configurable in config.py):
| Provider | Models (in fallback order) |
|---|---|
| Groq | llama-4-scout-17b → llama-3.3-70b-versatile → qwen3-32b → llama-3.1-8b-instant |
| Gemini | gemini-2.5-flash → gemini-2.5-flash-lite |
| OpenRouter | stepfun/step-3.5-flash:free → nvidia/nemotron-3-super-120b:free → arcee-ai/trinity-large-preview:free → meta-llama/llama-3.3-70b-instruct:free → more |
Retry logic in each provider wrapper: 404, 429, 503 are retryable – the wrapper moves to the next model in the list.
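The retry-then-fall-through behavior can be sketched as follows. `ProviderError` and `call_with_fallback` are illustrative names, with `call(model)` standing in for the real provider invocation:

```python
RETRYABLE = {404, 429, 503}  # statuses that justify trying the next model

class ProviderError(Exception):
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

def call_with_fallback(models, call):
    """Try each model in order. Retryable statuses (404/429/503) move on
    to the next model; anything else is surfaced immediately."""
    last = None
    for model in models:
        try:
            return call(model)
        except ProviderError as err:
            if err.status not in RETRYABLE:
                raise  # non-retryable: fail fast
            last = err  # retryable: fall through to the next model
    raise last or RuntimeError("no models configured")
```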
The Semantic Cache
cache_manager.py – version-invalidated, similarity-based lookup.
- Each user has a `kb_version` integer in Redis: `nexus:kb_version:{user_id}`
- Cache entries keyed by version: `nexus:qcache:{user_id}:v{version}:...`
- Lookup: scan all entries for this user+version, find best cosine similarity
- Hit threshold: 0.92 (strict – wrong answers are worse than cache misses)
- Corpus change (ingest or delete): `increment_kb_version()` – version N → N+1
- All v1 entries become invisible under v2 – no explicit deletion needed
TTL by document type: academic_syllabus and reference_chart cache for 7 days; technical_manual and research_paper for 3 days; financial_report and hr_policy for 1 day; general_document for 1 hour.
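The version-invalidation trick can be shown with an in-memory stand-in for Redis. The class and method names here are illustrative except `increment_kb_version`, which the text above names; the 0.92 threshold matches the documented setting:

```python
import math

class VersionedCache:
    """Sketch of the version-invalidated semantic cache. Entries are
    keyed by the tenant's kb_version; bumping the version makes every
    older entry invisible without deleting anything."""

    HIT_THRESHOLD = 0.92  # strict: wrong answers are worse than misses

    def __init__(self):
        self.version = 1
        self.entries = {}  # (version, embedding tuple) -> cached answer

    def put(self, embedding, answer):
        self.entries[(self.version, tuple(embedding))] = answer

    def get(self, embedding):
        best, best_sim = None, 0.0
        for (ver, key), answer in self.entries.items():
            if ver != self.version:
                continue  # stale version: invisible, no cleanup needed
            sim = self._cosine(embedding, key)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.HIT_THRESHOLD else None

    def increment_kb_version(self):
        self.version += 1  # corpus changed: all cached answers go stale

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
```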
Trace-First Quality Workflow
Phase 0 is partly implemented in code and should now be treated as normal workflow, not optional ops work.
Already live
- `trace_id` emitted through the query path
- `query_traces` persistence
- `selected_experts` and `expert_weights`
- rerank audit data
- route class and route reason
- candidate counts and document diagnostics
- answer preview and latency logging
- `evaluation_logs` writes
What still needs to become routine
- Manual review set of 50-100 real queries
- Explicit tagging of recurring failure modes
- Regular tuning cycles driven by query traces instead of anecdotal examples
This phase remains the entry point for major retrieval changes.
The Intent Classifier
intent_classifier.py – sklearn, runs locally, under 5ms per query.
What it classifies: Is this query clear enough to search, or should we ask for clarification?
Features: has_category flag, has_history flag, query text embedded via all-MiniLM-L6-v2.
Online learning: Every query is logged to intent_feedback. Every 25 rows, the model retrains on accumulated examples and saves to intent_model.pkl. Active learning targets the uncertain region (entropy 0.40–0.60) to maximize training efficiency.
Clarification limit: After 2 consecutive clarification turns in the same session, the system proceeds regardless. This prevents the system from getting stuck in a loop with genuinely ambiguous users.
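The uncertain-region filter can be sketched with binary entropy. Entropy in bits and the exact band semantics are assumptions about how intent_classifier.py formulates it:

```python
import math

def in_uncertain_band(p_clear: float, low: float = 0.40, high: float = 0.60) -> bool:
    """Active-learning filter sketch: keep a query for retraining only
    when the classifier's binary entropy falls inside the uncertain band
    described above. Confident predictions (entropy near 0) and coin
    flips (entropy near 1) both contribute less per label."""
    if p_clear in (0.0, 1.0):
        return False  # zero entropy: the model is certain
    h = -(p_clear * math.log2(p_clear) + (1 - p_clear) * math.log2(1 - p_clear))
    return low <= h <= high
```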
The Document Classifier
classifier.py – three-stage cascade that learns with every ingestion.
Incoming document
|
v
Sparse/tabular pre-check (word count < 200 OR unique word ratio > 0.85)
YES → visual classification (structural fingerprint to LLM)
NO → continue
|
v
Stage 1: Centroid nearest-neighbour
Cosine similarity to stored category centroids
Confidence ≥ 0.72 → done (no API call)
|
v
Stage 2: Ensemble vote
Signal A: cosine to centroids (weight 0.45)
Signal B: cosine to label embeddings (weight 0.30)
Signal C: TF-IDF keyword matching (weight 0.25)
Score ≥ 0.38 → done
|
v
Stage 3: LLM chain-of-thought
Sends excerpt to classifier LLM
Classifies FORMAT + STRUCTURE, not just topic
Fallback: "general_document"
After classification, the winning category's centroid is updated with this document's embedding – the classifier improves with every ingestion.
User override lock: If ingested_files.user_overridden=True, the entire cascade is skipped. Returns synthetic result with stage_used="user_override", confidence=1.0.
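The centroid update is a running average over contributing documents, matching the `centroid_vector`/`document_count` columns described earlier. A pure-Python sketch (helper name illustrative):

```python
def update_centroid(centroid: list[float], count: int,
                    doc_embedding: list[float]) -> tuple[list[float], int]:
    """Running-average centroid update applied after every classification:
    the winning category's centroid shifts toward the new document's
    embedding, weighted by how many documents already contributed."""
    new_count = count + 1
    new_centroid = [(c * count + v) / new_count
                    for c, v in zip(centroid, doc_embedding)]
    return new_centroid, new_count
```

Early documents move the centroid a lot; once a category has many members, each new document only nudges it, which is what makes Stage 1 increasingly stable.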
The Three Self-Improvement Loops
Morpheus has three feedback loops that make it more accurate over time:
Loop 1 – Intent classifier (every 25 queries)
User queries logged to intent_feedback. Every 25 new rows, the model retrains automatically and saves to disk. Learns the specific query patterns of your users.
Loop 2 – Document classifier (every ingestion)
Each ingested document updates its category centroid. The next similar document gets classified at Stage 1 (no API call) instead of needing the LLM fallback. Classification gets faster and more accurate as the corpus grows.
Loop 3 – Reranker distillation (pipeline in place)
Every query logs Cohere rerank scores to rerank_feedback. These accumulated labels will be used to train a local CrossEncoder to match Cohere quality without the API cost.
What Morpheus Does Not Yet Have
These should not be documented or discussed as shipped capabilities:
- URL ingestion
- Markdown ingestion
- Browser-upload code/config/API source ingestion (only the local, scripted Python code graph indexing path is live)
- True agentic retrieval with an iterative planner/executor loop
Future additions should be documented here only after the code path, tests, and rollout controls exist.
The Frontend
Authentication Flow
Page load
initSupabase() fetches Supabase keys from /api/v1/config
supabaseClient.auth.getSession()
Session exists β showApp() + bootApp()
No session β showLogin()
Login
supabaseClient.auth.signInWithPassword(email, password)
Supabase-js stores JWT in localStorage automatically
Every API call: getSupabaseToken() reads it from localStorage
Sent as X-Auth-Token header
Backend require_auth_token Depends() validates JWT and returns user_id
Global State
state.js – single source of truth:
| Key | Contents |
|---|---|
| STATE.files | Ingested document list from /api/v1/corpus/files |
| STATE.categories | Category strings |
| STATE.catColors | Color mapping for graph |
| STATE.chatHistory | Current conversation turns |
| STATE.sessionId | UUID per browser tab |
| STATE.simulation | D3 force simulation reference |
| STATE.alpha | Retrieval weight slider (0=keyword, 1=semantic) |
| STATE.isThinking | Double-submit guard |
Upload + Progress
corpus.js – processUpload() calls apiIngestFile(), then enters pollIngestStatus() which polls every 2 seconds and exits only on COMPLETED or FAILED. Shows cycling heartbeat messages through pipeline stages while waiting.
Chat Streaming
chat.js – sendChat() has a 500ms debounce guard. The assistant bubble is created immediately with thinking dots. async onToken() yields to the browser with await new Promise(r => setTimeout(r, 0)) after each token update so the DOM repaints during streaming rather than all at once at the end.
Graph
graph.js – Obsidian-style D3 force simulation. graphReheat() uses alpha(0.3) not alphaTarget(0.2). The alpha() method sets current energy directly – it forces a restart even when the simulation has fully stopped. alphaTarget only sets where energy wants to decay toward, which does nothing if the simulation is already below alphaMin. onGraphTabVisible() is called from main.js with a 50ms delay so the CSS display change propagates before D3 reads panel dimensions.
Key Design Decisions
Why Celery + Redis? Ingestion takes 60–120 seconds (OCR, AI summaries, RAPTOR tree building). FastAPI requests time out well before that. Celery lets the task run in the background while the browser polls for status.
Why service-role key for writes? Celery workers have no browser session, so auth.uid() is NULL in the database. The security boundary is enforced at the API level – the JWT is validated before the task is queued and the user_id is passed explicitly to the insert_document_chunk RPC.
Why RAPTOR tree indexing? Flat chunking misses questions that span multiple sections ("total credits across all categories"). RAPTOR builds parent summaries that aggregate child content, enabling retrieval at multiple granularities. Root nodes answer overview questions; leaf nodes answer specific details.
Why tree search for structural queries? Vector similarity is calibrated to semantic meaning, not document structure. A query for "Capstone Project credits" fails vector search because the chunk summary emphasises overall credit structure, not individual line items. Tree search traverses the document hierarchy and finds the exact node.
Why semantic cache with version invalidation? Repeated questions should not cost API calls. But cached answers must go stale when the corpus changes. Version-based invalidation solves the second problem without tracking which cache entry references which document β increment the version, all old entries become invisible.
Why 3-tier reranker? Cohere costs money and has rate limits. CrossEncoder is free but needs local GPU. Lexical always works. This order maximises quality while guaranteeing retrieval never fails completely.
Why alpha(0.3) not alphaTarget(0.2) in D3 graph reheat? alphaTarget sets where the simulation wants to decay toward. If the simulation has already stopped (alpha < alphaMin = 0.001), alphaTarget does nothing. The alpha() method sets current energy directly and always forces a restart.
Why quality work before new source kinds? The current system already has enough retrieval and generation complexity that quality regressions can hide behind "more features." The project workflow now explicitly prioritizes trace review, failure-mode reduction, and retrieval tuning before adding URL, Markdown, or code ingestion.
Why code ingestion should be separate later? Code/config/API material is structurally different from prose documents. When Morpheus gains code support, it should use a dedicated exact/AST-first path rather than treating code as generic document chunks.
Environment Variables
| Variable | Purpose | Required |
|---|---|---|
| SUPABASE_URL | Supabase project URL | Yes |
| SUPABASE_ANON_KEY | Frontend-safe key (user-scoped reads) | Yes |
| SUPABASE_SERVICE_KEY | Server-only key (bypasses RLS for writes) | Yes |
| SUPABASE_JWT_SECRET | JWT signature verification | Yes |
| OPENROUTER_API_KEY | OpenRouter API access | Yes |
| GROQ_API_KEY | Groq API access (primary generation) | Yes |
| GEMINI_API_KEY | Google Gemini access (ingestion + vision) | Yes |
| COHERE_API_KEY | Cohere reranking | Yes |
| REDIS_URL | Redis connection string | Yes |
| CELERY_WORKER_CANCEL_ON_CONNECTION_LOSS | Whether broker disconnect cancels active ingestion tasks; keep false for long uploads | No |
| CELERY_TASK_ACKS_LATE | Whether ingestion tasks acknowledge only after completion; keep false to avoid redelivery loops on flaky Redis | No |
| CELERY_TASK_REJECT_ON_WORKER_LOST | Whether lost workers requeue tasks; keep false unless duplicate-safe retries are required | No |
| MASTER_ADMIN_KEY | Admin endpoint access | Yes |
| ALLOWED_ORIGINS | CORS allowed origins (use * for dev only) | Yes |
| DOCS_ENABLED | Enable /docs and /redoc (set false in prod) | No |
| LOG_LEVEL | Logging verbosity (INFO or DEBUG) | No |
| AUTO_START_CELERY | Auto-spawn Celery subprocess on startup | No |
| HF_HUB_DISABLE_XET | Disable Xet-backed model downloads during build/runtime | No |
Common Debugging
Ingestion crashes at embedding step
Look for: ValueError: Model X returned null embeddings
Cause: OpenRouter returns HTTP 200 with data=null
Fix: FallbackEmbeddings null guard retries the next model automatically – check provider logs for rate limits
Cache not invalidating after delete
Check Redis for key nexus:kb_version:{user_id}
If missing: the version key was never written – run a fresh ingest to initialise it
Graph not reheating on tab switch
Check onGraphTabVisible is defined in graph.js
Check _hookGraphTabVisible IIFE in main.js
Expected: graph animates within 50ms of tab click
Classifier ignoring user category
Check: ingested_files.user_overridden = true for that file hash
Logs should show: User override active → forcing category 'X', skipping classifier
__CACHE_HIT__ showing as a source chip in the UI
Hard-refresh (Ctrl+Shift+R) to load the latest chat.js
The visibleSources filter in onDone() strips sentinel entries
Gemini 404 errors during ingestion
Check GEMINI_TEXT_MODELS and GEMINI_VISION_MODELS in config.py
Must be gemini-2.5-flash and gemini-2.5-flash-lite
gemini-1.5-flash and gemini-2.0-flash are deprecated
Update Rule for This File
When the architecture changes, update this document in this order:
- Current live behavior
- Trace and observability impact
- Deterministic vs probabilistic impact
- Roadmap/status movement between phases
Do not document planned capabilities as shipped.
Complete Request Flow Example
User asks: "What are the core courses?"
Category filter: academic_syllabus
1. Browser POST /api/v1/query
Headers: X-Auth-Token: eyJ...
Body: {query, category="academic_syllabus", history, session_id, alpha=0.5}
2. require_auth_token: decodes JWT → user_id="ee903934..."
3. analyse_intent()
sklearn: needs_clarification=False, confidence=1.00
Category active: enriched query = "query academic_syllabus"
Logs to intent_feedback
4. Route selection
route_class = factoid
selected_experts = ["dense_chunk", ...]
expert_weights persisted into trace metadata
_should_use_tree_path() → False (not a structural keyword query)
retrieve_chunks() hybrid path
5. Semantic cache check: MISS (first time this query)
generate_sub_queries → ["B.Tech CSE core courses", "program core credits", ...]
hybrid_search RPC × 3 sub-queries → 12 raw candidates
Cohere rerank → ranked by relevance score
Threshold + diversity filter → 3 final chunks
Store in _last_chunks[session_key]
Log rerank feedback (background thread)
6. generate_answer_stream()
match_memory RPC → 2 past relevant Q&A pairs
Build prompt: system + 3 chunks + 2 memories + history + query
Groq astream() → tokens arrive one by one
Yield {type:"token", content:"The"}, {type:"token", content:" core"}, ...
After streaming: save Q&A to chat_memory (background thread)
Store in semantic cache (version v4, TTL 7 days for academic_syllabus)
Persist query trace and evaluation logs
7. Yield {type:"done", sources:[...], images:[...], trace_id:"..."}
8. Browser: onToken() fills bubble token by token
onDone() appends source chips and keeps the trace id available for review
Last updated: April 2026