MORPHEUS Model Migration Roadmap (Prototype -> Enterprise)
Step 0: Preserve vector schema compatibility
- Do not change embedding dimensions in public.documents.embedding until you are ready for a full re-embed migration.
- Keep retrieval wiring stable so you can measure the impact of each model swap in isolation.
Step 1: Switch generation to a local provider (Ollama)
What’s implemented:
- Generation calls in backend/core/pipeline.py now support a provider toggle:
  - MORPHEUS_LLM_PROVIDER=openrouter (default)
  - MORPHEUS_LLM_PROVIDER=ollama
- Ollama models are configured via:
  - OLLAMA_MODELS (comma-separated) or OLLAMA_MODEL (single model)
  - OLLAMA_BASE_URL (defaults to http://localhost:11434)
How to run:
- Ensure Ollama is running locally and can load your target model.
- Set environment variables:
  - MORPHEUS_LLM_PROVIDER=ollama
  - OLLAMA_MODELS=llama3
Notes:
- This migration swaps the generation model path (answering, summaries, and the classifier stage-3/metadata extraction that uses the backend/core/pipeline.py generation wrappers).
- Retrieval embeddings are not migrated in this step; vector dimension compatibility remains intact.
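A minimal sketch of how the Step 1 env-driven toggle could be read (function names here are illustrative assumptions, not the actual wiring in backend/core/pipeline.py):

```python
import os

# Illustrative sketch of an env-driven provider toggle; the real
# backend/core/pipeline.py code may structure this differently.
def resolve_provider() -> str:
    # Defaults to openrouter, matching the documented behavior.
    return os.environ.get("MORPHEUS_LLM_PROVIDER", "openrouter").lower()

def resolve_ollama_models() -> list[str]:
    # OLLAMA_MODELS (comma-separated) takes precedence over OLLAMA_MODEL.
    raw = os.environ.get("OLLAMA_MODELS") or os.environ.get("OLLAMA_MODEL", "")
    return [m.strip() for m in raw.split(",") if m.strip()]

def ollama_base_url() -> str:
    # Falls back to the documented default endpoint.
    return os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
```

Keeping the default on openrouter means an unset environment behaves exactly as before the migration.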
Step 2: Reranking strategy (cost + uptime)
What’s implemented:
- Retrieval reranking still uses Cohere when available.
- If Cohere reranking fails (including rate-limit errors), backend/core/pipeline.py now falls back to a local lexical rerank heuristic instead of returning raw order.
Next execution (recommended after Step 1):
- Add an explicit reranking mode toggle to disable Cohere completely for deterministic cost.
- If you want enterprise-grade reranking without API dependency, replace Cohere with a local cross-encoder reranker (or a smaller bi-encoder + rerank pass).
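A local lexical fallback of the kind Step 2 describes can be as simple as token-overlap scoring. This is a hedged sketch of one such heuristic, not the actual implementation in backend/core/pipeline.py:

```python
import re

def lexical_rerank(query: str, docs: list[str]) -> list[str]:
    """Rerank docs by token overlap with the query; a cheap, offline
    fallback when an API reranker (e.g. Cohere) is unavailable."""
    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    q = tokens(query)
    # Python's sort is stable, so ties keep the original retrieval order.
    return sorted(docs, key=lambda d: -len(q & tokens(d)))
```

Because ties preserve the incoming order, this degrades gracefully: when the heuristic has no signal, you get the raw retrieval ranking back.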
Step 3: Embeddings migration (only when ready)
- Migrate embeddings to local/stable models only after you can guarantee:
  - embedding dimension parity with the existing vector column, or
  - a versioned vector table strategy (e.g., documents_v2) + a controlled cutover.
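Before any cutover, a parity guard like the following can be run over a sample of candidate embeddings (a sketch only; a real migration would read the declared dimension from the pgvector column rather than pass it in):

```python
def check_dimension_parity(existing_dim: int,
                           candidate_vectors: list[list[float]]) -> bool:
    """Return True only if every candidate embedding matches the
    dimension of the existing vector column (e.g. public.documents.embedding).
    Any mismatch means a full re-embed migration or a versioned table
    (e.g. documents_v2) with a controlled cutover is required."""
    return all(len(v) == existing_dim for v in candidate_vectors)
```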
Step 4: Observability + regression gates
- Every model swap should be tracked via:
  - Golden dataset metrics (backend/eval/*)
  - p95 latency and error rates (especially 429)
  - “reward signal” distributions across parameter grids (alpha, top-k).
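As one example of a regression gate over these metrics, a p95 latency + 429-rate check might look like the sketch below (the thresholds and function names are illustrative assumptions, not existing code):

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(0, int(len(ordered) * 0.95 + 0.5) - 1)
    return ordered[rank]

def passes_gate(latencies_ms: list[float], status_codes: list[int],
                p95_budget_ms: float = 2000.0,
                max_429_rate: float = 0.01) -> bool:
    """Fail a model swap if p95 latency blows the budget or the
    rate of 429 (rate-limit) responses exceeds the cap."""
    rate_429 = status_codes.count(429) / max(len(status_codes), 1)
    return p95(latencies_ms) <= p95_budget_ms and rate_429 <= max_429_rate
```

Running this against the golden dataset before and after each swap turns the roadmap's observability requirement into a pass/fail signal in CI.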