morpheus-rag / docs /model_migration_roadmap.md
nothex
fix: deployment readiness — auth, naming, Dockerfile, render config
723ce57

MORPHEUS Model Migration Roadmap (Prototype -> Enterprise)

Step 0: Preserve vector schema compatibility

  • Do not change embedding dimensions in public.documents.embedding until you are ready for a full re-embed migration.
  • Keep retrieval wiring stable so you can measure impact of each model swap in isolation.

Step 1: Switch generation to a local provider (Ollama)

What’s implemented:

  • Generation calls in backend/core/pipeline.py now support a provider toggle:
    • MORPHEUS_LLM_PROVIDER=openrouter (default)
    • MORPHEUS_LLM_PROVIDER=ollama
  • Ollama models are configured via:
    • OLLAMA_MODELS (comma-separated) or OLLAMA_MODEL (single model)
    • OLLAMA_BASE_URL (defaults to http://localhost:11434)

How to run:

  1. Ensure Ollama is running locally and can load your target model.
  2. Set environment variables:
    • MORPHEUS_LLM_PROVIDER=ollama
    • OLLAMA_MODELS=llama3

Notes:

  • This migration swaps the generation model path (answering, summaries, and classifier stage-3/metadata extraction that uses backend/core/pipeline.py generation wrappers).
  • Retrieval embeddings are not migrated in this step; vector dimension compatibility remains intact.

Step 2: Reranking strategy (cost + uptime)

What’s implemented:

  • Retrieval reranking still uses Cohere when available.
  • If Cohere reranking fails (including rate-limit errors), backend/core/pipeline.py now falls back to a local lexical rerank heuristic instead of returning raw order.

Next execution (recommended after Step 1):

  • Add an explicit reranking mode toggle to disable Cohere completely for deterministic cost.
  • If you want enterprise-grade reranking without API dependency, replace Cohere with a local cross-encoder reranker (or a smaller bi-encoder + rerank pass).

Step 3: Embeddings migration (only when ready)

  • Migrate embeddings to local/stable models only after you can guarantee:
    • embedding dimension parity with the existing vector column, or
    • a versioned vector table strategy (e.g., documents_v2) + controlled cutover.

Step 4: Observability + regression gates

  • Every model swap should be tracked via:
    • Golden dataset metrics (backend/eval/*)
    • p95 latency and error rates (especially 429)
    • “reward signal” distributions across parameter grids (alpha, top-k).