# MORPHEUS Model Migration Roadmap (Prototype -> Enterprise)

## Step 0: Preserve vector schema compatibility

- Do not change embedding dimensions in `public.documents.embedding` until you are ready for a full re-embed migration.
- Keep retrieval wiring stable so you can measure the impact of each model swap in isolation.

## Step 1: Switch generation to a local provider (Ollama)

What’s implemented:

- Generation calls in `backend/core/pipeline.py` now support a provider toggle:
  - `MORPHEUS_LLM_PROVIDER=openrouter` (default)
  - `MORPHEUS_LLM_PROVIDER=ollama`
- Ollama models are configured via:
  - `OLLAMA_MODELS` (comma-separated) or `OLLAMA_MODEL` (single model)
  - `OLLAMA_BASE_URL` (defaults to `http://localhost:11434`)

How to run:

1. Ensure Ollama is running locally and can load your target model.
2. Set environment variables:
   - `MORPHEUS_LLM_PROVIDER=ollama`
   - `OLLAMA_MODELS=llama3`

Notes:

- This migration swaps the *generation* model path (answering, summaries, and classifier stage-3/metadata extraction that uses `backend/core/pipeline.py` generation wrappers).
- Retrieval embeddings are not migrated in this step; vector dimension compatibility remains intact.

## Step 2: Reranking strategy (cost + uptime)

What’s implemented:

- Retrieval reranking still uses Cohere when available.
- If Cohere reranking fails (including rate-limit errors), `backend/core/pipeline.py` now falls back to a local lexical rerank heuristic instead of returning raw order.

Next execution (recommended after Step 1):

- Add an explicit reranking mode toggle to disable Cohere completely for deterministic cost.
- If you want enterprise-grade reranking without API dependency, replace Cohere with a local cross-encoder reranker (or a smaller bi-encoder + rerank pass).
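
The Step 2 fallback behavior can be sketched roughly as below. This is an illustrative sketch only, not the code in `backend/core/pipeline.py`: the function names (`lexical_rerank`, `rerank_with_fallback`), the `remote_rerank` callable, and the overlap-based scoring formula are all assumptions chosen to show the idea of "try the remote reranker, fall back to a lexical heuristic on any failure".

```python
import re
from typing import Callable, List, Optional, Sequence

_TOKEN = re.compile(r"[a-z0-9]+")


def _tokens(text: str) -> List[str]:
    """Lowercase and split text into alphanumeric tokens."""
    return _TOKEN.findall(text.lower())


def lexical_rerank(
    query: str, docs: Sequence[str], top_n: Optional[int] = None
) -> List[int]:
    """Return document indices ordered by query-term overlap, best first.

    Hypothetical heuristic: score = query-term coverage + 0.1 * match density.
    """
    q_terms = set(_tokens(query))
    scored = []
    for idx, doc in enumerate(docs):
        d_tokens = _tokens(doc)
        if not d_tokens or not q_terms:
            score = 0.0
        else:
            hits = sum(1 for t in d_tokens if t in q_terms)
            coverage = len(q_terms & set(d_tokens)) / len(q_terms)
            density = hits / len(d_tokens)
            score = coverage + 0.1 * density
        scored.append((score, idx))
    # Highest score first; ties keep the original retrieval order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    order = [idx for _, idx in scored]
    return order if top_n is None else order[:top_n]


def rerank_with_fallback(
    query: str,
    docs: Sequence[str],
    remote_rerank: Optional[Callable[[str, Sequence[str]], Sequence[int]]] = None,
    top_n: Optional[int] = None,
) -> List[int]:
    """Use the remote reranker if provided; on any error, rerank lexically.

    `remote_rerank` is a placeholder for whatever wraps the hosted reranker
    (assumed to return indices, best first, and to raise on 429s or outages).
    """
    if remote_rerank is not None:
        try:
            return list(remote_rerank(query, docs))[:top_n]
        except Exception:
            pass  # Rate limit or outage: fall through to the local heuristic.
    return lexical_rerank(query, docs, top_n)
```

Passing `remote_rerank=None` gives the same effect as an explicit "Cohere disabled" mode, which is one way the recommended reranking toggle could be wired in.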

## Step 3: Embeddings migration (only when ready)

- Migrate embeddings to local/stable models only after you can guarantee:
  - embedding dimension parity with the existing vector column, or
  - a versioned vector table strategy (e.g., `documents_v2`) + controlled cutover.

## Step 4: Observability + regression gates

- Every model swap should be tracked via:
  - Golden dataset metrics (`backend/eval/*`)
  - p95 latency and error rates (especially 429)
  - “reward signal” distributions across parameter grids (alpha, top-k).
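
A regression gate over the latency and error-rate signals above might look like the following minimal sketch. Everything here is hypothetical: the `RequestRecord` shape, the function names, and the thresholds (`max_latency_ratio=1.2`, `max_429_rate=0.01`) are placeholders for illustration, not values taken from `backend/eval/*`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RequestRecord:
    """One observed request (hypothetical shape for this sketch)."""
    latency_ms: float
    status: int


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def rate_429(records: Sequence[RequestRecord]) -> float:
    """Fraction of requests that returned HTTP 429."""
    if not records:
        raise ValueError("no samples")
    return sum(1 for r in records if r.status == 429) / len(records)


def regression_gate(
    baseline: Sequence[RequestRecord],
    candidate: Sequence[RequestRecord],
    max_latency_ratio: float = 1.2,  # assumed tolerance, tune per deployment
    max_429_rate: float = 0.01,      # assumed tolerance, tune per deployment
) -> List[str]:
    """Return a list of failure messages; an empty list means the swap passes."""
    failures = []
    base_p95 = p95([r.latency_ms for r in baseline])
    cand_p95 = p95([r.latency_ms for r in candidate])
    if cand_p95 > base_p95 * max_latency_ratio:
        failures.append(
            f"p95 latency regressed: {cand_p95:.1f} ms vs baseline {base_p95:.1f} ms"
        )
    cand_429 = rate_429(candidate)
    if cand_429 > max_429_rate:
        failures.append(f"429 rate too high: {cand_429:.2%}")
    return failures
```

Running the gate once per model swap, with the previous configuration as `baseline`, keeps each change measurable in isolation; golden dataset metrics could be added as a further check in the same style.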