# MORPHEUS Model Migration Roadmap (Prototype -> Enterprise)

## Step 0: Preserve vector schema compatibility
- Do not change embedding dimensions in `public.documents.embedding` until you are ready for a full re-embed migration.
- Keep retrieval wiring stable so you can measure impact of each model swap in isolation.

## Step 1: Switch generation to a local provider (Ollama)
What’s implemented:
- Generation calls in `backend/core/pipeline.py` now support a provider toggle:
  - `MORPHEUS_LLM_PROVIDER=openrouter` (default)
  - `MORPHEUS_LLM_PROVIDER=ollama`
- Ollama models are configured via:
  - `OLLAMA_MODELS` (comma-separated) or `OLLAMA_MODEL` (single model)
  - `OLLAMA_BASE_URL` (defaults to `http://localhost:11434`)
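
As a rough illustration of how these variables could be resolved, here is a minimal sketch. It is not the actual code in `backend/core/pipeline.py`: the `LLMConfig` and `load_llm_config` names are hypothetical, and the precedence of `OLLAMA_MODELS` over `OLLAMA_MODEL` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for the resolved generation settings."""
    provider: str
    ollama_models: List[str]
    ollama_base_url: str


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> LLMConfig:
    """Resolve provider settings from environment variables."""
    env = os.environ if env is None else env

    # `openrouter` is the documented default provider.
    provider = env.get("MORPHEUS_LLM_PROVIDER", "openrouter").strip().lower()
    if provider not in ("openrouter", "ollama"):
        raise ValueError(f"Unsupported MORPHEUS_LLM_PROVIDER: {provider!r}")

    # Assumption: the comma-separated list wins over the single-model variable.
    raw_models = env.get("OLLAMA_MODELS") or env.get("OLLAMA_MODEL") or ""
    models = [name.strip() for name in raw_models.split(",") if name.strip()]

    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
    return LLMConfig(provider=provider, ollama_models=models, ollama_base_url=base_url)
```

Passing a plain dictionary instead of `os.environ` keeps the resolution logic easy to unit test.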

How to run:
1. Ensure Ollama is running locally and can load your target model.
2. Set environment variables:
   - `MORPHEUS_LLM_PROVIDER=ollama`
   - `OLLAMA_MODELS=llama3`
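
Before pointing the pipeline at Ollama, it can help to confirm that the server answers for the target model. The sketch below builds a non-streaming request against Ollama's `/api/generate` endpoint using only the standard library. The helper names `build_generate_request` and `smoke_test` are hypothetical and not part of the repository.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def smoke_test(base_url: str = "http://localhost:11434", model: str = "llama3") -> str:
    """Send one short prompt and return the generated text.

    Raises urllib.error.URLError if the server is not reachable.
    """
    request = build_generate_request(base_url, model, "Reply with the word: ok")
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Non-streaming responses carry the generated text in the "response" field.
    return body["response"]
```

Calling `smoke_test()` from a Python shell should return a short string once the model has loaded. The first call may be slow while Ollama loads the weights.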

Notes:
- This step swaps only the *generation* path: answering, summaries, and the classifier stage-3/metadata extraction that goes through the generation wrappers in `backend/core/pipeline.py`.
- Retrieval embeddings are not migrated in this step; vector dimension compatibility remains intact.

## Step 2: Reranking strategy (cost + uptime)
What’s implemented:
- Retrieval reranking still uses Cohere when available.
- If Cohere reranking fails (including on rate-limit errors), `backend/core/pipeline.py` now falls back to a local lexical rerank heuristic instead of returning results in their original retrieval order.
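
The fallback behaviour can be pictured with the sketch below. It is an illustration only: the scoring rule (fraction of query tokens present in each document) is an assumption rather than the heuristic actually used in `backend/core/pipeline.py`, and `remote_rerank` is a hypothetical stand-in for the Cohere call.

```python
import re
from typing import Callable, List, Optional, Sequence, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_rerank(query: str, docs: Sequence[str]) -> List[str]:
    """Order documents by the fraction of query tokens they contain."""
    query_tokens = _tokens(query)

    def score(doc: str) -> float:
        if not query_tokens:
            return 0.0
        return len(query_tokens & _tokens(doc)) / len(query_tokens)

    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(docs, key=score, reverse=True)


def rerank_with_fallback(
    query: str,
    docs: Sequence[str],
    remote_rerank: Optional[Callable[[str, Sequence[str]], Sequence[str]]] = None,
) -> List[str]:
    """Try the remote reranker first; fall back to the lexical heuristic on any failure."""
    if remote_rerank is not None:
        try:
            return list(remote_rerank(query, docs))
        except Exception:
            # Covers rate limits (429), timeouts, and other provider errors.
            pass
    return lexical_rerank(query, docs)
```

Catching every exception is a deliberate simplification here. A production version would likely log the failure and distinguish retryable errors from configuration mistakes.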

Next execution (recommended after Step 1):
- Add an explicit reranking mode toggle that can disable Cohere completely, so that reranking cost is predictable.
- If you want enterprise-grade reranking without API dependency, replace Cohere with a local cross-encoder reranker (or a smaller bi-encoder + rerank pass).
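
One possible shape for such a toggle is sketched below. The `MORPHEUS_RERANK_MODE` variable and its three values are hypothetical and do not exist in the codebase yet. The default of `cohere` is chosen to match the current behaviour.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class RerankMode(Enum):
    COHERE = "cohere"  # remote reranker, with local fallback on failure
    LOCAL = "local"    # local reranker only, no API calls
    OFF = "off"        # keep the raw retrieval order


def resolve_rerank_mode(env: Optional[Mapping[str, str]] = None) -> RerankMode:
    """Read the (hypothetical) MORPHEUS_RERANK_MODE variable."""
    env = os.environ if env is None else env
    raw = env.get("MORPHEUS_RERANK_MODE", "cohere").strip().lower()
    try:
        return RerankMode(raw)
    except ValueError:
        allowed = ", ".join(mode.value for mode in RerankMode)
        raise ValueError(
            f"Invalid MORPHEUS_RERANK_MODE {raw!r}; expected one of: {allowed}"
        ) from None
```

Failing loudly on an unknown value avoids silently falling back to a paid provider when the variable is mistyped.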

## Step 3: Embeddings migration (only when ready)
- Migrate embeddings to local/stable models only after you can guarantee:
  - embedding dimension parity with the existing vector column, or
  - a versioned vector table strategy (e.g., `documents_v2`) + controlled cutover.
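
The two conditions above amount to a simple pre-flight decision, sketched here. The function and field names are hypothetical, and the dimensions in the usage note are illustrative values, not the project's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingMigrationPlan:
    strategy: str      # "in_place" or "versioned_table"
    target_table: str  # table that will receive the new embeddings


def plan_embedding_migration(
    current_dim: int,
    new_dim: int,
    current_table: str = "documents",
    next_version: int = 2,
) -> EmbeddingMigrationPlan:
    """Pick a migration strategy from the old and new embedding dimensions."""
    if current_dim <= 0 or new_dim <= 0:
        raise ValueError("Embedding dimensions must be positive integers")

    if current_dim == new_dim:
        # Dimension parity: the existing vector column can hold the new vectors.
        return EmbeddingMigrationPlan("in_place", current_table)

    # Dimension mismatch: write to a versioned table and cut over in a controlled step.
    return EmbeddingMigrationPlan("versioned_table", f"{current_table}_v{next_version}")
```

For example, `plan_embedding_migration(1536, 768)` yields a plan targeting `documents_v2`, while equal dimensions keep the existing table.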

## Step 4: Observability + regression gates
- Every model swap should be tracked via:
  - Golden dataset metrics (`backend/eval/*`)
  - p95 latency and error rates (especially 429)
  - “Reward signal” distributions across parameter grids (alpha, top-k)
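
A latency and error-rate gate along these lines could look like the sketch below. The thresholds (10% allowed p95 regression, 1% allowed rate of 429 responses) and the function names are assumptions for illustration, not values taken from `backend/eval/*`.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("No latency samples provided")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def status_rate(status_codes: Sequence[int], code: int = 429) -> float:
    """Fraction of responses that returned the given HTTP status code."""
    if not status_codes:
        raise ValueError("No status codes provided")
    return sum(1 for status in status_codes if status == code) / len(status_codes)


def passes_gate(
    baseline_latencies_ms: Sequence[float],
    candidate_latencies_ms: Sequence[float],
    candidate_status_codes: Sequence[int],
    max_p95_ratio: float = 1.10,   # hypothetical threshold
    max_429_rate: float = 0.01,    # hypothetical threshold
) -> bool:
    """Return True if the candidate model stays within both budgets."""
    latency_ok = p95(candidate_latencies_ms) <= p95(baseline_latencies_ms) * max_p95_ratio
    errors_ok = status_rate(candidate_status_codes, 429) <= max_429_rate
    return latency_ok and errors_ok
```

Running a check like this against the golden dataset after each model swap gives a pass/fail signal that can block a rollout.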