MORPHEUS Model Migration Roadmap (Prototype -> Enterprise)
Step 0: Preserve vector schema compatibility
- Do not change embedding dimensions in public.documents.embedding until you are ready for a full re-embed migration.
- Keep retrieval wiring stable so you can measure the impact of each model swap in isolation.
Step 1: Switch generation to a local provider (Ollama)
What’s implemented:
- Generation calls in backend/core/pipeline.py now support a provider toggle:
  - MORPHEUS_LLM_PROVIDER=openrouter (default)
  - MORPHEUS_LLM_PROVIDER=ollama
- Ollama models are configured via:
  - OLLAMA_MODELS (comma-separated) or OLLAMA_MODEL (single model)
  - OLLAMA_BASE_URL (defaults to http://localhost:11434)
How to run:
- Ensure Ollama is running locally and can load your target model.
- Set environment variables:
  - MORPHEUS_LLM_PROVIDER=ollama
  - OLLAMA_MODELS=llama3
Notes:
- This migration swaps the generation model path (answering, summaries, and the classifier stage-3/metadata extraction that uses the backend/core/pipeline.py generation wrappers).
- Retrieval embeddings are not migrated in this step; vector dimension compatibility remains intact.
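A minimal sketch of how the Step 1 env-driven toggle could be read (function names here are illustrative assumptions, not the actual wiring in backend/core/pipeline.py):

```python
import os

# Illustrative sketch of an env-driven provider toggle; the real
# backend/core/pipeline.py code may structure this differently.
def resolve_provider() -> str:
    # Defaults to openrouter, matching the documented behavior.
    return os.environ.get("MORPHEUS_LLM_PROVIDER", "openrouter").lower()

def resolve_ollama_models() -> list[str]:
    # OLLAMA_MODELS (comma-separated) takes precedence over OLLAMA_MODEL.
    raw = os.environ.get("OLLAMA_MODELS") or os.environ.get("OLLAMA_MODEL", "")
    return [m.strip() for m in raw.split(",") if m.strip()]

def ollama_base_url() -> str:
    # Falls back to the documented default endpoint.
    return os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
```

Keeping the default on openrouter means an unset environment behaves exactly as before the migration.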
Step 2: Reranking strategy (cost + uptime)
What’s implemented:
- Retrieval reranking still uses Cohere when available.
- If Cohere reranking fails (including rate-limit errors), backend/core/pipeline.py now falls back to a local lexical rerank heuristic instead of returning raw order.
Next execution (recommended after Step 1):
- Add an explicit reranking mode toggle to disable Cohere completely for deterministic cost.
- If you want enterprise-grade reranking without API dependency, replace Cohere with a local cross-encoder reranker (or a smaller bi-encoder + rerank pass).
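A local lexical fallback of the kind Step 2 describes can be as simple as token-overlap scoring. This is a hedged sketch of one such heuristic, not the actual implementation in backend/core/pipeline.py:

```python
import re

def lexical_rerank(query: str, docs: list[str]) -> list[str]:
    """Rerank docs by token overlap with the query; a cheap, offline
    fallback when an API reranker (e.g. Cohere) is unavailable."""
    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    q = tokens(query)
    # Python's sort is stable, so ties keep the original retrieval order.
    return sorted(docs, key=lambda d: -len(q & tokens(d)))
```

Because ties preserve the incoming order, this degrades gracefully: when the heuristic has no signal, you get the raw retrieval ranking back.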
Step 3: Embeddings migration (only when ready)
- Migrate embeddings to local/stable models only after you can guarantee:
  - embedding dimension parity with the existing vector column, or
  - a versioned vector table strategy (e.g., documents_v2) + a controlled cutover.
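Before any cutover, a parity guard like the following can be run over a sample of candidate embeddings (a sketch only; a real migration would read the declared dimension from the pgvector column rather than pass it in):

```python
def check_dimension_parity(existing_dim: int,
                           candidate_vectors: list[list[float]]) -> bool:
    """Return True only if every candidate embedding matches the
    dimension of the existing vector column (e.g. public.documents.embedding).
    Any mismatch means a full re-embed migration or a versioned table
    (e.g. documents_v2) with a controlled cutover is required."""
    return all(len(v) == existing_dim for v in candidate_vectors)
```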
Step 4: Observability + regression gates
- Every model swap should be tracked via:
  - Golden dataset metrics (backend/eval/*)
  - p95 latency and error rates (especially 429)
  - “reward signal” distributions across parameter grids (alpha, top-k).
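As one example of a regression gate over these metrics, a p95 latency + 429-rate check might look like the sketch below (the thresholds and function names are illustrative assumptions, not existing code):

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(0, int(len(ordered) * 0.95 + 0.5) - 1)
    return ordered[rank]

def passes_gate(latencies_ms: list[float], status_codes: list[int],
                p95_budget_ms: float = 2000.0,
                max_429_rate: float = 0.01) -> bool:
    """Fail a model swap if p95 latency blows the budget or the
    rate of 429 (rate-limit) responses exceeds the cap."""
    rate_429 = status_codes.count(429) / max(len(status_codes), 1)
    return p95(latencies_ms) <= p95_budget_ms and rate_429 <= max_429_rate
```

Running this against the golden dataset before and after each swap turns the roadmap's observability requirement into a pass/fail signal in CI.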