Use Voyage embeddings by default
- Dockerfile +3 -3
- README.md +12 -7
- backend/config.py +11 -11
- ingestion/embedder.py +5 -5
- render.yaml +6 -2
Dockerfile
CHANGED
@@ -15,8 +15,8 @@
 # By downloading it during the Docker build, it's baked into the image layer.
 # Subsequent starts are instant — the model is already on disk.
 #
-# The embedding model
-#
+# The embedding model is NOT downloaded here — Voyage/Gemini/Nomic run via API
+# (no local file needed). That's how we stay under the RAM limit.
 #
 # ARCHITECTURE
 # ────────────

@@ -47,7 +47,7 @@ RUN pip install --user --no-cache-dir -r requirements.txt
 
 # Pre-download the re-ranker model into the image layer.
 # This bakes the ~80MB model into the image so cold starts don't download it.
-# The
+# The embedding model is NOT downloaded here — it lives behind a hosted API.
 RUN python -c "\
     from sentence_transformers import CrossEncoder; \
     print('Pre-downloading re-ranker...'); \
README.md
CHANGED
@@ -42,7 +42,7 @@ GitHub URL
    Falls back to line-windowed sliding chunks for unsupported languages
 ↓ ingestion_service.py   (Optional) LLM generates a 1–2 sentence description per chunk
                          prepended before embedding — Anthropic's "contextual retrieval"
-↓ embedder.py
+↓ embedder.py            Voyage voyage-code-3 (1024-dim) via API · Gemini/Nomic fallback
 ↓ qdrant_store.py        Each chunk stored with: dense vector + sparse BM25 vector + full payload metadata
 ```

@@ -201,8 +201,8 @@ The ⟳ button in the sidebar triggers a re-index with LLM-generated chunk descr
 | Backend | FastAPI + uvicorn | Async ASGI, 20+ endpoints, SSE streaming throughout |
 | Frontend | React + Vite | Component-based UI, localStorage sessions, SSE token streaming |
 | Vector DB | Qdrant Cloud | Native hybrid search (dense + sparse), free 1 GB tier |
-| Embeddings (default) |
-| Embeddings (
+| Embeddings (default) | Voyage `voyage-code-3` | 1024-dim, code-optimised, 200M tokens/month free |
+| Embeddings (fallback) | Gemini `gemini-embedding-001` | 768-dim, via Gemini API; good quality but tighter free-tier limits |
 | Code parsing | tree-sitter | Multi-language AST — Python, JS, TS, Go, Rust, Java |
 | Reranker (primary) | Cohere `rerank-v3.5` | Cross-encoder, API, 1000 calls/month free |
 | Reranker (fallback) | `ms-marco-MiniLM-L-6-v2` | Local cross-encoder, baked into Docker image |

@@ -264,11 +264,16 @@ cd ui && npm install && npm run dev
 # Vector DB (required)
 QDRANT_URL=        # Qdrant Cloud cluster URL
 QDRANT_API_KEY=    # Qdrant Cloud API key
-QDRANT_COLLECTION=
+QDRANT_COLLECTION=github_repos_voyage   # new 1024-dim collection for Voyage
 
-# Embeddings
-
-
+# Embeddings
+VOYAGE_API_KEY=    # Default — free at voyageai.com
+EMBEDDING_MODEL=voyage-code-3
+EMBEDDING_DIM=1024
+
+# Optional embedding fallbacks
+GEMINI_API_KEY=    # Also used for LLMs; set EMBEDDING_MODEL=gemini-embedding-001 and EMBEDDING_DIM=768
+NOMIC_API_KEY=     # Legacy fallback; set EMBEDDING_MODEL=nomic-embed-text-v1.5 and EMBEDDING_DIM=768
 
 # LLM (at least one required)
 CEREBRAS_API_KEY=  # Fastest — free at cloud.cerebras.ai (1M tok/day)
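Switching embedders means changing `EMBEDDING_MODEL`, `EMBEDDING_DIM`, and (for Voyage) `QDRANT_COLLECTION` together. A minimal sketch of that coupling; the `PROVIDER_ENV` table and `env_for` helper are hypothetical illustrations, not part of the repo:

```python
# Hypothetical helper mirroring the README's provider table: the
# embedding-related env vars that must change together per provider.
PROVIDER_ENV = {
    "voyage": {"EMBEDDING_MODEL": "voyage-code-3", "EMBEDDING_DIM": "1024"},
    "gemini": {"EMBEDDING_MODEL": "gemini-embedding-001", "EMBEDDING_DIM": "768"},
    "nomic": {"EMBEDDING_MODEL": "nomic-embed-text-v1.5", "EMBEDDING_DIM": "768"},
}

def env_for(provider: str) -> dict:
    """Return the env-var combination for one embedding provider."""
    try:
        return PROVIDER_ENV[provider]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {provider!r}")
```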
backend/config.py
CHANGED
@@ -25,7 +25,7 @@ class Settings:
     # ── Vector DB ──────────────────────────────────────────────────────────
     qdrant_url: str = os.getenv("QDRANT_URL", "")
     qdrant_api_key: str = os.getenv("QDRANT_API_KEY", "")
-    qdrant_collection: str = os.getenv("QDRANT_COLLECTION", "
+    qdrant_collection: str = os.getenv("QDRANT_COLLECTION", "github_repos_voyage")
 
     # ── GitHub ─────────────────────────────────────────────────────────────
     # Optional — without it you get 60 API req/hr; with it 5,000 req/hr

@@ -34,25 +34,25 @@ class Settings:
     # ── Embeddings ─────────────────────────────────────────────────────────
     # Three embedding providers, selected at startup by EMBEDDING_MODEL:
     #
-    # 1.
-    #    gemini-embedding-001: 768-dim output via MRL, generous free tier.
-    #    Re-uses the same GEMINI_API_KEY used for the LLM — no extra signup.
-    #    Free at https://aistudio.google.com.
-    #
-    # 2. Voyage AI (EMBEDDING_MODEL contains "voyage", needs VOYAGE_API_KEY)
+    # 1. Voyage AI (default — EMBEDDING_MODEL contains "voyage", needs VOYAGE_API_KEY)
     #    voyage-code-3: code-optimised, 1024-dim, 200M tokens/month free.
-    #
+    #    Requires EMBEDDING_DIM=1024 and a NEW Qdrant collection — dims
     #    are incompatible with 768-dim collections.
     #
-    #
+    # 2. Gemini (EMBEDDING_MODEL contains "gemini", needs GEMINI_API_KEY)
+    #    gemini-embedding-001: 768-dim output via MRL. Re-uses the same
+    #    GEMINI_API_KEY used for the LLM, but free-tier RPM/TPM limits are
+    #    too tight for LangChain-scale repos.
+    #
+    # 3. Nomic (legacy fallback — EMBEDDING_MODEL contains "nomic")
     #    nomic-embed-text-v1.5: 768-dim. Free quota is 10M tokens TOTAL
     #    (not per month) — easy to exhaust across a few large indexes.
     #
     #    EMBEDDING_DIM must match the chosen model exactly.
     nomic_api_key: str = os.getenv("NOMIC_API_KEY", "")
     voyage_api_key: str = os.getenv("VOYAGE_API_KEY", "")
-    embedding_model: str = os.getenv("EMBEDDING_MODEL", "
-    embedding_dim: int = int(os.getenv("EMBEDDING_DIM", "
+    embedding_model: str = os.getenv("EMBEDDING_MODEL", "voyage-code-3")
+    embedding_dim: int = int(os.getenv("EMBEDDING_DIM", "1024"))
     gemini_embedding_batch_size: int = int(os.getenv("GEMINI_EMBEDDING_BATCH_SIZE", "8"))
     gemini_embedding_min_interval: float = float(os.getenv("GEMINI_EMBEDDING_MIN_INTERVAL", "4.0"))
     gemini_embedding_retries: int = int(os.getenv("GEMINI_EMBEDDING_RETRIES", "6"))
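The comment above warns that EMBEDDING_DIM must match the model exactly, because Qdrant rejects vectors whose size differs from the collection's configured size. A minimal sketch of a fail-fast startup guard, assuming a hypothetical `MODEL_DIMS` table (this check is not in the repo's config.py):

```python
# Known output sizes for the three supported models (assumed table,
# taken from the comment block above).
MODEL_DIMS = {
    "voyage-code-3": 1024,         # new default
    "gemini-embedding-001": 768,   # fallback
    "nomic-embed-text-v1.5": 768,  # legacy fallback
}

def check_embedding_dim(model: str, dim: int) -> int:
    """Raise at startup instead of failing later on the first upsert."""
    expected = MODEL_DIMS.get(model)
    if expected is not None and dim != expected:
        raise ValueError(
            f"EMBEDDING_DIM={dim} but {model} produces {expected}-dim vectors; "
            "point QDRANT_COLLECTION at a collection created with the right size"
        )
    return dim
```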
ingestion/embedder.py
CHANGED
@@ -12,18 +12,18 @@ THREE PROVIDERS, ONE INTERFACE
 ──────────────────────────────
 Provider is selected from EMBEDDING_MODEL at init:
 
-  EMBEDDING_MODEL contains "voyage" + VOYAGE_API_KEY set
+  EMBEDDING_MODEL contains "voyage" + VOYAGE_API_KEY set (default)
     → Voyage AI: code-optimised, 1024-dim, 200M tokens/month free.
       voyage-code-3 is specifically trained on code and outperforms
       general-purpose embedders on code retrieval benchmarks.
-
+      Requires EMBEDDING_DIM=1024 and a new Qdrant collection.
 
-  EMBEDDING_MODEL contains "gemini" + GEMINI_API_KEY set
+  EMBEDDING_MODEL contains "gemini" + GEMINI_API_KEY set
     → Google Gemini: gemini-embedding-001, 768-dim output (configurable
       via MRL), generous free tier. Re-uses the same GEMINI_API_KEY we
-      use for the LLM
+      use for the LLM, but free-tier limits are tight for huge repos.
 
-  NOMIC_API_KEY set (legacy fallback)
+  EMBEDDING_MODEL contains "nomic" + NOMIC_API_KEY set (legacy fallback)
     → Nomic API: nomic-embed-text-v1.5, 768-dim. Free quota is 10M
       tokens total — easy to exhaust across a few large repo indexes.
 
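The docstring's selection rules can be sketched as a small pure function; this is an assumed simplification of the embedder's init logic, not the repo's actual code, and `select_provider` is a hypothetical name:

```python
# Sketch of the provider selection described above: match EMBEDDING_MODEL
# substrings in priority order, requiring the matching provider's API key.
def select_provider(model: str, env: dict) -> str:
    if "voyage" in model and env.get("VOYAGE_API_KEY"):
        return "voyage"  # default: voyage-code-3, 1024-dim
    if "gemini" in model and env.get("GEMINI_API_KEY"):
        return "gemini"  # gemini-embedding-001, 768-dim
    if "nomic" in model and env.get("NOMIC_API_KEY"):
        return "nomic"   # legacy fallback, 768-dim
    raise RuntimeError(f"no usable embedding provider for {model!r}")
```

Because the model name gates each branch, setting EMBEDDING_MODEL alone is not enough: the corresponding API key must also be present, otherwise init fails fast.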
render.yaml
CHANGED
@@ -26,16 +26,20 @@ services:
         sync: false  # set manually in Render dashboard
       - key: QDRANT_API_KEY
         sync: false
+      - key: QDRANT_COLLECTION
+        value: github_repos_voyage
       - key: GROQ_API_KEY
         sync: false
       - key: ANTHROPIC_API_KEY
         sync: false
+      - key: VOYAGE_API_KEY
+        sync: false
       - key: GITHUB_TOKEN
         sync: false
       - key: EMBEDDING_MODEL
-        value:
+        value: voyage-code-3
       - key: EMBEDDING_DIM
-        value: "
+        value: "1024"
       - key: TOP_K
         value: "6"
       # HuggingFace cache dir — Render gives 1GB ephemeral disk