prebake: always re-ingest so contextual retrieval runs at premium
The previous skip-if-already-indexed short-circuit in scripts/prebake_repos.py
meant contextual retrieval never re-ran for repos that were already in
Qdrant. Result: tour/diagram/README artifacts were premium-quality, but
the underlying chunk vectors kept their original (free-tier or none)
contextual descriptions. Chat retrieval — which depends on contextualised
chunks — silently stayed on the lower tier.
The fix is one line: don't short-circuit. Always pass force=True. The
ingestion service's force=True path runs contextual retrieval enrichment
on every chunk; with premium_mode on, those calls route to claude-sonnet.
Voyage embeddings are deduplicated by content hash, so re-ingestion only
re-embeds changed chunks. Net cost is dominated by the per-chunk
contextual LLM call (premium tier).
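The dedup-by-content-hash idea can be sketched as follows (a minimal illustration, not the project's actual `Embedder`/store API; `embed_missing` and `content_hash` are hypothetical names):

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_missing(chunks, known_hashes, embed_fn):
    """Embed only chunks whose content hash isn't already stored.

    `known_hashes` is the set of hashes already in the vector store;
    `embed_fn` is the expensive embedding call. Unchanged chunks are
    skipped entirely, so re-ingestion only pays for new/edited text.
    """
    new_vectors = {}
    for text in chunks:
        h = content_hash(text)
        if h in known_hashes:
            continue  # identical content already embedded, skip it
        new_vectors[h] = embed_fn(text)
        known_hashes.add(h)
    return new_vectors
```

Under this scheme a full force-re-ingest of an unchanged repo embeds nothing; only the per-chunk contextual LLM call repeats, which matches the cost claim above.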
Visible effect: every prebaked repo gets the "Contextual retrieval
applied" sparkle in the sidebar afterward. CLAUDE.md updated to reflect
the new behaviour and document the save-site protection from the
previous commit.
- CLAUDE.md +4 -0
- scripts/prebake_repos.py +17 -8
```diff
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -118,6 +118,10 @@ The script flips `gen.premium_mode = True` for the entire run, which:
 - Routes every `gen.generate(...)` call to the Claude Sonnet 4.6 client (`ANTHROPIC_API_KEY` required).
 - Activates `PREMIUM_CAPS` overrides in `GenerationService` — every `gen.cap(name, default)` call returns the larger premium value (longer ReAct rounds, fuller chunk previews in contextual retrieval, larger README budget, etc.).
 
+The script always force-re-ingests each repo, even when already indexed, so contextual retrieval re-runs with the premium model. Without this, chat retrieval stays on free-tier (or non-existent) contextual descriptions. The "Contextual retrieval applied" sparkle in the sidebar is the visible proof that this ran.
+
+**Save sites refuse to demote premium artifacts.** When the runtime UI's Regenerate button fires a free-tier generation, `_save_tour` / `_save_diagram` / the README cache write detect the existing payload's `generated_by_model` starts with `claude-` and skip the persist with a `[protect] not overwriting premium` log line. The user's session shows their regenerated content (in-memory cache updates) but the durable cache stays at premium quality.
+
 Runtime requests from the deployed app keep the original (smaller) caps so free-tier providers don't drown.
 
 To inspect what's been baked for a repo:
```
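The save-site protection can be sketched as a small guard (illustrative only; `should_persist` and its `log` parameter are hypothetical names, but the `generated_by_model` check and the log line mirror what the commit describes):

```python
def should_persist(existing, log=print) -> bool:
    """Return False when the durable cache already holds a premium artifact.

    If the stored payload was generated by a `claude-*` model, a
    free-tier regeneration must not overwrite it; the caller still
    updates its in-memory cache so the user's session sees the new
    content.
    """
    if existing is None:
        return True  # nothing cached yet, always safe to write
    model = existing.get("generated_by_model", "")
    if model.startswith("claude-"):
        log("[protect] not overwriting premium")
        return False
    return True
```

Each save site (`_save_tour`, `_save_diagram`, the README cache write) would call such a guard before persisting.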
```diff
--- a/scripts/prebake_repos.py
+++ b/scripts/prebake_repos.py
@@ -66,17 +66,26 @@ def repo_indexed(store: QdrantStore, repo: str) -> bool:
 
 
 def ingest(repo: str, store: QdrantStore, gen: GenerationService, embedder: Embedder) -> bool:
-    """
-
-
-
-
+    """Re-ingest a repo via GitHub with force=True so contextual retrieval
+    runs. Even when the repo is already indexed we re-run — premium prebake
+    must end with premium-quality contextual descriptions on every chunk,
+    not just whatever the previous (possibly free-tier) ingestion left
+    behind. The Voyage embeddings are deduplicated by content hash so this
+    isn't as expensive as it sounds: only changed/new chunks pay the
+    embed cost; only chunks needing fresh contextual retrieval pay the
+    LLM cost."""
+    already = repo_indexed(store, repo)
+    if already:
+        print(f"  ▸ re-ingesting {repo} ({store.count(repo=repo)} chunks already indexed)…")
+    else:
+        print(f"  ▸ ingesting {repo}…")
     ingestion = IngestionService(store=store, embedder=embedder, gen=gen)
     repo_url = f"https://github.com/{repo}"
     try:
-        # force=True
-        # is on,
-        #
+        # force=True triggers contextual retrieval enrichment. Because
+        # premium_mode is on, gen.generate() routes those calls to the
+        # premium client → claude-sonnet-4-6. progress callback prints
+        # sparse milestones to stdout for visibility.
         last_step = [""]
         def on_progress(step: str, detail: str) -> None:
             if step != last_step[0]: