prebake: always re-ingest so contextual retrieval runs at premium
The previous skip-if-already-indexed short-circuit in scripts/prebake_repos.py
meant contextual retrieval never re-ran for repos that were already in
Qdrant. Result: tour/diagram/README artifacts were premium-quality, but
the underlying chunk vectors kept their original (free-tier or none)
contextual descriptions. Chat retrieval — which depends on contextualised
chunks — silently stayed on the lower tier.
The fix is one line: don't short-circuit. Always pass force=True. The
ingestion service's force=True path runs contextual retrieval enrichment
on every chunk; with premium_mode on, those calls route to claude-sonnet.
Voyage embeddings are deduplicated by content hash, so re-ingestion only
re-embeds changed chunks. Net cost is dominated by the per-chunk
contextual LLM call (premium tier).
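The dedup-by-content-hash idea can be sketched as follows (a minimal illustration, not the project's actual `Embedder`/store API; `embed_missing` and `content_hash` are hypothetical names):

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_missing(chunks, known_hashes, embed_fn):
    """Embed only chunks whose content hash isn't already stored.

    `known_hashes` is the set of hashes already in the vector store;
    `embed_fn` is the expensive embedding call. Unchanged chunks are
    skipped entirely, so re-ingestion only pays for new/edited text.
    """
    new_vectors = {}
    for text in chunks:
        h = content_hash(text)
        if h in known_hashes:
            continue  # identical content already embedded, skip it
        new_vectors[h] = embed_fn(text)
        known_hashes.add(h)
    return new_vectors
```

Under this scheme a full force-re-ingest of an unchanged repo embeds nothing; only the per-chunk contextual LLM call repeats, which matches the cost claim above.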
Visible effect: every prebaked repo gets the "Contextual retrieval
applied" sparkle in the sidebar afterward. CLAUDE.md updated to reflect
the new behaviour and document the save-site protection from the
previous commit.
- CLAUDE.md +4 -0
- scripts/prebake_repos.py +17 -8
```diff
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -118,6 +118,10 @@ The script flips `gen.premium_mode = True` for the entire run, which:
 - Routes every `gen.generate(...)` call to the Claude Sonnet 4.6 client (`ANTHROPIC_API_KEY` required).
 - Activates `PREMIUM_CAPS` overrides in `GenerationService` — every `gen.cap(name, default)` call returns the larger premium value (longer ReAct rounds, fuller chunk previews in contextual retrieval, larger README budget, etc.).
 
+The script always force-re-ingests each repo, even when already indexed, so contextual retrieval re-runs with the premium model. Without this, chat retrieval stays on free-tier (or non-existent) contextual descriptions. The "Contextual retrieval applied" sparkle in the sidebar is the visible proof that this ran.
+
+**Save sites refuse to demote premium artifacts.** When the runtime UI's Regenerate button fires a free-tier generation, `_save_tour` / `_save_diagram` / the README cache write detect the existing payload's `generated_by_model` starts with `claude-` and skip the persist with a `[protect] not overwriting premium` log line. The user's session shows their regenerated content (in-memory cache updates) but the durable cache stays at premium quality.
+
 Runtime requests from the deployed app keep the original (smaller) caps so free-tier providers don't drown.
 
 To inspect what's been baked for a repo:
```
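The save-site protection can be sketched as a small guard (illustrative only; `should_persist` and its `log` parameter are hypothetical names, but the `generated_by_model` check and the log line mirror what the commit describes):

```python
def should_persist(existing, log=print) -> bool:
    """Return False when the durable cache already holds a premium artifact.

    If the stored payload was generated by a `claude-*` model, a
    free-tier regeneration must not overwrite it; the caller still
    updates its in-memory cache so the user's session sees the new
    content.
    """
    if existing is None:
        return True  # nothing cached yet, always safe to write
    model = existing.get("generated_by_model", "")
    if model.startswith("claude-"):
        log("[protect] not overwriting premium")
        return False
    return True
```

Each save site (`_save_tour`, `_save_diagram`, the README cache write) would call such a guard before persisting.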
```diff
--- a/scripts/prebake_repos.py
+++ b/scripts/prebake_repos.py
@@ -66,17 +66,26 @@ def repo_indexed(store: QdrantStore, repo: str) -> bool:
 
 
 def ingest(repo: str, store: QdrantStore, gen: GenerationService, embedder: Embedder) -> bool:
-    """
-
-
-
-
+    """Re-ingest a repo via GitHub with force=True so contextual retrieval
+    runs. Even when the repo is already indexed we re-run — premium prebake
+    must end with premium-quality contextual descriptions on every chunk,
+    not just whatever the previous (possibly free-tier) ingestion left
+    behind. The Voyage embeddings are deduplicated by content hash so this
+    isn't as expensive as it sounds: only changed/new chunks pay the
+    embed cost; only chunks needing fresh contextual retrieval pay the
+    LLM cost."""
+    already = repo_indexed(store, repo)
+    if already:
+        print(f"  ▸ re-ingesting {repo} ({store.count(repo=repo)} chunks already indexed)…")
+    else:
+        print(f"  ▸ ingesting {repo}…")
     ingestion = IngestionService(store=store, embedder=embedder, gen=gen)
     repo_url = f"https://github.com/{repo}"
     try:
-        # force=True
-        # is on,
-        #
+        # force=True triggers contextual retrieval enrichment. Because
+        # premium_mode is on, gen.generate() routes those calls to the
+        # premium client → claude-sonnet-4-6. progress callback prints
+        # sparse milestones to stdout for visibility.
         last_step = [""]
         def on_progress(step: str, detail: str) -> None:
             if step != last_step[0]: