umanggarg committed · Commit 9f829fa · 1 Parent(s): 5c77349

prebake: always re-ingest so contextual retrieval runs at premium


The previous skip-if-already-indexed short-circuit in scripts/prebake_repos.py
meant contextual retrieval never re-ran for repos that were already in
Qdrant. Result: tour/diagram/README artifacts were premium-quality, but
the underlying chunk vectors kept their original (free-tier or none)
contextual descriptions. Chat retrieval — which depends on contextualised
chunks — silently stayed on the lower tier.
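For readers unfamiliar with the technique: contextual retrieval prepends an LLM-written description to each chunk before embedding, so whichever model wrote the description is frozen into the stored vector. A minimal sketch of that flow — the names (`contextualize`, `llm_describe`, `embed`) are illustrative stand-ins, not the real service API:

```python
def contextualize(chunks, llm_describe, embed):
    """Prepend an LLM-written description to each chunk before embedding,
    so the vector captures where the chunk sits in the wider repo."""
    enriched = []
    for chunk in chunks:
        # One LLM call per chunk -- this is the step whose quality tier
        # (free vs premium model) gets baked into the stored vector.
        description = llm_describe(chunk)
        enriched.append({
            "text": chunk,
            "context": description,
            "vector": embed(f"{description}\n\n{chunk}"),
        })
    return enriched
```

This is why skipping ingestion left retrieval on the lower tier: the artifacts were regenerated, but these per-chunk descriptions (and hence the vectors) never were.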

The fix is conceptually one line: don't short-circuit; always pass force=True. The
ingestion service's force=True path runs contextual retrieval enrichment
on every chunk; with premium_mode on, those calls route to claude-sonnet.
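The short-circuit vs. force behaviour, sketched with hypothetical shapes standing in for the real IngestionService:

```python
def ingest_repo(repo, store, enrich, force=False):
    """Sketch only: force=False short-circuits when the repo already has
    chunks, so enrichment never re-runs; force=True always enriches."""
    if not force and store.count(repo) > 0:
        return "skipped"    # old prebake behaviour: stale descriptions survive
    for chunk in store.chunks(repo):
        enrich(chunk)       # contextual retrieval; model tier is decided here
    return "enriched"
```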

Voyage embeddings are deduplicated by content hash, so re-ingestion only
re-embeds changed chunks. Net cost is dominated by the per-chunk
contextual LLM call (premium tier).
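The dedup idea, sketched with a plain dict standing in for the real embedding cache (illustrative only):

```python
import hashlib

def embed_with_dedup(chunks, embed, cache):
    """Skip the embedding call for any chunk whose content hash is already
    cached -- re-ingestion then only pays for changed or new chunks."""
    vectors = []
    for text in chunks:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = embed(text)    # only new content hits the API
        vectors.append(cache[key])
    return vectors
```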

Visible effect: every prebaked repo gets the "Contextual retrieval
applied" sparkle in the sidebar afterward. CLAUDE.md updated to reflect
the new behaviour and document the save-site protection from the
previous commit.
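The save-site protection from the previous commit boils down to a guard like this (hypothetical helper name; the real checks live inline in `_save_tour` / `_save_diagram` and the README cache write):

```python
def should_persist(existing, new_model):
    """Refuse to overwrite a premium artifact with a non-premium one.
    'existing' is the stored payload (or None); 'new_model' is the model
    that produced the candidate replacement."""
    if (existing
            and existing.get("generated_by_model", "").startswith("claude-")
            and not new_model.startswith("claude-")):
        print("[protect] not overwriting premium")
        return False
    return True
```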

Files changed (2)
  1. CLAUDE.md +4 -0
  2. scripts/prebake_repos.py +17 -8
CLAUDE.md CHANGED
@@ -118,6 +118,10 @@ The script flips `gen.premium_mode = True` for the entire run, which:
 - Routes every `gen.generate(...)` call to the Claude Sonnet 4.6 client (`ANTHROPIC_API_KEY` required).
 - Activates `PREMIUM_CAPS` overrides in `GenerationService` — every `gen.cap(name, default)` call returns the larger premium value (longer ReAct rounds, fuller chunk previews in contextual retrieval, larger README budget, etc.).
 
+The script always force-re-ingests each repo, even when already indexed, so contextual retrieval re-runs with the premium model. Without this, chat retrieval stays on free-tier (or non-existent) contextual descriptions. The "Contextual retrieval applied" sparkle in the sidebar is the visible proof that this ran.
+
+**Save sites refuse to demote premium artifacts.** When the runtime UI's Regenerate button fires a free-tier generation, `_save_tour` / `_save_diagram` / the README cache write detect the existing payload's `generated_by_model` starts with `claude-` and skip the persist with a `[protect] not overwriting premium` log line. The user's session shows their regenerated content (in-memory cache updates) but the durable cache stays at premium quality.
+
 Runtime requests from the deployed app keep the original (smaller) caps so free-tier providers don't drown.
 
 To inspect what's been baked for a repo:
scripts/prebake_repos.py CHANGED
@@ -66,17 +66,26 @@ def repo_indexed(store: QdrantStore, repo: str) -> bool:
 
 
 def ingest(repo: str, store: QdrantStore, gen: GenerationService, embedder: Embedder) -> bool:
-    """Ingest a repo via GitHub. Skips if already indexed."""
-    if repo_indexed(store, repo):
-        print(f" ✓ already indexed ({store.count(repo=repo)} chunks) skipping ingestion")
-        return True
-    print(f" ▸ ingesting {repo}…")
+    """Re-ingest a repo via GitHub with force=True so contextual retrieval
+    runs. Even when the repo is already indexed we re-run — premium prebake
+    must end with premium-quality contextual descriptions on every chunk,
+    not just whatever the previous (possibly free-tier) ingestion left
+    behind. The Voyage embeddings are deduplicated by content hash so this
+    isn't as expensive as it sounds: only changed/new chunks pay the
+    embed cost; only chunks needing fresh contextual retrieval pay the
+    LLM cost."""
+    already = repo_indexed(store, repo)
+    if already:
+        print(f" ▸ re-ingesting {repo} ({store.count(repo=repo)} chunks already indexed)…")
+    else:
+        print(f" ▸ ingesting {repo}…")
     ingestion = IngestionService(store=store, embedder=embedder, gen=gen)
     repo_url = f"https://github.com/{repo}"
     try:
-        # force=True so contextual retrieval enrichment runs (premium_mode
-        # is on, so the per-chunk descriptions are generated by the premium
-        # model). progress callback prints sparse milestones to stdout.
+        # force=True triggers contextual retrieval enrichment. Because
+        # premium_mode is on, gen.generate() routes those calls to the
+        # premium client → claude-sonnet-4-6. progress callback prints
+        # sparse milestones to stdout for visibility.
         last_step = [""]
         def on_progress(step: str, detail: str) -> None:
             if step != last_step[0]: