umanggarg Claude Opus 4.7 committed on
Commit
5f313bc
·
1 Parent(s): 9f829fa

docs+reliability: README accuracy pass + artifact-cache failure tracking


README:
- Fix tool count: 10 → 12 (add glob and grep to the agent tools table)
- Document the two-tier LLM strategy (free runtime cascade + opt-in
Sonnet 4.6 premium tier for cached artifacts)

diagram_service: track LLM enrichment success and save with model=None
on failure, so the protection rule can replace degraded artifacts later
instead of locking in a partial result.

prebake_repos: collect per-step failures and reflect them in the exit
code + log, so a "successful" prebake actually means every artifact
landed (repo_map, tour, all diagrams, README) — not just ingestion.
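The script's entry point is not part of this diff; a minimal sketch (the repo list and the `run` helper are stand-ins, not the script's real names) of how the per-repo boolean from `bake_one` can drive the process exit code:

```python
def bake_one(repo: str) -> bool:
    # Stand-in for the real bake_one(): True only when repo_map, tour,
    # every diagram, and the README all landed for this repo.
    return True

def run(repos: list[str]) -> int:
    # Any partially baked repo makes the whole prebake exit nonzero.
    failed = [r for r in repos if not bake_one(r)]
    if failed:
        print(f"prebake incomplete for: {', '.join(failed)}")
        return 1
    return 0

exit_code = run(["octocat/hello-world"])  # hypothetical repo list
```

Passing `exit_code` to `sys.exit()` at the bottom of the script is what lets CI catch a "successful-looking" run whose artifacts did not all land.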

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

README.md CHANGED
@@ -11,7 +11,7 @@ pinned: false
 
 **A production-grade RAG system that maps any GitHub repository — built from scratch, without LangChain or LlamaIndex.**
 
-Index any public repo and ask natural-language questions about its code. Cartographer retrieves the exact functions and classes relevant to your question, explains them with source citations, and can autonomously investigate complex questions across multiple files using an agent with 10 specialised tools. It can also generate a full README for any indexed repo on demand.
+Index any public repo and ask natural-language questions about its code. Cartographer retrieves the exact functions and classes relevant to your question, explains them with source citations, and can autonomously investigate complex questions across multiple files using an agent with 12 specialised tools. It can also generate a full README for any indexed repo on demand.
 
 **Live:** [cartographer-app.vercel.app](https://cartographer-app.vercel.app) · **Backend:** [HuggingFace Spaces](https://huggingface.co/spaces/umanggarg/cartographer)
 
@@ -23,7 +23,7 @@ Most RAG tutorials wrap a library and call it done. This project implements ever
 
 - **Ingestion** — GitHub API → AST-based code chunking → contextual LLM descriptions → dual-vector embedding → Qdrant Cloud
 - **Retrieval** — HyDE + query expansion + native hybrid search (dense + BM25) + cross-encoder reranking — each stage independently improving recall and precision
-- **Agent** — a ReAct loop with 10 MCP tools, working memory, parallel tool execution, and streaming thought traces
+- **Agent** — a ReAct loop with 12 MCP tools, working memory, parallel tool execution, and streaming thought traces
 - **UI** — the pipeline is visible: every retrieved chunk, agent thought, tool call, and confidence grade is shown to the user
 
 The result is both a useful tool and a study in how production AI systems are actually built.
@@ -88,6 +88,8 @@ Top-8 reranked chunks are injected as numbered sources. The LLM cites by `[1]`,
 Cerebras llama-3.3-70b (2600 tok/s, fastest) → Groq → Gemini → OpenRouter → Anthropic
 ```
 
+**Two-tier LLM strategy.** The free cascade above serves all runtime traffic — Q&A, agent mode, diagrams. A second, opt-in **premium tier** uses Claude Sonnet 4.6 to generate cached artifacts (concept tour, diagrams, README, repo_map) once per repo. Outputs are persisted in a `_artifacts` Qdrant collection and survive container restarts, so subsequent visitors get the high-quality artifacts at the free-cascade cost.
+
 ---
 
 ## Agent Mode
@@ -100,7 +102,7 @@ The agent communicates with tools via **MCP** — an open protocol for wiring LL
 
 This means every tool works with any MCP-compatible client, not just our agent.
 
-### 10 Agent Tools
+### 12 Agent Tools
 
 | Tool | What it does |
 |------|-------------|
@@ -109,6 +111,8 @@ This means every tool works with any MCP-compatible client, not just our agent.
 | `read_file` | Read any indexed file in full |
 | `get_file_chunk` | Read a precise line range from a file |
 | `list_files` | List all indexed files in a repo or subdirectory |
+| `glob` | Find files matching a glob pattern across the indexed repo |
+| `grep` | Regex search across indexed file contents |
 | `find_callers` | Find every call site of a function across the repo |
 | `trace_calls` | Walk the call chain from a function to see what it calls and what calls it |
 | `note` | Store a key-value fact in working memory for this session |
backend/services/diagram_service.py CHANGED
@@ -474,18 +474,23 @@ class DiagramService:
         # Architecture, Class, Data Flow: ground-truth structure from AST,
         # LLM only writes node descriptions.
         data = self._build_static_graph(repo, diagram_type)
+        enriched_ok = True  # LLM-built path: success implied by non-empty data
         if not data or not data.get("nodes"):
             # Fallback to LLM if static analysis yields nothing
             # (e.g. non-Python repo with no AST data)
             data = self._build_diagram_from_llm(repo, diagram_type, chunks)
         else:
-            data = self._enrich_nodes(repo, diagram_type, data, chunks)
+            data, enriched_ok = self._enrich_nodes(repo, diagram_type, data, chunks)
 
         if not data:
             return {"error": "Could not generate diagram. Try regenerating."}
 
         self._cache[cache_key] = data
-        self._save_diagram(repo, diagram_type, data, model=self._gen.current_model())
+        # Only label the save with the active model if the LLM step actually
+        # succeeded. On enrichment failure, save with model=None so the
+        # protection rule can replace this degraded artifact later.
+        save_model = self._gen.current_model() if enriched_ok else None
+        self._save_diagram(repo, diagram_type, data, model=save_model)
         return {"diagram": data, "type": diagram_type}
 
     def build_tour(self, repo: str) -> dict:
@@ -689,6 +694,7 @@ class DiagramService:
             return
 
         yield {"stage": "building", "progress": 0.40, "message": "Building graph from AST…"}
+        enriched_ok = True
         if diagram_type == "sequence":
             data = self._build_sequence_from_llm(repo, chunks)
         else:
@@ -700,7 +706,7 @@
         else:
             yield {"stage": "enriching", "progress": 0.70,
                    "message": "Enriching node descriptions with AI…"}
-            data = self._enrich_nodes(repo, diagram_type, data, chunks)
+            data, enriched_ok = self._enrich_nodes(repo, diagram_type, data, chunks)
 
         if not data:
             yield {"stage": "error", "progress": 1.0,
@@ -708,7 +714,11 @@
             return
 
         self._cache[cache_key] = data
-        self._save_diagram(repo, diagram_type, data, model=self._gen.current_model())
+        # See build_diagram() for rationale: skip premium label if the LLM
+        # enrichment call silently failed, otherwise the protection rule
+        # treats a structurally-only-correct diagram as premium-quality.
+        save_model = self._gen.current_model() if enriched_ok else None
+        self._save_diagram(repo, diagram_type, data, model=save_model)
         yield {"stage": "done", "progress": 1.0, "diagram": data, "type": diagram_type}
 
     def invalidate(self, repo: str):
@@ -1022,7 +1032,7 @@ class DiagramService:
 
         return {"nodes": nodes, "edges": edges}
 
-    def _enrich_nodes(self, repo: str, diagram_type: str, graph: dict, chunks: list[dict] | None = None) -> dict:
+    def _enrich_nodes(self, repo: str, diagram_type: str, graph: dict, chunks: list[dict] | None = None) -> tuple[dict, bool]:
         """
         Ask the LLM to write a short description for each node.
 
@@ -1034,12 +1044,17 @@
         snippets per node. Without this, the LLM only sees the node name and file
         and has to guess what the component does — the most common source of
         inaccurate descriptions. With snippets, descriptions are grounded in real code.
+
+        Returns (graph, enriched_ok). enriched_ok=False means the LLM call failed
+        and descriptions are missing; callers should not label the save with the
+        configured premium model in that case (otherwise the protection rule trusts
+        a degraded artifact as if it were premium-quality).
         """
         import json as _json
 
         nodes = graph.get("nodes", [])
         if not nodes:
-            return graph
+            return graph, True
 
         # Two lookups so we can attach real code for both diagram types:
         #
@@ -1116,11 +1131,12 @@
                 n["description"] = desc
             matched = sum(1 for n in nodes if n.get("description"))
             print(f"DiagramService: enriched {matched}/{len(nodes)} nodes with descriptions")
+            return graph, True
         except Exception as e:
             print(f"DiagramService: enrichment failed (non-fatal): {e}")
-            # Descriptions stay empty — diagram still shows accurate structure
-
-            return graph
+            # Descriptions stay empty — diagram still shows accurate structure,
+            # but signal failure so the save site doesn't tag this as premium.
+            return graph, False
 
     # ── LLM-based builders ────────────────────────────────────────────────────
 
scripts/prebake_repos.py CHANGED
@@ -184,14 +184,20 @@ def bake_one(
     started = time.monotonic()
     if not ingest(repo, store, gen, embedder):
         return False
-    bake_repo_map(repo, repo_map_svc, force)
-    bake_tour(repo, diagram_svc, force)
+    # Each bake step returns False on failure. Track them all so the
+    # final exit code reflects whether the repo is *actually* fully
+    # baked, not just whether ingestion succeeded.
+    failures: list[str] = []
+    if not bake_repo_map(repo, repo_map_svc, force): failures.append("repo_map")
+    if not bake_tour(repo, diagram_svc, force): failures.append("tour")
     for dtype in DIAGRAM_TYPES:
-        bake_diagram(repo, dtype, diagram_svc, force)
-    bake_readme(repo, readme_svc, store, force)
+        if not bake_diagram(repo, dtype, diagram_svc, force): failures.append(f"diagram:{dtype}")
+    if not bake_readme(repo, readme_svc, store, force): failures.append("readme")
     elapsed = time.monotonic() - started
+    if failures:
+        print(f" ⚠ partial: {len(failures)} step(s) failed → {', '.join(failures)}")
     print(f" ⏱ {elapsed:.1f}s")
-    return True
+    return not failures
 
 
 def main() -> int: