umanggarg Claude Opus 4.7 committed on
Commit
5f313bc
·
1 Parent(s): 9f829fa

docs+reliability: README accuracy pass + artifact-cache failure tracking


README:
- Fix tool count: 10 → 12 (add glob and grep to the agent tools table)
- Document the two-tier LLM strategy (free runtime cascade + opt-in
Sonnet 4.6 premium tier for cached artifacts)

diagram_service: track LLM enrichment success and save with model=None
on failure, so the protection rule can replace degraded artifacts later
instead of locking in a partial result.

prebake_repos: collect per-step failures and reflect them in the exit
code + log, so a "successful" prebake actually means every artifact
landed (repo_map, tour, all diagrams, README) — not just ingestion.
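The script's entry point is not part of this diff; a minimal sketch (the repo list and the `run` helper are stand-ins, not the script's real names) of how the per-repo boolean from `bake_one` can drive the process exit code:

```python
def bake_one(repo: str) -> bool:
    # Stand-in for the real bake_one(): True only when repo_map, tour,
    # every diagram, and the README all landed for this repo.
    return True

def run(repos: list[str]) -> int:
    # Any partially baked repo makes the whole prebake exit nonzero.
    failed = [r for r in repos if not bake_one(r)]
    if failed:
        print(f"prebake incomplete for: {', '.join(failed)}")
        return 1
    return 0

exit_code = run(["octocat/hello-world"])  # hypothetical repo list
```

Passing `exit_code` to `sys.exit()` at the bottom of the script is what lets CI catch a "successful-looking" run whose artifacts did not all land.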

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

README.md CHANGED
@@ -11,7 +11,7 @@ pinned: false
 
 **A production-grade RAG system that maps any GitHub repository — built from scratch, without LangChain or LlamaIndex.**
 
-Index any public repo and ask natural-language questions about its code. Cartographer retrieves the exact functions and classes relevant to your question, explains them with source citations, and can autonomously investigate complex questions across multiple files using an agent with 10 specialised tools. It can also generate a full README for any indexed repo on demand.
+Index any public repo and ask natural-language questions about its code. Cartographer retrieves the exact functions and classes relevant to your question, explains them with source citations, and can autonomously investigate complex questions across multiple files using an agent with 12 specialised tools. It can also generate a full README for any indexed repo on demand.
 
 **Live:** [cartographer-app.vercel.app](https://cartographer-app.vercel.app) · **Backend:** [HuggingFace Spaces](https://huggingface.co/spaces/umanggarg/cartographer)
 
@@ -23,7 +23,7 @@ Most RAG tutorials wrap a library and call it done. This project implements ever
 
 - **Ingestion** — GitHub API → AST-based code chunking → contextual LLM descriptions → dual-vector embedding → Qdrant Cloud
 - **Retrieval** — HyDE + query expansion + native hybrid search (dense + BM25) + cross-encoder reranking — each stage independently improving recall and precision
-- **Agent** — a ReAct loop with 10 MCP tools, working memory, parallel tool execution, and streaming thought traces
+- **Agent** — a ReAct loop with 12 MCP tools, working memory, parallel tool execution, and streaming thought traces
 - **UI** — the pipeline is visible: every retrieved chunk, agent thought, tool call, and confidence grade is shown to the user
 
 The result is both a useful tool and a study in how production AI systems are actually built.
@@ -88,6 +88,8 @@ Top-8 reranked chunks are injected as numbered sources. The LLM cites by `[1]`,
 Cerebras llama-3.3-70b (2600 tok/s, fastest) → Groq → Gemini → OpenRouter → Anthropic
 ```
 
+**Two-tier LLM strategy.** The free cascade above serves all runtime traffic — Q&A, agent mode, diagrams. A second, opt-in **premium tier** uses Claude Sonnet 4.6 to generate cached artifacts (concept tour, diagrams, README, repo_map) once per repo. Outputs are persisted in a `_artifacts` Qdrant collection and survive container restarts, so subsequent visitors get the high-quality artifacts at the free-cascade cost.
+
 ---
 
 ## Agent Mode
@@ -100,7 +102,7 @@ The agent communicates with tools via **MCP** — an open protocol for wiring LL
 
 This means every tool works with any MCP-compatible client, not just our agent.
 
-### 10 Agent Tools
+### 12 Agent Tools
 
 | Tool | What it does |
 |------|-------------|
@@ -109,6 +111,8 @@ This means every tool works with any MCP-compatible client, not just our agent.
 | `read_file` | Read any indexed file in full |
 | `get_file_chunk` | Read a precise line range from a file |
 | `list_files` | List all indexed files in a repo or subdirectory |
+| `glob` | Find files matching a glob pattern across the indexed repo |
+| `grep` | Regex search across indexed file contents |
 | `find_callers` | Find every call site of a function across the repo |
 | `trace_calls` | Walk the call chain from a function to see what it calls and what calls it |
 | `note` | Store a key-value fact in working memory for this session |
backend/services/diagram_service.py CHANGED
@@ -474,18 +474,23 @@ class DiagramService:
         # Architecture, Class, Data Flow: ground-truth structure from AST,
         # LLM only writes node descriptions.
         data = self._build_static_graph(repo, diagram_type)
+        enriched_ok = True  # LLM-built path: success implied by non-empty data
         if not data or not data.get("nodes"):
             # Fallback to LLM if static analysis yields nothing
             # (e.g. non-Python repo with no AST data)
             data = self._build_diagram_from_llm(repo, diagram_type, chunks)
         else:
-            data = self._enrich_nodes(repo, diagram_type, data, chunks)
+            data, enriched_ok = self._enrich_nodes(repo, diagram_type, data, chunks)
 
         if not data:
             return {"error": "Could not generate diagram. Try regenerating."}
 
         self._cache[cache_key] = data
-        self._save_diagram(repo, diagram_type, data, model=self._gen.current_model())
+        # Only label the save with the active model if the LLM step actually
+        # succeeded. On enrichment failure, save with model=None so the
+        # protection rule can replace this degraded artifact later.
+        save_model = self._gen.current_model() if enriched_ok else None
+        self._save_diagram(repo, diagram_type, data, model=save_model)
         return {"diagram": data, "type": diagram_type}
 
     def build_tour(self, repo: str) -> dict:
@@ -689,6 +694,7 @@ class DiagramService:
             return
 
         yield {"stage": "building", "progress": 0.40, "message": "Building graph from AST…"}
+        enriched_ok = True
         if diagram_type == "sequence":
             data = self._build_sequence_from_llm(repo, chunks)
         else:
@@ -700,7 +706,7 @@
         else:
             yield {"stage": "enriching", "progress": 0.70,
                    "message": "Enriching node descriptions with AI…"}
-            data = self._enrich_nodes(repo, diagram_type, data, chunks)
+            data, enriched_ok = self._enrich_nodes(repo, diagram_type, data, chunks)
 
         if not data:
             yield {"stage": "error", "progress": 1.0,
@@ -708,7 +714,11 @@
             return
 
         self._cache[cache_key] = data
-        self._save_diagram(repo, diagram_type, data, model=self._gen.current_model())
+        # See build_diagram() for rationale: skip premium label if the LLM
+        # enrichment call silently failed, otherwise the protection rule
+        # treats a structurally-only-correct diagram as premium-quality.
+        save_model = self._gen.current_model() if enriched_ok else None
+        self._save_diagram(repo, diagram_type, data, model=save_model)
         yield {"stage": "done", "progress": 1.0, "diagram": data, "type": diagram_type}
 
     def invalidate(self, repo: str):
@@ -1022,7 +1032,7 @@ class DiagramService:
 
         return {"nodes": nodes, "edges": edges}
 
-    def _enrich_nodes(self, repo: str, diagram_type: str, graph: dict, chunks: list[dict] | None = None) -> dict:
+    def _enrich_nodes(self, repo: str, diagram_type: str, graph: dict, chunks: list[dict] | None = None) -> tuple[dict, bool]:
         """
         Ask the LLM to write a short description for each node.
 
@@ -1034,12 +1044,17 @@
         snippets per node. Without this, the LLM only sees the node name and file
         and has to guess what the component does — the most common source of
         inaccurate descriptions. With snippets, descriptions are grounded in real code.
+
+        Returns (graph, enriched_ok). enriched_ok=False means the LLM call failed
+        and descriptions are missing; callers should not label the save with the
+        configured premium model in that case (otherwise the protection rule trusts
+        a degraded artifact as if it were premium-quality).
         """
         import json as _json
 
         nodes = graph.get("nodes", [])
         if not nodes:
-            return graph
+            return graph, True
 
         # Two lookups so we can attach real code for both diagram types:
         #
@@ -1116,11 +1131,12 @@
                 n["description"] = desc
             matched = sum(1 for n in nodes if n.get("description"))
             print(f"DiagramService: enriched {matched}/{len(nodes)} nodes with descriptions")
+            return graph, True
         except Exception as e:
             print(f"DiagramService: enrichment failed (non-fatal): {e}")
-            # Descriptions stay empty — diagram still shows accurate structure
-
-            return graph
+            # Descriptions stay empty — diagram still shows accurate structure,
+            # but signal failure so the save site doesn't tag this as premium.
+            return graph, False
 
     # ── LLM-based builders ────────────────────────────────────────────────────
 
scripts/prebake_repos.py CHANGED
@@ -184,14 +184,20 @@ def bake_one(
     started = time.monotonic()
     if not ingest(repo, store, gen, embedder):
         return False
-    bake_repo_map(repo, repo_map_svc, force)
-    bake_tour(repo, diagram_svc, force)
+    # Each bake step returns False on failure. Track them all so the
+    # final exit code reflects whether the repo is *actually* fully
+    # baked, not just whether ingestion succeeded.
+    failures: list[str] = []
+    if not bake_repo_map(repo, repo_map_svc, force): failures.append("repo_map")
+    if not bake_tour(repo, diagram_svc, force): failures.append("tour")
     for dtype in DIAGRAM_TYPES:
-        bake_diagram(repo, dtype, diagram_svc, force)
-    bake_readme(repo, readme_svc, store, force)
+        if not bake_diagram(repo, dtype, diagram_svc, force): failures.append(f"diagram:{dtype}")
+    if not bake_readme(repo, readme_svc, store, force): failures.append("readme")
     elapsed = time.monotonic() - started
+    if failures:
+        print(f" ⚠ partial: {len(failures)} step(s) failed → {', '.join(failures)}")
     print(f" ⏱ {elapsed:.1f}s")
-    return True
+    return not failures
 
 
 def main() -> int: