umanggarg Claude Sonnet 4.6 committed on
Commit
af2ddfc
·
1 Parent(s): 2191371

feat: agentic Phase 1 with real MCP tools + ReAct trace in UI


Phase 1 (Map) is now a ReAct loop instead of a one-shot static snapshot:
- 5 tools: list_files, read_file, search_symbol, find_callers, trace_calls
- Same implementations as backend/mcp_server.py: GitHub API for directory
browsing/file reading, Qdrant store for symbol/caller lookups
- Agent explores until it finds 4-6 real architectural decisions, capped at
8 rounds; falls back to static _phase_map() if loop fails
- Each THINK→TOOL→RESULT round streams as a trace event so the UI shows
the ReAct loop live, an educational demonstration of agentic AI in action
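
The loop described in the bullets above can be sketched in a few lines. This is a minimal illustration, not the commit's implementation: `react_loop`, `scripted_llm`, and the stub tool are hypothetical names, `json.loads` stands in for the project's `_parse_json` helper, and the empty result at the end stands in for the static `_phase_map()` fallback. The three regular expressions are taken from the diff.

```python
import json
import re

# Patterns taken from the diff; everything else in this sketch is illustrative.
THINK_RE = re.compile(r'THINK:\s*(.+?)(?:\n|$)', re.IGNORECASE | re.DOTALL)
TOOL_RE = re.compile(r'TOOL:\s*(\w+)\(\s*"?([^")\n]*)"?\s*\)', re.IGNORECASE)
DONE_RE = re.compile(r'DONE:\s*(\{.+)', re.DOTALL)


def react_loop(generate, tools, transcript, max_rounds=8):
    """Yield "react" trace events, then exactly one "result" event."""
    for _ in range(max_rounds):
        raw = generate(transcript)
        think_m = THINK_RE.search(raw)
        think = think_m.group(1).strip() if think_m else raw[:120].strip()

        done_m = DONE_RE.search(raw)
        if done_m:
            try:
                data = json.loads(done_m.group(1))
            except json.JSONDecodeError:
                data = {}  # malformed JSON: fall through and keep exploring
            if data.get("pipeline_stages"):
                yield {"type": "result", "data": data}
                return

        tool_m = TOOL_RE.search(raw)
        if tool_m:
            name, arg = tool_m.group(1).lower(), tool_m.group(2).strip()
            tool = tools.get(name)
            result = tool(arg) if tool else f"(unknown tool '{name}')"
            call = f'{name}("{arg}")'
            yield {"type": "react", "think": think, "tool": call}
            # Append the full round so the next turn sees what was learned.
            transcript += f"\nTHINK: {think}\nTOOL: {call}\nRESULT:\n{result}\n"
        else:
            transcript += "\n[No valid action found. Output TOOL: or DONE:]\n"

    # Round cap reached; the commit calls the static mapper at this point.
    yield {"type": "result", "data": {}}


def scripted_llm(replies):
    """Hypothetical stand-in for the model: returns canned replies in order."""
    it = iter(replies)
    return lambda transcript: next(it)


tools = {"list_files": lambda path: "backend/\nui/\nREADME.md"}
replies = [
    'THINK: Start at the repo root.\nTOOL: list_files("")',
    'THINK: I have seen enough.\n'
    'DONE: {"pipeline_stages": [{"name": "Hybrid Retrieval"}]}',
]
events = list(react_loop(scripted_llm(replies), tools, "Repository: demo/repo\n"))
```

Each round either appends a THINK/TOOL/RESULT triple to the transcript or ends the loop, so the model always sees its own exploration history on the next turn.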

UI: TracePanel now handles "react" event type with a wrench icon, visually
distinguishing exploration steps from investigation findings.
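
How those "react" events reach the UI can be sketched as follows. This is a simplified illustration of the forwarding loop in this commit: `forward_trace` is a hypothetical name, and it returns a list where the real code yields from a generator. The step (0.015) and cap (0.24) are the values used in the diff.

```python
def forward_trace(events, start=0.15, step=0.015, cap=0.24):
    """Wrap agent events in progress payloads; stop at the "result" event."""
    payloads, pipeline_map, progress = [], {}, start
    for event in events:
        if event.get("type") == "result":
            pipeline_map = event.get("data", {})
            break
        # Nudge the progress bar each round, but never past the cap.
        progress = min(progress + step, cap)
        payloads.append({
            "stage": "mapping",
            "progress": progress,
            "message": event.get("text", ""),
            "trace": event,  # the UI picks its icon from trace["type"]
        })
    return payloads, pipeline_map


agent_events = [{"type": "react", "text": f"round {i}"} for i in range(8)]
agent_events.append({"type": "result", "data": {"pipeline_stages": ["demo"]}})
payloads, pipeline_map = forward_trace(agent_events)
```

With eight rounds the bar reaches the cap on the sixth event and stays there, so the mapping stage never appears to overtake the stages that follow it.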

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

backend/services/tour_agent.py CHANGED
@@ -532,7 +532,276 @@ class TourAgent:
             parts.append(f"{header}\n{text}" if text else header)
         return "\n\n".join(parts)
 
-    # ── Phase 1: Map ──────────────────────────────────────────────────────────
 
     def _phase_map(self, repo: str, readme_text: str) -> dict:
         """
@@ -1130,19 +1399,34 @@ Rules:
                 "trace": {"type": "info", "text": "No README in index"},
             }
 
-        # ── Phase 1: Map ──────────────────────────────────────────────────────
         yield {
             "stage": "mapping", "progress": 0.15,
-            "message": "Mapping pipeline from README + imports…",
             "trace": {"type": "thinking",
-                      "text": "Combining stated purpose with actual call graph…"},
         }
 
-        try:
-            pipeline_map = self._phase_map(repo, readme_text)
-        except Exception as e:
             yield {"stage": "error", "progress": 1.0,
-                   "error": f"Pipeline mapping failed: {e}"}
             return
 
         stages = pipeline_map.get("pipeline_stages", [])
 
             parts.append(f"{header}\n{text}" if text else header)
         return "\n\n".join(parts)
 
+    # ── Phase 1 (Agentic): ReAct exploration loop ─────────────────────────────
+
+    # WHY AGENTIC: a one-shot static snapshot gives the LLM ~14 random module
+    # chunks and asks it to identify design decisions. The LLM often picks code
+    # identifiers (method names, filenames) because those are the most prominent
+    # things visible in the chunks, not because they're design decisions.
+    #
+    # An agentic loop gives the LLM TOOLS and lets it decide what to read.
+    # Like a developer joining a new project, it starts at the top (README +
+    # directory listing), narrows in on interesting files, and stops when it
+    # understands the architecture, not when a fixed token budget runs out.
+    #
+    # The generator yields trace events so the UI can show the ReAct loop live:
+    # THINK → TOOL → RESULT → THINK → TOOL → RESULT → ... → DONE
+    # This doubles as an educational demonstration of how agentic AI works.
+
+    # Tools exposed to the ReAct agent: same capabilities as our MCP server
+    # (backend/mcp_server.py) but called directly rather than over the wire.
+    # The MCP server already defines: list_files, read_file, search_symbol,
+    # find_callers, trace_calls, search_code. We reuse that same logic here
+    # so the Phase 1 agent has exactly the same power as a Claude Code session
+    # connected to our MCP server.
+
+    _AGENTIC_MAP_SYSTEM = (
+        "You are a senior engineer exploring an unfamiliar codebase to identify its key "
+        "ARCHITECTURAL DECISIONS - the non-obvious choices where a simpler alternative "
+        "was deliberately rejected.\n\n"
+        "TOOLS - call exactly one per turn:\n"
+        " list_files(path) list files/dirs at a path (\"\" = repo root)\n"
+        " read_file(filepath) read a source file (imports, classes, functions)\n"
+        " search_symbol(name) find a class or function definition by exact name\n"
+        " find_callers(name) find all call sites of a function/class\n"
+        " trace_calls(name) trace the call graph from an entry point\n\n"
+        "FORMAT - output exactly two lines per turn:\n"
+        " THINK: [one sentence: what you learned and what to investigate next]\n"
+        " TOOL: tool_name(\"argument\")\n\n"
+        " OR when you have identified 4-6 decisions:\n"
+        " THINK: [why you have enough information now]\n"
+        " DONE: {\"entry_file\":\"...\",\"readme_summary\":\"...\","
+        "\"pipeline_stages\":[{\"name\":\"...\",\"file\":\"...\",\"key_aspect\":\"...\"}]}\n\n"
+        "EXPLORATION STRATEGY:\n"
+        " 1. list_files(\"\") - see top-level repo structure\n"
+        " 2. read_file() key manifests (package.json, pyproject.toml, go.mod, Cargo.toml)\n"
+        " 3. read_file() the most interesting implementation files the README mentions\n"
+        " 4. search_symbol() / find_callers() to trace how key components connect\n"
+        " 5. DONE when you can name 4-6 real decisions grounded in what you read\n\n"
+        "STAGE NAME RULES (critical - every name is checked):\n"
+        " GOOD: names a technique, algorithm, or tradeoff (e.g. 'Lazy Evaluation Cache',\n"
+        " 'Hybrid Sparse-Dense Retrieval', 'Progressive Context Expansion')\n"
+        " BAD: any filename, class name, function name, or identifier with underscores\n"
+        " 4-6 stages only - core decisions, skip infrastructure (routing, config, health)\n"
+        " key_aspect: what simpler approach this replaces and the concrete cost of that\n\n"
+        "Return ONLY valid JSON in the DONE: line - no markdown fences, no explanation."
+    )
+
+    def _agentic_list_files(self, repo: str, path: str) -> str:
+        """GitHub API directory listing - same as mcp_server.list_files."""
+        import requests as _req
+        from backend.config import settings
+        path = path.strip("/").strip()
+        owner, name = repo.split("/", 1)
+        url = f"https://api.github.com/repos/{owner}/{name}/contents/{path}"
+        headers = {"Accept": "application/vnd.github.v3+json"}
+        if settings.github_token:
+            headers["Authorization"] = f"token {settings.github_token}"
+        try:
+            resp = _req.get(url, headers=headers, timeout=15)
+            if resp.status_code == 404:
+                return f"Path not found: '{path}' in {repo}"
+            resp.raise_for_status()
+        except Exception as e:
+            return f"GitHub fetch failed: {e}"
+        entries = resp.json()
+        if not isinstance(entries, list):
+            return f"'{path}' is a file - use read_file to read it."
+        dirs = sorted([e["name"] + "/" for e in entries if e["type"] == "dir"])
+        files = sorted([
+            f"{e['name']} ({e.get('size',0)//1024}KB)" if e.get("size",0)>=1024
+            else f"{e['name']} ({e.get('size',0)}B)"
+            for e in entries if e["type"] == "file"
+        ])
+        return f"# {repo}/{path or ''}\n" + "\n".join(dirs + files)
+
+    def _agentic_read_file(self, repo: str, filepath: str) -> str:
+        """GitHub API file read, truncated to ~600 tokens - same as mcp_server.read_file."""
+        import requests as _req
+        from backend.config import settings
+        filepath = filepath.strip()
+        owner, name = repo.split("/", 1)
+        url = f"https://api.github.com/repos/{owner}/{name}/contents/{filepath}"
+        headers = {"Accept": "application/vnd.github.v3.raw"}
+        if settings.github_token:
+            headers["Authorization"] = f"token {settings.github_token}"
+        try:
+            resp = _req.get(url, headers=headers, timeout=15)
+            if resp.status_code == 404:
+                return f"File not found: {filepath}"
+            resp.raise_for_status()
+        except Exception as e:
+            return f"GitHub fetch failed: {e}"
+        lines = resp.text.splitlines()
+        total = len(lines)
+        # Return up to 120 lines - enough to see imports, class defs, top-level functions.
+        # Cap keeps transcript size manageable across 8 rounds.
+        preview = "\n".join(f"{i+1}: {l}" for i, l in enumerate(lines[:120]))
+        suffix = f"\n… ({total - 120} more lines)" if total > 120 else ""
+        return f"# {repo} - {filepath} ({total} lines)\n\n{preview}{suffix}"
+
+    def _agentic_search_symbol(self, repo: str, symbol_name: str) -> str:
+        """Find a class or function definition by name - wraps store.find_symbol."""
+        matches = self._store.find_symbol(symbol_name, repo=repo)
+        if not matches:
+            return f"No definition found for '{symbol_name}'. Try search_symbol with the exact name."
+        parts = []
+        for i, c in enumerate(matches[:4], 1):
+            loc = f"{c.get('filepath','?')} L{c.get('start_line','?')}–{c.get('end_line','?')}"
+            parts.append(f"[{i}] {loc}\n{c.get('text','')[:400]}")
+        return f"Definitions of '{symbol_name}':\n\n" + "\n\n".join(parts)
+
+    def _agentic_find_callers(self, repo: str, function_name: str) -> str:
+        """Find all call sites - wraps store.find_callers."""
+        callers = self._store.find_callers(function_name, repo=repo)
+        if not callers:
+            return f"No call sites found for '{function_name}'."
+        parts = []
+        for i, c in enumerate(callers[:6], 1):
+            loc = f"{c.get('filepath','?')} - {c.get('name','?')} L{c.get('start_line','?')}"
+            parts.append(f"[{i}] {loc}\n{c.get('text','')[:300]}")
+        return f"Call sites of '{function_name}' ({len(callers)} found):\n\n" + "\n\n".join(parts)
+
+    def _agentic_trace_calls(self, repo: str, symbol_name: str) -> str:
+        """Trace call graph from an entry point - same logic as mcp_server.trace_calls."""
+        visited: set[str] = set()
+        lines: list[str] = [f"# Call trace from `{symbol_name}`\n"]
+
+        def _walk(name: str, depth: int, prefix: str) -> None:
+            if depth > 3 or name in visited:
+                return
+            visited.add(name)
+            chunks = self._store.find_symbol(name, repo=repo)
+            if not chunks:
+                return
+            c = chunks[0]
+            loc = f"{c.get('filepath','?')} L{c.get('start_line','?')}"
+            lines.append(f"{prefix}→ {name}() `{loc}`")
+            for callee in (c.get("calls") or [])[:5]:
+                _walk(callee, depth + 1, prefix + " ")
+
+        _walk(symbol_name, 0, "")
+        return "\n".join(lines) if len(lines) > 1 else f"Symbol '{symbol_name}' not found in index."
+
+    def _phase_map_agentic(self, repo: str, readme_text: str):
+        """Generator: ReAct exploration loop for Phase 1.
+
+        Yields dict trace events as it runs (forwarded to the UI live-log panel),
+        then yields a final {"type": "result", "data": pipeline_map_dict} when done.
+
+        Falls back to static _phase_map() on parse failure or exhausted rounds.
+        """
+        manifest_chunks = self._manifest_chunks(repo)
+        manifest_text = _token_budget(
+            "\n\n".join(
+                f"── {c['filepath']}\n{c['text'].strip()[:500]}"
+                for c in manifest_chunks
+            ),
+            max_tokens=500,
+        )
+
+        # Seed the transcript with the two highest-signal sources: README + manifests.
+        # The agent decides where to go from here.
+        transcript = f"Repository: {repo}\n\n"
+        if readme_text:
+            transcript += f"README:\n{readme_text}\n\n"
+        if manifest_text.strip():
+            transcript += f"Manifest files (dependencies / entry points):\n{manifest_text}\n\n"
+        transcript += "Begin exploration. Start with list_files(\"\") to see the top-level repo structure.\n"
+
+        max_rounds = 8
+        for round_n in range(max_rounds):
+            raw = self._gen.generate(
+                self._AGENTIC_MAP_SYSTEM, transcript,
+                temperature=0.0, max_tokens=700,
+            )
+
+            # Parse THINK + TOOL or DONE from the LLM's response
+            think_m = _re.search(r'THINK:\s*(.+?)(?:\n|$)', raw, _re.IGNORECASE | _re.DOTALL)
+            tool_m = _re.search(r'TOOL:\s*(\w+)\(\s*"?([^")\n]*)"?\s*\)', raw, _re.IGNORECASE)
+            done_m = _re.search(r'DONE:\s*(\{.+)', raw, _re.DOTALL)
+
+            think_text = (think_m.group(1).strip() if think_m else raw[:120].strip())
+
+            # ── DONE ──────────────────────────────────────────────────────────
+            if done_m:
+                try:
+                    result = _parse_json(done_m.group(1))
+                    if result.get("pipeline_stages"):
+                        yield {"type": "thinking",
+                               "text": f"✓ Done in {round_n + 1} round(s): {think_text}"}
+                        yield {"type": "result", "data": result}
+                        return
+                except Exception:
+                    pass  # malformed JSON - keep going
+
+            # ── TOOL CALL ─────────────────────────────────────────────────────
+            if tool_m:
+                tool_name = tool_m.group(1).lower().replace("-", "_")
+                tool_arg = tool_m.group(2).strip().strip('"').strip("'")
+
+                if tool_name == "list_files":
+                    tool_result = self._agentic_list_files(repo, tool_arg)
+                    display = f"list_files(\"{tool_arg}\")"
+                elif tool_name == "read_file":
+                    tool_result = self._agentic_read_file(repo, tool_arg)
+                    display = f"read_file(\"{tool_arg}\")"
+                elif tool_name == "search_symbol":
+                    tool_result = self._agentic_search_symbol(repo, tool_arg)
+                    display = f"search_symbol(\"{tool_arg}\")"
+                elif tool_name == "find_callers":
+                    tool_result = self._agentic_find_callers(repo, tool_arg)
+                    display = f"find_callers(\"{tool_arg}\")"
+                elif tool_name == "trace_calls":
+                    tool_result = self._agentic_trace_calls(repo, tool_arg)
+                    display = f"trace_calls(\"{tool_arg}\")"
+                else:
+                    tool_result = f"(unknown tool '{tool_name}' - use: list_files, read_file, search_symbol, find_callers, trace_calls)"
+                    display = tool_name
+
+                # Emit a trace event for the UI live-log panel
+                yield {"type": "react",
+                       "think": think_text,
+                       "tool": display,
+                       "text": f"THINK: {think_text} → {display}"}
+
+                # Truncate tool output so the transcript doesn't balloon
+                tool_result = _token_budget(tool_result, max_tokens=400)
+                transcript += (
+                    f"\nTHINK: {think_text}\n"
+                    f"TOOL: {display}\n"
+                    f"RESULT:\n{tool_result}\n"
+                )
+            else:
+                # LLM output couldn't be parsed - nudge it
+                transcript += f"\n[No valid action found in round {round_n + 1}. Output TOOL: or DONE:]\n"
+                yield {"type": "thinking", "text": f"Round {round_n + 1}: retrying parse…"}
+
+        # ── Exhausted rounds - force final output ──────────────────────────────
+        yield {"type": "thinking", "text": "Reached round limit - requesting final output…"}
+        transcript += "\nROUND LIMIT REACHED. Output DONE: now with what you have found.\n"
+        raw = self._gen.generate(
+            self._AGENTIC_MAP_SYSTEM, transcript,
+            temperature=0.0, max_tokens=1200,
+        )
+        done_m = _re.search(r'DONE:\s*(\{.+)', raw, _re.DOTALL)
+        try:
+            result = _parse_json(done_m.group(1) if done_m else raw)
+            if result.get("pipeline_stages"):
+                yield {"type": "result", "data": result}
+                return
+        except Exception:
+            pass
+
+        # Complete fallback: static snapshot Phase 1
+        yield {"type": "thinking", "text": "Agentic loop failed - falling back to static Phase 1"}
+        try:
+            result = self._phase_map(repo, readme_text)
+            yield {"type": "result", "data": result}
+        except Exception as e:
+            yield {"type": "result", "data": {}}
+
+    # ── Phase 1: Map (static fallback) ────────────────────────────────────────
 
     def _phase_map(self, repo: str, readme_text: str) -> dict:
         """
 
                 "trace": {"type": "info", "text": "No README in index"},
             }
 
+        # ── Phase 1: Agentic Map (ReAct loop) ────────────────────────────────
+        # The agent explores with the list_files, read_file, search_symbol,
+        # find_callers, and trace_calls tools, emitting trace events for the
+        # UI live-log panel as it reasons about the code.
         yield {
             "stage": "mapping", "progress": 0.15,
+            "message": "Exploring codebase with ReAct agent…",
             "trace": {"type": "thinking",
+                      "text": "Starting agentic exploration: README → directories → key files…"},
         }
 
+        pipeline_map: dict = {}
+        react_prog = 0.15
+        for event in self._phase_map_agentic(repo, readme_text):
+            if event.get("type") == "result":
+                pipeline_map = event.get("data", {})
+                break
+            # Forward trace events to the UI; advance progress slightly each round
+            react_prog = min(react_prog + 0.015, 0.24)
+            yield {
+                "stage": "mapping",
+                "progress": react_prog,
+                "message": event.get("text", ""),
+                "trace": event,
+            }
+
+        if not pipeline_map.get("pipeline_stages"):
             yield {"stage": "error", "progress": 1.0,
+                   "error": "Pipeline mapping failed - no stages found"}
             return
 
         stages = pipeline_map.get("pipeline_stages", [])
ui/src/components/ExploreView.jsx CHANGED
@@ -290,7 +290,12 @@ function ConceptCard({ concept, visualNum, isEntry, isSelected, isHovered, isDim
 
 // ── TracePanel - live log of agent investigation steps ─────────────────────────
 // Each entry in `log` is the "trace" payload from a TourAgent SSE event:
-// { type: "info"|"thinking"|"found"|"file"|"finding", text, name?, stages? }
 //
 // WHY SHOW THIS: transparency builds trust. When users see "Investigating:
 // retrieval/hybrid_search.py" they understand WHY that concept appears in
@@ -306,6 +311,12 @@ function TracePanel({ log, open, onToggle }) {
   }, [log, open]);
 
   const ICONS = {
     thinking: (
       <svg viewBox="0 0 16 16" fill="currentColor" width="12" height="12">
         <path d="M8 0a8 8 0 1 1 0 16A8 8 0 0 1 8 0zm.93 6.588-2.29.287-.082.38.45.083c.294.07.352.176.288.469l-.738 3.468c-.194.897.105 1.319.808 1.319.545 0 1.178-.252 1.465-.598l.088-.416c-.2.176-.492.246-.686.246-.275 0-.375-.193-.304-.533zM8 5.5a1 1 0 1 0 0-2 1 1 0 0 0 0 2z"/>
 
 
 // ── TracePanel - live log of agent investigation steps ─────────────────────────
 // Each entry in `log` is the "trace" payload from a TourAgent SSE event:
+// { type: "info"|"thinking"|"found"|"file"|"finding"|"react", text, name?, stages? }
+//
+// "react" entries come from the agentic Phase 1 ReAct loop - they show the
+// THINK → TOOL → RESULT cycle that the agent uses to explore the codebase.
+// Showing this live demonstrates how agentic AI works: the model reasons about
+// what to read next, calls a tool, reads the result, and decides where to go.
 //
 // WHY SHOW THIS: transparency builds trust. When users see "Investigating:
 // retrieval/hybrid_search.py" they understand WHY that concept appears in

   }, [log, open]);
 
   const ICONS = {
+    // ReAct loop step - tool icon (wrench) to distinguish from investigation steps
+    react: (
+      <svg viewBox="0 0 16 16" fill="currentColor" width="12" height="12">
+        <path d="M13.371 2.629a3.5 3.5 0 0 0-4.849 4.274L2.78 12.745a1.5 1.5 0 1 0 2.121 2.121l5.842-5.742a3.5 3.5 0 0 0 2.628-6.495zm-1.414 3.536a1.5 1.5 0 1 1-2.121-2.122 1.5 1.5 0 0 1 2.121 2.122z"/>
+      </svg>
+    ),
     thinking: (
       <svg viewBox="0 0 16 16" fill="currentColor" width="12" height="12">
         <path d="M8 0a8 8 0 1 1 0 16A8 8 0 0 1 8 0zm.93 6.588-2.29.287-.082.38.45.083c.294.07.352.176.288.469l-.738 3.468c-.194.897.105 1.319.808 1.319.545 0 1.178-.252 1.465-.598l.088-.416c-.2.176-.492.246-.686.246-.275 0-.375-.193-.304-.533zM8 5.5a1 1 0 1 0 0-2 1 1 0 0 0 0 2z"/>