# Cartographer — Build Plan

A RAG system that indexes GitHub repositories and answers natural language questions about their code, architecture, and documentation.

---

## Learning Objectives

By the end of this project you will understand:

- How RAG works on source code (not just documents)
- AST-based code chunking vs. fixed character windows
- Code-aware embeddings vs. general text embeddings
- Metadata-rich retrieval (file, function, class, language, line numbers)
- Hosted vector databases (Qdrant Cloud) and why they enable free deployment
- Live deployment: frontend on Vercel, backend on Render, vectors on Qdrant Cloud
- Claude Code features: CLAUDE.md, hooks, slash commands, subagents

---

## Architecture Overview

```
GitHub URL
    │
    ▼
[Ingestion Pipeline]
    ├── Fetch repo via GitHub API (no clone needed for public repos)
    ├── Filter files by language — skip binaries, lock files, node_modules
    ├── Chunk by AST boundaries (functions, classes)
    │       └── fallback: character windows for markdown, config, plain text
    ├── Embed with nomic-embed-code (code-optimised model)
    └── Store in Qdrant Cloud
            └── metadata: repo, filepath, language, function_name, class_name, start_line, end_line
    │
    ▼
[Query Pipeline]
    ├── Embed query with same model
    ├── Hybrid search (dense vector + sparse BM25, native in Qdrant)
    ├── Relevance threshold (reject out-of-domain queries)
    └── LLM generation (Groq / Claude)
            └── citations: filepath + line range
```

---

## Phases

### Phase 1 — Core Ingestion

- [ ] `ingestion/repo_fetcher.py` — fetch file tree + content via GitHub API
- [ ] `ingestion/file_filter.py` — include/exclude rules per language
- [ ] `ingestion/code_chunker.py` — AST-based chunking for Python; character-window fallback for other file types
- [ ] `ingestion/embedder.py` — embed chunks with `nomic-ai/nomic-embed-code`
- [ ] `ingestion/qdrant_store.py` — upsert chunks into Qdrant Cloud collection

### Phase 2 — Retrieval & Generation

- [ ] `retrieval/retrieval.py` — hybrid search using Qdrant's native dense + sparse vector support
- [ ] `backend/services/generation.py` — LLM answer generation with code-aware system prompt
- [ ] `backend/services/ingestion_service.py` — orchestrate full ingestion pipeline
- [ ] FastAPI backend with `/ingest`, `/query`, `/search` endpoints

### Phase 3 — UI

- [ ] React + Vite frontend
- [ ] Repo URL input instead of file upload
- [ ] Citations show filepath + line numbers
- [ ] Syntax-highlighted code chunks in source passages
- [ ] Multi-repo selector in sidebar

### Phase 4 — Live Deployment

- [ ] **Frontend → Vercel** (free, static hosting)
- [ ] **Backend → Render** (free tier — lightweight since no local ML model)
- [ ] **Vector DB → Qdrant Cloud** (permanent free tier, 1 GB)
- [ ] **Embeddings → Qdrant's built-in vectoriser** or Voyage AI API (removes model from backend, keeps Render on free tier)
- [ ] Environment variable setup, CORS configuration
- [ ] GitHub Actions CI: lint + deploy on push to main

### Phase 5 — Claude Code Features (Throughout)

- [ ] `CLAUDE.md` — project briefing for Claude Code sessions
- [ ] Hooks — auto-lint on file edit, reminder to update notes after commit
- [ ] Slash commands — `/ingest-repo`, `/search-code`, `/add-to-notes`
- [ ] Subagent patterns — parallel ingestion, expert review before PRs

---

## Tech Stack

| Layer | Choice | Why |
|---|---|---|
| Repo fetch | GitHub REST API | No local clone needed; works without git installed |
| Code parsing | `ast` (Python), `tree-sitter` (multi-lang) | Split at function/class boundaries |
| Embeddings | `nomic-ai/nomic-embed-code` | Fine-tuned on code, free, runs locally (replaced by a remote embedding API in Phase 4) |
| Vector DB | Qdrant Cloud (free tier) | Permanent free 1 GB, native hybrid search, enables deployment |
| LLM | Groq Llama 3.3 70B / Claude Haiku | Fast, cheap/free |
| Backend | FastAPI + Uvicorn | Lightweight, async, auto-docs |
| Frontend | React + Vite | Fast dev server, small production bundle |
| Frontend hosting | Vercel | Free, zero-config for Vite apps |
| Backend hosting | Render | Free tier is sufficient once the embedding model is moved off the server |
| CI/CD | GitHub Actions | Lint and deploy on push |

---

## Deployment Architecture

```
User browser
    │
    ├── Static files ──→ Vercel (free)
    │                    React UI
    │
    └── API calls ──────→ Render (free)
                          FastAPI backend
                              │
                              ├──→ Qdrant Cloud (free)
                              │    Vector storage + hybrid search
                              │
                              └──→ Groq API (free)
                                   LLM generation
```

The key insight: by using **Qdrant Cloud** for vector storage and a **remote embedding API** (instead of running the model on the server), the backend becomes a lightweight HTTP service with minimal RAM usage, fitting within Render's free tier (512 MB RAM).

---

## Notes Directory

`notes/` is updated after every PR:

- What was built
- Key decisions made
- Concepts learned
- What's next

See `notes/000-project-setup.md` for the first entry.
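
---

## Appendix: AST Chunking Sketch

The Phase 1 item for `ingestion/code_chunker.py` (AST-based chunking for Python with a character-window fallback) could start from something like the sketch below. It is only an illustration under stated assumptions: the names `CodeChunk`, `chunk_python_source`, and `chunk_by_characters`, and the `window`/`overlap` defaults, are hypothetical and not part of the plan. It uses only the standard-library `ast` module and relies on `end_lineno`, which is available from Python 3.8.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodeChunk:
    """One retrievable unit carrying the metadata fields from the architecture diagram."""
    filepath: str
    language: str
    function_name: Optional[str]
    class_name: Optional[str]
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


_FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def chunk_python_source(source: str, filepath: str) -> List[CodeChunk]:
    """Split Python source at top-level function and method boundaries.

    Simplification (assumption): a class that has methods is emitted as one
    chunk per method, so class-level attributes and docstrings are dropped.
    A class without methods is emitted whole. Raises SyntaxError on
    unparsable input, at which point a caller could use the fallback below.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[CodeChunk] = []

    def emit(node, function_name, class_name):
        # Start at the first decorator, if any, so decorators stay with the body.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno
        text = "\n".join(lines[start - 1:end])
        chunks.append(CodeChunk(filepath, "python", function_name,
                                class_name, start, end, text))

    for node in tree.body:
        if isinstance(node, _FUNC_TYPES):
            emit(node, node.name, None)
        elif isinstance(node, ast.ClassDef):
            methods = [n for n in node.body if isinstance(n, _FUNC_TYPES)]
            if not methods:
                emit(node, None, node.name)
            for method in methods:
                emit(method, method.name, node.name)
    return chunks


def chunk_by_characters(text: str, filepath: str, language: str,
                        window: int = 1200, overlap: int = 200) -> List[CodeChunk]:
    """Fallback for markdown, config, and plain text: fixed character windows."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    chunks: List[CodeChunk] = []
    step = window - overlap
    for offset in range(0, len(text), step):
        piece = text[offset:offset + window]
        # Line numbers are derived by counting newlines before and inside the window.
        start_line = text.count("\n", 0, offset) + 1
        end_line = start_line + piece.rstrip("\n").count("\n")
        chunks.append(CodeChunk(filepath, language, None, None,
                                start_line, end_line, piece))
        if offset + window >= len(text):
            break
    return chunks
```

Each `CodeChunk` maps directly onto the payload fields planned for Qdrant (`filepath`, `language`, `function_name`, `class_name`, `start_line`, `end_line`); the `repo` field would be attached by whatever orchestrates ingestion. Keeping line ranges on every chunk is what later allows citations of the form filepath + line range.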