Cartographer – Build Plan

A RAG system that indexes GitHub repositories and answers natural language questions about their code, architecture, and documentation.


Learning Objectives

By the end of this project you will understand:

  • How RAG works on source code (not just documents)
  • AST-based code chunking vs. fixed character windows
  • Code-aware embeddings vs. general text embeddings
  • Metadata-rich retrieval (file, function, class, language, line numbers)
  • Hosted vector databases (Qdrant Cloud) and why they enable free deployment
  • Live deployment: frontend on Vercel, backend on Render, vectors on Qdrant Cloud
  • Claude Code features: CLAUDE.md, hooks, slash commands, subagents

Architecture Overview

GitHub URL
    │
    ▼
[Ingestion Pipeline]
    ├── Fetch repo via GitHub API (no clone needed for public repos)
    ├── Filter files by language – skip binaries, lock files, node_modules
    ├── Chunk by AST boundaries (functions, classes)
    │       └── fallback: character windows for markdown, config, plain text
    ├── Embed with nomic-embed-code (code-optimised model)
    └── Store in Qdrant Cloud
            └── metadata: repo, filepath, language,
                         function_name, class_name, start_line, end_line

    │
    ▼
[Query Pipeline]
    ├── Embed query with same model
    ├── Hybrid search (dense vector + sparse BM25, native in Qdrant)
    ├── Relevance threshold (reject out-of-domain queries)
    └── LLM generation (Groq / Claude)
            └── citations: filepath + line range
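
Each stored chunk carries the metadata shown above as its Qdrant payload. A minimal sketch of what one upsert might look like, assuming the qdrant-client library; the collection name, payload values, and placeholder vector are illustrative, not the project's actual code:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_KEY")

embedding = [0.0] * 768  # placeholder: produced by the embedding step

client.upsert(
    collection_name="cartographer_chunks",  # assumed collection name
    points=[
        models.PointStruct(
            id=1,
            # "dense" is a named vector; a sparse vector would sit alongside it
            # in the same point to support hybrid search.
            vector={"dense": embedding},
            payload={
                "repo": "owner/repo",
                "filepath": "src/app.py",
                "language": "python",
                "function_name": "create_app",
                "class_name": None,
                "start_line": 12,
                "end_line": 48,
            },
        )
    ],
)
```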

Phases

Phase 1 – Core Ingestion

  • ingestion/repo_fetcher.py β€” fetch file tree + content via GitHub API
  • ingestion/file_filter.py β€” include/exclude rules per language
  • ingestion/code_chunker.py β€” AST-based chunking for Python; character-window fallback for other file types
  • ingestion/embedder.py β€” embed chunks with nomic-ai/nomic-embed-code
  • ingestion/qdrant_store.py β€” upsert chunks into Qdrant Cloud collection

Phase 2 – Retrieval & Generation

  • retrieval/retrieval.py β€” hybrid search using Qdrant's native dense + sparse
  • backend/services/generation.py β€” LLM answer generation with code-aware system prompt
  • backend/services/ingestion_service.py β€” orchestrate full ingestion pipeline
  • FastAPI backend with /ingest, /query, /search endpoints

Phase 3 – UI

  • React + Vite frontend
  • Repo URL input instead of file upload
  • Citations show filepath + line numbers
  • Syntax-highlighted code chunks in source passages
  • Multi-repo selector in sidebar

Phase 4 – Live Deployment

  • Frontend β†’ Vercel (free, static hosting)
  • Backend β†’ Render (free tier β€” lightweight since no local ML model)
  • Vector DB β†’ Qdrant Cloud (permanent free tier, 1GB)
  • Embeddings β†’ Qdrant's built-in vectoriser or Voyage AI API (removes model from backend, keeps Render on free tier)
  • Environment variable setup, CORS configuration
  • GitHub Actions CI: lint + deploy on push to main

Phase 5 – Claude Code Features (Throughout)

  • CLAUDE.md β€” project briefing for Claude Code sessions
  • Hooks β€” auto-lint on file edit, reminder to update notes after commit
  • Slash commands β€” /ingest-repo, /search-code, /add-to-notes
  • Subagent patterns β€” parallel ingestion, expert review before PRs

Tech Stack

| Layer            | Choice                                 | Why                                                          |
|------------------|----------------------------------------|--------------------------------------------------------------|
| Repo fetch       | GitHub REST API                        | No local clone needed; works without git installed           |
| Code parsing     | ast (Python), tree-sitter (multi-lang) | Split at function/class boundaries                           |
| Embeddings       | nomic-ai/nomic-embed-code              | Fine-tuned on code, free, runs locally                       |
| Vector DB        | Qdrant Cloud (free tier)               | Permanent free 1GB, native hybrid search, enables deployment |
| LLM              | Groq Llama 3.3 70B / Claude Haiku      | Fast, cheap/free                                             |
| Backend          | FastAPI + Uvicorn                      | Lightweight, async, auto-docs                                |
| Frontend         | React + Vite                           | Fast dev server, small production bundle                     |
| Frontend hosting | Vercel                                 | Free, zero-config for Vite apps                              |
| Backend hosting  | Render                                 | Free tier works once the model is removed from the server    |
| CI/CD            | GitHub Actions                         | Lint and deploy on push                                      |

Deployment Architecture

User browser
    │
    ├── Static files ──→ Vercel (free)
    │                        React UI
    │
    └── API calls ──────→ Render (free)
                              FastAPI backend
                                  │
                                  ├──→ Qdrant Cloud (free)
                                  │        Vector storage + hybrid search
                                  │
                                  └──→ Groq API (free)
                                           LLM generation

The key insight: by using Qdrant Cloud for vector storage and a remote embedding API (instead of running the model on the server), the backend becomes a lightweight HTTP service with minimal RAM usage, fitting within Render's free tier (512 MB RAM).
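
As one illustration of that remote embedding step, a sketch assuming the Voyage AI Python client (the client usage and model name are assumptions; Qdrant's built-in vectoriser is the other option mentioned above):

```python
import voyageai

# Assumes VOYAGE_API_KEY is set in the environment (e.g. in Render's dashboard).
vo = voyageai.Client()


def embed_chunks(texts: list[str]) -> list[list[float]]:
    """Embed chunks via a hosted API so no model weights load on the backend."""
    result = vo.embed(texts, model="voyage-code-3", input_type="document")
    return result.embeddings
```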


Notes Directory

notes/ is updated after every PR:

  • What was built
  • Key decisions made
  • Concepts learned
  • What's next

See notes/000-project-setup.md for the first entry.