title: CivicSetu
emoji: 🏛️
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: default
app_file: app.py
pinned: false
CivicSetu
Live: https://civicsetu-two.vercel.app
Open-source RAG system for querying Indian civic and legal documents, with accurate citations, cross-reference traversal, and conflict detection between laws.
Current status: Phase 9 complete, with 5-jurisdiction RERA coverage, a RAGAS evaluation pipeline (0.90 faithfulness), hybrid RRF retrieval, and a mobile-responsive Next.js frontend live on Vercel.
What it does
Ask a plain-English question about RERA. Get a cited, structured answer with section references, a confidence score, and a legal disclaimer, all grounded in real legal text.
Query: "Which state rules implement section 9 of RERA on agent registration?"
Answer: "Section 9 of the RERA Act 2016 governs agent registration at the central level.
Rule 11 of Maharashtra Rules 2017 and Rule 8 of Karnataka RERA Rules derive
from Section 9, specifying application procedures and timelines..."
Citations: [Section 9, RERA Act 2016], [Rule 11, Maharashtra Rules 2017],
[Rule 8, Karnataka RERA Rules]
Confidence: 0.96 (high)
Architecture
FastAPI → LangGraph Agent → pgvector + Neo4j + PostgreSQL
↑
Ingestion Pipeline (PDF → chunks → embeddings → graph)
Three stores per query:
- pgvector: semantic similarity (fact lookups)
- Neo4j: section graph traversal (cross-references, DERIVED_FROM edges)
- PostgreSQL: full chunk text + metadata
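To illustrate the graph store's role, here is a minimal sketch of a DERIVED_FROM lookup. It uses a plain Python dict as a hypothetical stand-in for Neo4j, and the two edges are taken from the example answer above. The function name and data layout are assumptions for illustration, not the project's actual schema or query code.

```python
# Hypothetical in-memory stand-in for Neo4j DERIVED_FROM edges.
# Each key is a state rule (document, section); each value is the
# central provision it derives from.
DERIVED_FROM = {
    ("Maharashtra Rules 2017", "Rule 11"): ("RERA Act 2016", "Section 9"),
    ("Karnataka RERA Rules", "Rule 8"): ("RERA Act 2016", "Section 9"),
}


def implementing_rules(edges, act, section):
    """Return the rules whose DERIVED_FROM edge points at (act, section)."""
    target = (act, section)
    return sorted(child for child, parent in edges.items() if parent == target)


print(implementing_rules(DERIVED_FROM, "RERA Act 2016", "Section 9"))
```

In the real system this traversal would run against Neo4j; the dict only shows the shape of the question the graph answers ("which state rules implement this central section?").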
Full design: HLD.md | LLD.md | RAG.md
Quickstart
Prerequisites
- Docker + Docker Compose
- uv package manager
- One of: Gemini API key (free tier) or Groq API key (free tier)
No Ollama required. Embeddings run locally via sentence-transformers. First run downloads nomic-embed-text-v1.5 (~550MB) from HuggingFace and caches it.
Setup
# 1. Clone and install
git clone https://github.com/adeshboudh/civicsetu.git && cd civicsetu
make install
# 2. Configure secrets
cp .env.example .env
# Set GEMINI_API_KEY and/or GROQ_API_KEY; everything else has working defaults
# 3. Start infrastructure
make docker-up
# 4. Ingest all 5 jurisdictions
make ingest
# 5. Start the API
make serve
Production
- Frontend: Vercel, Next.js 15 App Router (mobile responsive)
- API: Hugging Face Spaces, FastAPI + Docker + 550MB model baked in
- PostgreSQL + pgvector: Neon, 1203 chunks
- Neo4j: AuraDB Free, 2090 sections, 2321 edges
- LLM: LiteLLM (Gemini → Groq → OpenRouter)
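The LLM entry reads as an ordered provider chain. Assuming it means "try the next provider when the previous one fails", the idea can be sketched in plain Python as below. This is not LiteLLM's API (LiteLLM has its own fallback configuration); the function and the stub providers are hypothetical and exist only to show the control flow.

```python
def complete_with_fallback(providers, prompt):
    """Try each (name, call) pair in order; return the first successful reply."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stubs standing in for real provider clients.
def gemini_stub(prompt):
    raise TimeoutError("rate limited")


def groq_stub(prompt):
    return f"stub answer to: {prompt}"


def openrouter_stub(prompt):
    return "unused"


providers = [
    ("gemini", gemini_stub),
    ("groq", groq_stub),
    ("openrouter", openrouter_stub),
]
used, reply = complete_with_fallback(providers, "What is RERA?")
print(used, reply)
```

Here the first stub fails, so the second one answers and the third is never called.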
6. Query
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{"query": "What are the penalties for a promoter who delays possession?"}'
First request will be slow (~30–45s) while the embedding model loads into memory. Subsequent requests run at 5–15s.
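The same request can be sent from Python using only the standard library. This is a minimal sketch: the endpoint and payload come from the curl example, while the helper names and the 90-second timeout (chosen to cover the slow first request) are assumptions. The shape of the JSON response is not specified here, so `ask` simply returns whatever the API sends back.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/query"  # endpoint from the curl example


def build_query_request(question, url=API_URL):
    """Build the POST request without sending it."""
    payload = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=90):
    """Send the question to a running API and return the decoded JSON body."""
    request = build_query_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request needs no server; call ask(...) once `make serve` is up.
req = build_query_request(
    "What are the penalties for a promoter who delays possession?"
)
print(req.get_method(), req.full_url)
```

Once the API is running locally, `ask("...")` sends the question and returns the parsed response.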
Other useful commands
make e2e # Run 12-case E2E benchmark across all 5 jurisdictions
make test # Run unit tests
make lint # Ruff linter
make typecheck # mypy
# RAGAS evaluation
make eval-smoke-p1 # Phase 1: invoke graph for 5-row smoke dataset
make eval-smoke-p2 # Phase 2: score cached results with RAGAS
make eval-p1 # Phase 1: full 31-row golden dataset
make eval-p2 # Phase 2: score all 31 rows
make eval-reset # Clear eval caches (re-runs everything)
make ingest --jurisdiction MAHARASHTRA # Re-ingest a single jurisdiction
make docker-down # Tear down containers
Documents ingested
| Document | Jurisdiction | Sections |
|---|---|---|
| RERA Act 2016 | Central | 224 |
| Maharashtra Real Estate Rules 2017 | Maharashtra | 214 |
| UP RERA Rules 2016 | Uttar Pradesh | 170 |
| UP RERA General Regulations 2019 | Uttar Pradesh | 85 |
| Karnataka RERA Rules 2017 | Karnataka | 235 |
| Tamil Nadu RERA Rules 2017 | Tamil Nadu | 157 |
Total chunks: 1203. Graph: 2090 Section nodes, 1297 HAS_SECTION edges, 933 REFERENCES edges, 91 DERIVED_FROM edges.
Tech stack
| Layer | Technology |
|---|---|
| API | FastAPI + Uvicorn |
| Orchestration | LangGraph StateGraph |
| LLM routing | LiteLLM (Gemini → Groq → OpenRouter) |
| Embeddings | nomic-embed-text-v1.5 via sentence-transformers (local, no Ollama required) |
| Vector DB | pgvector + HNSW index |
| Graph DB | Neo4j Community |
| Relational | PostgreSQL + SQLAlchemy |
| Retrieval | Hybrid RRF: pgvector cosine + PostgreSQL FTS (websearch_to_tsquery OR-mode) |
| Reranker | FlashRank (rank-T5-flan) + score gap filter |
| Evaluation | RAGAS (faithfulness, answer relevancy, context precision) |
| PDF parsing | PyMuPDF |
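The "Hybrid RRF" row refers to Reciprocal Rank Fusion, which merges several ranked lists by summing 1 / (k + rank) for each item. The sketch below shows the general technique with the commonly used constant k = 60. The chunk IDs and the two input rankings are made up for illustration, and the project's own constant, tie-breaking, and filtering may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs into a single list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical rankings: one from vector search, one from full-text search.
vector_hits = ["sec-18", "sec-9", "sec-61"]
fts_hits = ["sec-9", "sec-40", "sec-18"]

fused = reciprocal_rank_fusion([vector_hits, fts_hits])
print(fused)  # ['sec-9', 'sec-18', 'sec-40', 'sec-61']
```

Items that appear in both lists (sec-9 and sec-18) rise to the top, because RRF rewards agreement between retrievers without needing their raw scores to be comparable.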
Phase roadmap
| Phase | Scope | Status |
|---|---|---|
| 0 | RERA Act 2016, vector RAG, FastAPI | ✅ Complete |
| 1 | Neo4j graph, cross-reference queries | ✅ Complete |
| 2 | MahaRERA Rules 2017, multi-jurisdiction | ✅ Complete |
| 3 | DERIVED_FROM edges, cross-jurisdiction graph | ✅ Complete |
| 4 | Multi-state expansion (UP, TN, Karnataka) | ✅ Complete |
| 5 | Agent pipeline hardening, E2E test suite | ✅ Complete |
| 6 | Next.js frontend, Vercel deployment, public URL | ✅ Complete |
| 7 | Graph explorer, section content drawer, D3 visualization | ✅ Complete |
| 8 | RAGAS eval pipeline, hybrid RRF retrieval, retrieval quality fixes | ✅ Complete |
| 9 | Mobile responsiveness, frontend polish, dual-pane layout, interaction animations | ✅ Complete |
ADRs
- RAG Techniques Reference: hybrid retrieval, RRF, reranking, RAGAS eval, known failure modes
- ADR 001: three store architecture
- ADR 002: section boundary chunking
- ADR 003: LangGraph over LangChain chains
- ADR 004: Multi-format chunker
- ADR 005: Document registry
Disclaimer
CivicSetu provides AI-generated legal information, not legal advice. Always verify with a qualified lawyer or the official gazette.