---
title: CivicSetu
emoji: 🏛️
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "default"
app_file: app.py
pinned: false
---
# CivicSetu
**Live:** [https://civicsetu-two.vercel.app](https://civicsetu-two.vercel.app)
Open-source RAG system for querying Indian civic and legal documents, with accurate
citations, cross-reference traversal, and conflict detection between laws.
**Current status:** Phase 9 complete, covering 5 RERA jurisdictions, a RAGAS evaluation pipeline (0.90 faithfulness), hybrid RRF retrieval, and a mobile-responsive Next.js frontend live on Vercel.
---
## What it does
Ask a plain-English question about RERA (the Real Estate (Regulation and Development) Act, 2016) or the state rules made under it.
Get a cited, structured answer with section references, a confidence score, and a legal disclaimer, all grounded in the actual legal text.
```
Query: "Which state rules implement section 9 of RERA on agent registration?"
Answer: "Section 9 of the RERA Act 2016 governs agent registration at the central level.
Rule 11 of Maharashtra Rules 2017 and Rule 8 of Karnataka RERA Rules derive
from Section 9, specifying application procedures and timelines..."
Citations: [Section 9, RERA Act 2016], [Rule 11, Maharashtra Rules 2017],
[Rule 8, Karnataka RERA Rules]
Confidence: 0.96 (high)
```
---
## Architecture
```
FastAPI → LangGraph Agent → pgvector + Neo4j + PostgreSQL
                                          ↑
               Ingestion Pipeline (PDF → chunks → embeddings → graph)
```
Three stores per query:
- **pgvector**: semantic similarity (fact lookups)
- **Neo4j**: section-graph traversal (cross-references, `DERIVED_FROM` edges)
- **PostgreSQL**: full chunk text and metadata
Full design: [HLD.md](docs/HLD.md) | [LLD.md](docs/LLD.md) | [RAG.md](docs/RAG.md)
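To make the `DERIVED_FROM` traversal concrete, here is a minimal sketch of the kind of lookup behind the example query above ("Which state rules implement section 9?"). It is not CivicSetu's code: the real system queries Neo4j, whereas this version swaps in an in-memory reverse index so it runs with the standard library alone. Every node ID and edge in it is hypothetical.

```python
"""Toy section graph: which state rules derive from a central RERA section?

Hypothetical data only (loosely mirroring the example answer above); the real
graph lives in Neo4j and is not modelled here.
"""
from collections import defaultdict, deque

# Hypothetical edge list: (source_node, edge_type, target_node).
EDGES = [
    ("MH:Rule 11", "DERIVED_FROM", "RERA:Section 9"),
    ("KA:Rule 8", "DERIVED_FROM", "RERA:Section 9"),
    ("RERA:Section 9", "REFERENCES", "RERA:Section 10"),
    ("MH:Rule 12", "REFERENCES", "MH:Rule 11"),
]


def build_reverse_index(edges, edge_type):
    """Map target -> sources for one edge type, i.e. index incoming edges."""
    incoming = defaultdict(list)
    for src, etype, dst in edges:
        if etype == edge_type:
            incoming[dst].append(src)
    return incoming


def derived_rules(section, edges, max_hops=2):
    """Breadth-first walk over incoming DERIVED_FROM edges, up to max_hops deep."""
    incoming = build_reverse_index(edges, "DERIVED_FROM")
    seen, found = {section}, []
    queue = deque([(section, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for src in incoming[node]:
            if src not in seen:
                seen.add(src)
                found.append(src)
                queue.append((src, depth + 1))
    return found


print(derived_rules("RERA:Section 9", EDGES))  # ['MH:Rule 11', 'KA:Rule 8']
```

Walking incoming `DERIVED_FROM` edges from a central section yields the state rules built on it; walking `REFERENCES` edges the same way would give plain cross-references instead.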
---
## Quickstart
### Prerequisites
- Docker + Docker Compose
- `uv` package manager
- At least one LLM API key: Gemini or Groq (both offer a free tier)
> **No Ollama required.** Embeddings run locally via `sentence-transformers`.
> First run downloads `nomic-embed-text-v1.5` (~550MB) from HuggingFace and caches it.
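A minimal sketch of local embedding with this model, assuming the standard `sentence-transformers` API (`SentenceTransformer(..., trust_remote_code=True)` and `encode(..., normalize_embeddings=True)`). The `search_document: ` / `search_query: ` prefixes follow the nomic-embed-text-v1.5 model card; whether CivicSetu applies them, and the helper names `with_prefix`, `load_model`, and `embed`, are assumptions for illustration.

```python
"""Sketch: local embeddings with nomic-embed-text-v1.5 via sentence-transformers.

Assumption: task prefixes as recommended by the model card; the helper names
are hypothetical, not CivicSetu's API.
"""
from functools import lru_cache

MODEL_ID = "nomic-ai/nomic-embed-text-v1.5"
PREFIXES = {"document": "search_document: ", "query": "search_query: "}


def with_prefix(texts, kind):
    """Prepend the model-card task prefix ('document' for chunks, 'query' for questions)."""
    prefix = PREFIXES[kind]
    return [prefix + t for t in texts]


@lru_cache(maxsize=1)
def load_model():
    """Load the model once per process; the first call downloads and caches it."""
    from sentence_transformers import SentenceTransformer  # heavy import, deferred

    return SentenceTransformer(MODEL_ID, trust_remote_code=True)


def embed(texts, kind="document"):
    """Return L2-normalised vectors, so cosine similarity equals a dot product."""
    return load_model().encode(with_prefix(texts, kind), normalize_embeddings=True)
```

Deferring the import and caching the model with `lru_cache` means the one-time load cost lands on the first `embed(...)` call rather than at import time.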
### Setup
```bash
# 1. Clone and install
git clone https://github.com/adeshboudh/civicsetu.git && cd civicsetu
make install
# 2. Configure secrets
cp .env.example .env
# Set GEMINI_API_KEY and/or GROQ_API_KEY; everything else has working defaults
# 3. Start infrastructure
make docker-up
# 4. Ingest all 5 jurisdictions
make ingest
# 5. Start the API
make serve
```
### 6. Query
```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the penalties for a promoter who delays possession?"}'
```
> The first request is slow (~30–45s) while the embedding model loads into memory.
> Subsequent requests take 5–15s.
### Other useful commands
```bash
make e2e            # Run the 12-case E2E benchmark across all 5 jurisdictions
make test           # Run unit tests
make lint           # Ruff linter
make typecheck      # mypy
# RAGAS evaluation: *-p1 targets invoke the graph and cache outputs, *-p2 targets score the cache
make eval-smoke-p1  # Phase 1: invoke the graph on the 5-row smoke dataset
make eval-smoke-p2  # Phase 2: score the cached results with RAGAS
make eval-p1        # Phase 1: full 31-row golden dataset
make eval-p2        # Phase 2: score all 31 rows
make eval-reset     # Clear eval caches so the next run starts from scratch
make ingest --jurisdiction MAHARASHTRA  # Re-ingest a single jurisdiction
make docker-down    # Tear down containers
```
---
## Production
- **Frontend:** Next.js 15 App Router (mobile-responsive) on [Vercel](https://civicsetu-two.vercel.app)
- **API:** FastAPI in Docker on [Hugging Face Spaces](https://huggingface.co/spaces/adesh01/civicsetu), with the 550MB embedding model baked into the image
- **PostgreSQL + pgvector:** hosted on [Neon](https://neon.tech) (1203 chunks)
- **Neo4j:** [AuraDB Free](https://neo4j.com/cloud/aura) (2090 Section nodes, 2321 edges)
- **LLM:** LiteLLM (Gemini → Groq → OpenRouter)
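The `curl` call in step 6 can also be made from Python with the standard library alone. This is a sketch: the request body mirrors the `curl` example, but the response keys that `summarize()` reads (`answer`, `citations`, `confidence`) are hypothetical guesses at the schema, and pointing `base_url` at a deployed instance such as the Hugging Face Space assumes it serves the same `/api/v1/query` path.

```python
"""Minimal stdlib client for POST /api/v1/query.

Assumptions: the body needs only a "query" field (as in the curl example);
the response keys used by summarize() are hypothetical.
"""
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local `make serve`; swap for a deployed base URL


def build_request(query, base_url=BASE_URL):
    """Build (but do not send) the JSON POST request for the query endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, base_url=BASE_URL, timeout=60.0):
    """Send the query and decode the JSON reply; 60s covers the slow first request."""
    with urllib.request.urlopen(build_request(query, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(payload):
    """Render a response dict as text; the keys read here are hypothetical."""
    lines = [payload.get("answer", "")]
    for cite in payload.get("citations", []):
        lines.append(f"  - {cite}")
    if "confidence" in payload:
        lines.append(f"confidence: {payload['confidence']}")
    return "\n".join(lines)
```

With the API running, `summarize(ask("What are the penalties for a promoter who delays possession?"))` prints the answer followed by its citations.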
---
## Documents ingested
| Document | Jurisdiction | Sections |
| ---------------------------------- | ------------- | -------- |
| RERA Act 2016 | Central | 224 |
| Maharashtra Real Estate Rules 2017 | Maharashtra | 214 |
| UP RERA Rules 2016 | Uttar Pradesh | 170 |
| UP RERA General Regulations 2019 | Uttar Pradesh | 85 |
| Karnataka RERA Rules 2017 | Karnataka | 235 |
| Tamil Nadu RERA Rules 2017 | Tamil Nadu | 157 |
Total chunks: 1203.
Graph: 2090 Section nodes, 1297 HAS_SECTION edges, 933 REFERENCES edges, 91 DERIVED_FROM edges.
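Per-section chunks and `REFERENCES` edges can both come out of one pass over the raw text. The sketch below illustrates section-boundary chunking (the idea named in ADR 002) plus regex-based cross-reference extraction. It is an illustration, not CivicSetu's parser: the heading and reference patterns, the `Chunk` fields, `reference_edges`, and the sample text are all hypothetical, and real legal PDFs will need more robust patterns than these.

```python
"""Toy section-boundary chunker that also extracts in-text section references.

Hypothetical patterns and data; not CivicSetu's ingestion code.
"""
import re
from dataclasses import dataclass, field

# Hypothetical heading pattern: a line of the form "<number>. <Title>."
HEADING = re.compile(r"^(\d+[A-Z]?)\.\s+(.+?)\.\s*$", re.MULTILINE)
# Hypothetical in-text reference pattern: "section 9", "Section 10A", ...
REFERENCE = re.compile(r"\bsection\s+(\d+[A-Z]?)\b", re.IGNORECASE)


@dataclass
class Chunk:
    section_id: str
    title: str
    text: str
    references: list = field(default_factory=list)


def chunk_by_section(raw):
    """Split at section headings so each chunk holds exactly one section."""
    matches = list(HEADING.finditer(raw))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        body = raw[m.end():end].strip()
        # De-duplicate in first-seen order and drop self-references.
        refs = [r for r in dict.fromkeys(REFERENCE.findall(body)) if r != m.group(1)]
        chunks.append(Chunk(m.group(1), m.group(2), body, refs))
    return chunks


def reference_edges(chunks):
    """Emit (source, 'REFERENCES', target) triples for a graph loader."""
    return [(c.section_id, "REFERENCES", ref) for c in chunks for ref in c.references]


SAMPLE = """9. Registration of real estate agents.
No real estate agent shall facilitate a sale without registration under this section.
10. Functions of real estate agents.
Every agent registered under section 9 shall maintain books of account, subject to section 9 and section 11.
"""

print(reference_edges(chunk_by_section(SAMPLE)))
```

Each triple could then be written to the graph as an edge between two section nodes.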
---
## Tech stack
| Layer | Technology |
| :-- | :-- |
| API | FastAPI + Uvicorn |
| Orchestration | LangGraph StateGraph |
| LLM routing | LiteLLM (Gemini β†’ Groq β†’ OpenRouter) |
| Embeddings | nomic-embed-text-v1.5 via sentence-transformers (local, no Ollama required) |
| Vector DB | pgvector + HNSW index |
| Graph DB | Neo4j Community |
| Relational | PostgreSQL + SQLAlchemy |
| Retrieval | Hybrid RRF: pgvector cosine + PostgreSQL FTS (websearch_to_tsquery OR-mode) |
| Reranker | FlashRank (rank-T5-flan) + score gap filter |
| Evaluation | RAGAS (faithfulness, answer relevancy, context precision) |
| PDF parsing | PyMuPDF |
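Reciprocal Rank Fusion combines the pgvector and full-text result lists by rank position rather than raw score, so cosine similarities and full-text rank values never have to share a scale. The sketch below shows RRF with the conventional k = 60 and one plausible reading of a "score gap filter" (keep reranked hits until the score falls off a cliff). CivicSetu's actual k, threshold, and gap rule are assumptions here, and the scores passed to `gap_filter` stand in for FlashRank output.

```python
"""Sketch: Reciprocal Rank Fusion over two retrievers, then a score-gap cut-off.

Assumptions (not from CivicSetu's code): k = 60, max_gap = 0.3, the gap rule
itself, and the toy section IDs.
"""
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum of 1 / (k + rank), ranks starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def gap_filter(scored, max_gap=0.3):
    """Keep (id, score) hits in descending order until one drop exceeds max_gap."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    kept = ordered[:1]
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[1] - cur[1] > max_gap:
            break  # everything below the cliff is treated as noise
        kept.append(cur)
    return kept


vector_hits = ["s9", "r11", "s10"]  # toy pgvector cosine order
fts_hits = ["r11", "r8", "s9"]      # toy full-text order
fused = rrf_fuse([vector_hits, fts_hits])
print(fused)  # ['r11', 's9', 'r8', 's10']
```

`r11` outranks `s9` because it sits higher on average across both lists (ranks 2 and 1 versus 1 and 3), even though the vector search put `s9` first.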
---
## Phase roadmap
| Phase | Scope | Status |
| :-- | :-- | :-- |
| 0 | RERA Act 2016, vector RAG, FastAPI | ✅ Complete |
| 1 | Neo4j graph, cross-reference queries | ✅ Complete |
| 2 | MahaRERA Rules 2017, multi-jurisdiction | ✅ Complete |
| 3 | DERIVED_FROM edges, cross-jurisdiction graph | ✅ Complete |
| 4 | Multi-state expansion (UP, TN, Karnataka) | ✅ Complete |
| 5 | Agent pipeline hardening, E2E test suite | ✅ Complete |
| 6 | Next.js frontend, Vercel deployment, public URL | ✅ Complete |
| 7 | Graph explorer, section content drawer, D3 visualization | ✅ Complete |
| 8 | RAGAS eval pipeline, hybrid RRF retrieval, retrieval quality fixes | ✅ Complete |
| 9 | Mobile responsiveness, frontend polish, dual-pane layout, interaction animations | ✅ Complete |
---
## ADRs
- [RAG Techniques Reference](docs/RAG.md): hybrid retrieval, RRF, reranking, RAGAS eval, known failure modes
- [ADR 001: Three-store architecture](docs/adr/001-three-store-architecture.md)
- [ADR 002: Section-boundary chunking](docs/adr/002-section-boundary-chunking.md)
- [ADR 003: LangGraph over LangChain chains](docs/adr/003-langgraph-over-langchain.md)
- [ADR 004: Multi-format chunker](docs/adr/004-multi-format-chunker.md)
- [ADR 005: Document registry](docs/adr/005-document-registry.md)
---
## Disclaimer
CivicSetu provides AI-generated legal information, not legal advice.
Always verify with a qualified lawyer or the official gazette.