| --- |
| title: CivicSetu |
| emoji: 🏛️ |
| colorFrom: blue |
| colorTo: purple |
| sdk: docker |
| sdk_version: "default" |
| app_file: app.py |
| pinned: false |
| --- |
| |
| # CivicSetu |
|
|
| **Live:** [https://civicsetu-two.vercel.app](https://civicsetu-two.vercel.app) |
|
|
| Open-source RAG system for querying Indian civic and legal documents, with accurate |
| citations, cross-reference traversal, and conflict detection between laws. |
|
|
| **Current status:** Phase 9 complete, with 5-jurisdiction RERA coverage, a RAGAS evaluation pipeline (0.90 faithfulness), hybrid RRF retrieval, and a mobile-responsive Next.js frontend live on Vercel. |
|
|
| --- |
|
|
| ## What it does |
|
|
| Ask a plain-English question about RERA. Get a cited, structured answer with section |
| references, a confidence score, and a legal disclaimer, all grounded in real legal text. |
|
|
| ``` |
| |
| Query: "Which state rules implement section 9 of RERA on agent registration?" |
| |
| Answer: "Section 9 of the RERA Act 2016 governs agent registration at the central level. |
| Rule 11 of Maharashtra Rules 2017 and Rule 8 of Karnataka RERA Rules derive |
| from Section 9, specifying application procedures and timelines..." |
| |
| Citations: [Section 9, RERA Act 2016], [Rule 11, Maharashtra Rules 2017], |
| [Rule 8, Karnataka RERA Rules] |
| Confidence: 0.96 (high) |
| |
| ``` |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| |
| FastAPI → LangGraph Agent → pgvector + Neo4j + PostgreSQL |
|                                     ↑ |
|          Ingestion Pipeline (PDF → chunks → embeddings → graph) |
| |
| ``` |
|
|
| Three stores per query: |
| - **pgvector**: semantic similarity (fact lookups) |
| - **Neo4j**: section graph traversal (cross-references, DERIVED_FROM edges) |
| - **PostgreSQL**: full chunk text + metadata |
| |
| Full design: [HLD.md](docs/HLD.md) | [LLD.md](docs/LLD.md) | [RAG.md](docs/RAG.md) |
| |
| --- |
| |
| ## Quickstart |
| |
| ### Prerequisites |
| |
| - Docker + Docker Compose |
| - `uv` package manager |
| - One of: Gemini API key (free tier) or Groq API key (free tier) |
| |
| > **No Ollama required.** Embeddings run locally via `sentence-transformers`. |
| > First run downloads `nomic-embed-text-v1.5` (~550MB) from HuggingFace and caches it. |
| |
| ### Setup |
| |
| ```bash |
| # 1. Clone and install |
| git clone https://github.com/adeshboudh/civicsetu.git && cd civicsetu |
| make install |
| |
| # 2. Configure secrets |
| cp .env.example .env |
| # Set GEMINI_API_KEY and/or GROQ_API_KEY; everything else has working defaults |
| |
| # 3. Start infrastructure |
| make docker-up |
| |
| # 4. Ingest all 5 jurisdictions |
| make ingest |
| |
| # 5. Start the API |
| make serve |
| ``` |
| |
| **Full docs:** [HLD](docs/HLD.md) | [LLD](docs/LLD.md) |
| |
| ## Production |
| |
| - **Frontend:** [Vercel](https://civicsetu-two.vercel.app), Next.js 15 App Router (mobile-responsive) |
| - **API:** [Hugging Face Spaces](https://huggingface.co/spaces/adesh01/civicsetu), FastAPI + Docker with the 550MB embedding model baked in |
| - **PostgreSQL + pgvector:** [Neon](https://neon.tech), 1203 chunks |
| - **Neo4j:** [AuraDB Free](https://neo4j.com/cloud/aura), 2090 sections, 2321 edges |
| - **LLM:** LiteLLM (Gemini → Groq → OpenRouter) |
| |
| ### 6. Query the local API |
| |
| ```bash |
| curl -X POST http://localhost:8000/api/v1/query \ |
| -H "Content-Type: application/json" \ |
| -d '{"query": "What are the penalties for a promoter who delays possession?"}' |
| ``` |
| |
| > The first request is slow (~30–45s) while the embedding model loads into memory. |
| > Subsequent requests take 5–15s. |
| |
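For scripted use, the same request can be sent from Python with only the standard library. This is a minimal sketch: the URL and JSON body mirror the curl command, while `build_query_request` and `ask` are hypothetical helper names, and the response is assumed to be a JSON object whose exact fields are not specified here. The generous default timeout allows for the slow first request.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/query"  # endpoint from the curl example


def build_query_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 120.0) -> dict:
    """Send the question and decode the reply (assumed to be a JSON object)."""
    request = build_query_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the API running locally, `ask("What are the penalties for a promoter who delays possession?")` returns the decoded response.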
| ### Other useful commands |
| |
| ```bash |
| make e2e # Run 12-case E2E benchmark across all 5 jurisdictions |
| make test # Run unit tests |
| make lint # Ruff linter |
| make typecheck # mypy |
| |
| # RAGAS evaluation |
| make eval-smoke-p1 # Phase 1: invoke graph for 5-row smoke dataset |
| make eval-smoke-p2 # Phase 2: score cached results with RAGAS |
| make eval-p1 # Phase 1: full 31-row golden dataset |
| make eval-p2 # Phase 2: score all 31 rows |
| make eval-reset # Clear eval caches (re-runs everything) |
| |
| make ingest --jurisdiction MAHARASHTRA # Re-ingest a single jurisdiction |
| make docker-down # Tear down containers |
| ``` |
| |
| --- |
| |
| ## Documents ingested |
| |
| | Document | Jurisdiction | Sections | |
| | ---------------------------------- | ------------- | -------- | |
| | RERA Act 2016 | Central | 224 | |
| | Maharashtra Real Estate Rules 2017 | Maharashtra | 214 | |
| | UP RERA Rules 2016 | Uttar Pradesh | 170 | |
| | UP RERA General Regulations 2019 | Uttar Pradesh | 85 | |
| | Karnataka RERA Rules 2017 | Karnataka | 235 | |
| | Tamil Nadu RERA Rules 2017 | Tamil Nadu | 157 | |
| |
| Total chunks: 1203. |
| Graph: 2090 Section nodes, 1297 HAS_SECTION edges, 933 REFERENCES edges, 91 DERIVED_FROM edges. |
| |
| |
| --- |
| |
| ## Tech stack |
| |
| | Layer | Technology | |
| | :-- | :-- | |
| | API | FastAPI + Uvicorn | |
| | Orchestration | LangGraph StateGraph | |
| | LLM routing | LiteLLM (Gemini → Groq → OpenRouter) | |
| | Embeddings | nomic-embed-text-v1.5 via sentence-transformers (local, no Ollama required) | |
| | Vector DB | pgvector + HNSW index | |
| | Graph DB | Neo4j Community | |
| | Relational | PostgreSQL + SQLAlchemy | |
| | Retrieval | Hybrid RRF: pgvector cosine + PostgreSQL FTS (websearch_to_tsquery OR-mode) | |
| | Reranker | FlashRank (rank-T5-flan) + score gap filter | |
| | Evaluation | RAGAS (faithfulness, answer relevancy, context precision) | |
| | PDF parsing | PyMuPDF | |
| |
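The Retrieval row combines two ranked lists, one from pgvector cosine search and one from PostgreSQL full-text search, using Reciprocal Rank Fusion (RRF). RRF gives each item a score equal to the sum of `1 / (k + rank)` over the lists it appears in. The sketch below shows the general technique, not CivicSetu's actual code: `reciprocal_rank_fusion`, the chunk IDs, and `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk IDs; each input list is ordered best-first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is stable.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["s18", "s9", "s40"]   # hypothetical IDs from cosine search
keyword_hits = ["s9", "s61", "s18"]  # hypothetical IDs from full-text search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['s9', 's18', 's61', 's40']
```

Items found by both retrievers (`s9`, `s18`) rise above items found by only one. RRF uses ranks rather than raw scores, so cosine similarities and full-text scores never need to be put on a common scale.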
| |
| --- |
| |
| ## Phase roadmap |
| |
| | Phase | Scope | Status | |
| | :-- | :-- | :-- | |
| | 0 | RERA Act 2016, vector RAG, FastAPI | ✅ Complete | |
| | 1 | Neo4j graph, cross-reference queries | ✅ Complete | |
| | 2 | MahaRERA Rules 2017, multi-jurisdiction | ✅ Complete | |
| | 3 | DERIVED_FROM edges, cross-jurisdiction graph | ✅ Complete | |
| | 4 | Multi-state expansion (UP, TN, Karnataka) | ✅ Complete | |
| | 5 | Agent pipeline hardening, E2E test suite | ✅ Complete | |
| | 6 | Next.js frontend, Vercel deployment, public URL | ✅ Complete | |
| | 7 | Graph explorer, section content drawer, D3 visualization | ✅ Complete | |
| | 8 | RAGAS eval pipeline, hybrid RRF retrieval, retrieval quality fixes | ✅ Complete | |
| | 9 | Mobile responsiveness, frontend polish, dual-pane layout, interaction animations | ✅ Complete | |
|
|
|
|
| --- |
|
|
| ## ADRs |
|
|
| - [RAG Techniques Reference](docs/RAG.md): hybrid retrieval, RRF, reranking, RAGAS eval, known failure modes |
| - [ADR 001: Three-store architecture](docs/adr/001-three-store-architecture.md) |
| - [ADR 002: Section-boundary chunking](docs/adr/002-section-boundary-chunking.md) |
| - [ADR 003: LangGraph over LangChain chains](docs/adr/003-langgraph-over-langchain.md) |
| - [ADR 004: Multi-format chunker](docs/adr/004-multi-format-chunker.md) |
| - [ADR 005: Document registry](docs/adr/005-document-registry.md) |
|
|
|
|
| ## Disclaimer |
|
|
| CivicSetu provides AI-generated legal information, not legal advice. |
| Always verify with a qualified lawyer or the official gazette. |
|
|