CivicSetu - RAG Techniques Reference
Version: 2.3 - Mobile Ledger + Quality Hardening
Last Updated: 2026-05-01
This document describes the retrieval-augmented generation stack currently used in CivicSetu, what is live in the app today, and where the weak spots still are.
1. Current Status Snapshot
As of 2026-05-01, CivicSetu's RAG app is at production-grade stability (v1.0.0-level), with mobile responsiveness and retrieval quality fixes live.
- Phase 9 Complete (Mobile Responsive)
- Dual-pane layout for desktop; tabbed "Digital Ledger" UI for mobile.
- Interactive Graph Explorer with section drill-down.
- Cloud Infrastructure Live
- Relational & Vector: Neon (Postgres + pgvector)
- Graph: Neo4j AuraDB
- Frontend: Vercel
- Backend API: Hugging Face Spaces
- Live app routes
  - `POST /api/v1/query` - buffered response
  - `POST /api/v1/query/stream` - SSE token streaming
  - `POST /api/v1/query/section-context` - section-focused chat
  - `/api/v1/graph/*` - graph explorer and section drill-down
- Session-aware graph
  - LangGraph uses `session_id` as the thread key.
  - Each turn clears retrieval/generation fields but preserves conversation history.
- Active retrieval routing
  - `fact_lookup` -> `vector_retrieval`
  - `cross_reference` | `penalty_lookup` | `temporal` -> `graph_retrieval`
  - `conflict_detection` -> `hybrid_retrieval`
- Streaming is now first-class
- streaming path reuses classifier, retrieval, and reranker
- answer text streams first
- citations and metadata are extracted in a second fast pass
- Latest eval artifact (0.90 Faithfulness)
  - `eval_results.json` dated 2026-04-28
  - faithfulness = 0.900
  - answer_relevancy = 0.858
  - context_precision = 0.696
  - pass_rate = 0.581
- Knowledge Graph Scale (as of 2026-05-01)
  - Documents: 6
  - Sections: 2,090
  - Edges: 2,321 (`REFERENCES`, `DERIVED_FROM`, `HAS_SECTION`)
- Main remaining weakness
  - multi-jurisdiction retrieval is still weak (`MULTI` rows pass only 20%)
  - context precision for broad fact lookups needs further HNSW tuning
2. System Overview
CivicSetu is a legal-domain RAG system over five Indian RERA jurisdictions plus cross-jurisdiction queries.
Core problem:
- legal text is structured around sections, rules, sub-clauses, and cross-references
- users ask imprecise natural-language questions
- answers must stay grounded and cite the right legal section
Why plain semantic RAG fails here:
- embeddings blur important legal entities
- user queries often omit exact statute wording
- conflict questions need more than one legal source
- generation models tend to fill gaps unless grounding is strict
3. Ingestion Pipeline
3.1 PDF Parsing
ingestion/parser.py uses PyMuPDF.
Important guards:
- document-level `max_pages` trims form-heavy tails
- scanned-PDF detection rejects sources that are unusable without OCR
- metadata stores the capped page count, not necessarily the total PDF pages
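A minimal sketch of these guards, assuming page texts have already been extracted (the real parser uses PyMuPDF's `page.get_text()`); the function name and the text-density threshold are illustrative, not the actual `ingestion/parser.py` API:

```python
# Sketch of the parser guards. `page_texts` stands in for the per-page output
# of PyMuPDF's page.get_text(); names and thresholds here are illustrative.

def apply_page_guards(page_texts, max_pages=None, min_chars_per_page=50):
    """Cap pages and reject scanned (text-free) PDFs.

    Returns (kept_pages, capped_page_count); the capped count is what the
    metadata stores, not the original PDF total.
    """
    pages = page_texts[:max_pages] if max_pages else page_texts
    # Scanned-PDF heuristic: almost no extractable text across kept pages.
    avg_chars = sum(len(p) for p in pages) / max(len(pages), 1)
    if avg_chars < min_chars_per_page:
        raise ValueError("likely scanned PDF: no usable text layer")
    return pages, len(pages)
```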
3.2 Section Boundary Chunking
ingestion/chunker.py applies multiple regex families in priority order to detect section and rule boundaries.
Current purpose:
- preserve citation boundaries
- keep section hierarchy intact
- split oversized sections without destroying legal structure
Fallback mode is paragraph chunking on double newlines, logged as fallback_paragraph_chunking.
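A sketch of the priority-ordered boundary detection with the paragraph fallback; the two regex families shown are illustrative stand-ins for the fuller set in `ingestion/chunker.py`:

```python
import re

# Regex families tried in priority order; these patterns are illustrative.
SECTION_PATTERNS = [
    re.compile(r"^(Section\s+\d+[A-Z]?\.)", re.MULTILINE),
    re.compile(r"^(Rule\s+\d+\.)", re.MULTILINE),
]

def chunk_by_sections(text):
    """Split on the first regex family that finds boundaries; else fall back."""
    for pattern in SECTION_PATTERNS:
        starts = [m.start() for m in pattern.finditer(text)]
        if len(starts) >= 2:
            bounds = starts + [len(text)]
            return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Fallback: paragraph chunking on double newlines
    # (logged as fallback_paragraph_chunking in the real pipeline).
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```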
3.3 Deterministic Chunk IDs
chunk_id is a UUID5 over stable section identity data.
Effect:
- re-ingestion is idempotent
- `ON CONFLICT DO UPDATE` replaces old chunk content
- the same legal section does not duplicate across re-runs
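The idempotency property can be sketched with the standard library; the namespace seed and identity fields below are assumptions, not the project's actual values:

```python
import uuid

# Hypothetical namespace; the real seed lives in the ingestion config.
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "civicsetu.chunks")

def make_chunk_id(doc_name: str, section_id: str, part: int = 0) -> str:
    """UUID5 over stable section identity, so re-ingestion maps to the same row."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, f"{doc_name}|{section_id}|{part}"))
```

Because the ID is a pure function of section identity, a re-run hits the same primary key and the `ON CONFLICT DO UPDATE` path instead of inserting a duplicate.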
3.4 Section Title Prepended to Embeddings
During embedding, section title is prepended to chunk text.
Reason:
- split sub-sections often lose the title phrase that users actually search for
- title prefix restores semantic recall for questions like "obligations of promoter"
Reranker still reads raw chunk text, not the prefixed text.
3.5 Embedding Model
Current defaults from config/settings.py:
- `embedding_model = nomic-embed-text`
- `embedding_dimension = 768`
Query and document embeddings use asymmetric prefixes (`search_query:` vs `search_document:`), compatible with Nomic-style retrieval.
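The prefixing from 3.4 and 3.5 combined, as a small sketch (the helper name is hypothetical; the actual embedding call is omitted):

```python
def build_embedding_inputs(title: str, chunk_text: str, query: str):
    """Nomic-style asymmetric prefixes; the section title is prepended on the
    document side so split sub-sections keep the phrase users search for."""
    doc_input = f"search_document: {title}\n{chunk_text}"
    query_input = f"search_query: {query}"
    return doc_input, query_input
```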
3.6 Graph Seeding
ingestion/graph_seeder.py populates the Neo4j knowledge graph using data already persisted in PostgreSQL.
Key steps:
- Idempotent Upsert: Documents and Sections are merged into Neo4j using the UUID5 `chunk_id`.
- Relationship Extraction:
  - `REFERENCES`: `MetadataExtractor` identifies section numbers in text (e.g., "under section 18"); handles internal and cross-jurisdiction links.
  - `DERIVED_FROM`: a static mapping identifies which State Rule sections derive from which Central Act sections (at both the Document and Section level).
- Execution: triggered automatically at the end of `scripts/ingest.py`, or manually via `scripts/seed_phase3.py`.
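An illustrative Cypher fragment for the idempotent upsert; node labels match the graph schema above, but property names and the parameter shape are assumptions:

```cypher
// Idempotent upsert keyed on the UUID5 chunk_id (property names illustrative).
MERGE (d:Document {name: $doc_name})
MERGE (s:Section {chunk_id: $chunk_id})
  ON CREATE SET s.section_id = $section_id, s.title = $title
  ON MATCH  SET s.title = $title
MERGE (d)-[:HAS_SECTION]->(s)
```

`MERGE` on the stable `chunk_id` is what makes re-seeding safe: a second run matches the existing nodes and refreshes properties instead of creating duplicates.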
4. Query Pipeline
4.1 Query Classification and Rewriting
agent/nodes.py::classifier_node classifies query and rewrites it for retrieval.
Output shape:
```json
{
  "query_type": "fact_lookup | cross_reference | temporal | penalty_lookup | conflict_detection",
  "rewritten_query": "expanded retrieval-friendly query"
}
```
Current route mapping:
| Query Type | Route |
|---|---|
| `fact_lookup` | `vector_retrieval` |
| `cross_reference` | `graph_retrieval` |
| `penalty_lookup` | `graph_retrieval` |
| `temporal` | `graph_retrieval` |
| `conflict_detection` | `hybrid_retrieval` |
Classifier fallback: if JSON parse fails, default to fact_lookup with original query.
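The fallback rule can be sketched as follows; the function name and the extra validity check on `query_type` are assumptions beyond what the text states:

```python
import json

VALID_TYPES = {"fact_lookup", "cross_reference", "temporal",
               "penalty_lookup", "conflict_detection"}

def parse_classifier_output(raw: str, original_query: str) -> dict:
    """Fall back to fact_lookup with the original query on any parse failure."""
    try:
        out = json.loads(raw)
        if out.get("query_type") in VALID_TYPES and out.get("rewritten_query"):
            return out
    except (json.JSONDecodeError, TypeError):
        pass
    return {"query_type": "fact_lookup", "rewritten_query": original_query}
```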
4.2 LLM Routing and Fallback Chain
All non-streaming LLM calls use _llm_call(). Streaming uses _llm_stream().
Current model chain:
THINKING tier (Generator)
1. gemini/gemini-1.5-flash
2. groq/llama-3.3-70b-versatile
3. NVIDIA NIM: z-ai/glm4.7 | minimaxai/minimax-m2.7
FAST tier (Classifier/Validator)
1. gemini/gemini-1.5-flash
Provider notes:
- NVIDIA-hosted models (Minimax, GLM) use `https://integrate.api.nvidia.com/v1`
- `temperature=0.0` for all grounding tasks
- Gemini models use a temperature of `1.0` where provider requirements mandate it for certain tiers.
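A minimal sketch of the tiered fallback behind `_llm_call()`; the provider-callable interface and abbreviated model names are assumptions for illustration:

```python
# Sketch of a tiered fallback: try each provider callable in order.
THINKING_CHAIN = [
    "gemini/gemini-1.5-flash",
    "groq/llama-3.3-70b-versatile",
    "nvidia/z-ai-glm",  # NIM-hosted fallback, name abbreviated here
]

def llm_call(prompt: str, providers: dict, chain=THINKING_CHAIN) -> str:
    """`providers` maps model name -> callable(prompt); first success wins."""
    last_error = None
    for model in chain:
        try:
            return providers[model](prompt)
        except Exception as exc:  # provider outage, rate limit, bad response
            last_error = exc
    raise RuntimeError(f"all providers failed: {last_error}")
```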
5. Hybrid Retrieval
Hybrid retrieval combines vector similarity and PostgreSQL full-text search, then expands section families.
5.1 Vector Similarity Search
Used to catch semantic matches when wording differs from statute text.
5.2 Full-Text Search
Used for exact legal wording, section numbers, and important terms via websearch_to_tsquery.
5.3 Reciprocal Rank Fusion
Vector and FTS results are merged with RRF so chunks that rank well in both signals rise to the top.
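RRF itself is a few lines; this sketch assumes each input is a list of chunk IDs in rank order (k = 60 is the conventional constant, not necessarily the project's setting):

```python
def rrf_fuse(vector_ranked, fts_ranked, k: int = 60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    A chunk ranked well in both lists accumulates score from each, so it
    rises above chunks that are strong in only one signal.
    """
    scores = {}
    for ranking in (vector_ranked, fts_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```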
5.4 Section-ID-Aware Direct Lookup
If a query contains explicit section/rule numbers (e.g., "Section 18 refund"), the retriever performs a direct indexed lookup for those sections and pins them to the top of the retrieval list. This acts as a safety net when semantic search fails to rank the exact section high enough.
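A sketch of the extraction-and-pinning step; the regex and helper names are illustrative, not the retriever's actual code:

```python
import re

# Matches explicit references like "Section 18" or "rule 9A" (illustrative).
SECTION_REF = re.compile(r"\b(?:section|rule)\s+(\d+[A-Z]?)\b", re.IGNORECASE)

def extract_section_ids(query: str):
    """Pull explicit section/rule numbers out of the query for direct lookup."""
    return [m.group(1) for m in SECTION_REF.finditer(query)]

def pin_sections(direct_hits, semantic_hits):
    """Direct lookups stay ahead of semantic results, without duplicates."""
    return direct_hits + [c for c in semantic_hits if c not in direct_hits]
```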
5.5 Central Act Supplementation
For queries filtered by a specific State Jurisdiction (e.g., Maharashtra), the retriever automatically supplements results with chunks from the Central RERA Act 2016. This is critical because state rules often omit core definitions or penalties that are defined once in the Central Act.
6. Graph-Based Retrieval
Used for section-centric questions and legal relationships.
Current behavior:
- extract section or rule IDs from the query
- traverse Neo4j relationships (`REFERENCES` and `DERIVED_FROM`)
- hydrate matching sections back from Postgres
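The traversal step can be sketched in Cypher; this is an illustrative one-hop query, and the property names are assumptions:

```cypher
// Illustrative one-hop traversal: follow references and derivations,
// returning chunk_ids that are then hydrated from Postgres.
MATCH (s:Section {section_id: $section_id})
OPTIONAL MATCH (s)-[:REFERENCES]->(ref:Section)
OPTIONAL MATCH (s)-[:DERIVED_FROM]->(src:Section)
RETURN s.chunk_id AS chunk_id,
       collect(DISTINCT ref.chunk_id) + collect(DISTINCT src.chunk_id) AS related
```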
Graph retrieval is especially important for:
- explicit section lookups
- penalty questions
- central vs state derivation paths
Pinned chunks (from direct lookup or graph traversal) stay ahead of reranked chunks.
7. Reranking
7.1 Cross-Encoder
retrieval/reranker.py uses FlashRank (ms-marco-MiniLM-L-12-v2).
Pipeline:
- deduplicate by `(section_id, doc_name)`
- split pinned vs rankable chunks
- rerank rankable chunks with cross-encoder
- filter by minimum score (0.05)
- apply score-gap cutoff (0.95)
- prepend pinned chunks
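The post-scoring steps can be sketched as below. The reading of the 0.95 score-gap cutoff (truncate once a score falls below 95% of its predecessor's) is an assumption; the actual `retrieval/reranker.py` rule may differ:

```python
def filter_reranked(pinned, ranked, min_score=0.05, gap_ratio=0.95):
    """Score floor + score-gap cutoff, then prepend pinned chunks.

    `ranked` is a list of (chunk, score) pairs sorted descending by score.
    The gap cutoff here is one plausible reading: stop once a score drops
    below gap_ratio of the previous chunk's score.
    """
    kept, prev = [], None
    for chunk, score in ranked:
        if score < min_score:
            break
        if prev is not None and score < gap_ratio * prev:
            break
        kept.append(chunk)
        prev = score
    # Pinned chunks (exact matches) always lead the final list.
    return pinned + kept
```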
7.2 Context Assembly
Max context size is 7 chunks. Pinned chunks (exact matches) are never discarded by the reranker unless the context is fully saturated.
8. Generation
8.1 Buffered Generation
generator_node() builds a numbered context block and asks for JSON output.
8.2 Streaming Generation
stream_generator_node() now drives SSE output.
- Run classification/retrieval/reranking.
- Stream answer tokens immediately.
- Run a second fast metadata-extraction prompt.
- Push metadata/citations as the final SSE event.
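The ordering above can be sketched as a generator; the event shape and helper names are illustrative, not the actual SSE payload format:

```python
def stream_response(token_stream, extract_metadata):
    """Yield answer tokens first, then a single final metadata event.

    `token_stream` is any iterable of answer tokens; `extract_metadata` is
    the second fast pass that pulls citations out of the full answer text.
    """
    answer_parts = []
    for token in token_stream:
        answer_parts.append(token)
        yield {"event": "token", "data": token}
    # Second pass runs only after the full answer has streamed.
    answer = "".join(answer_parts)
    yield {"event": "metadata", "data": extract_metadata(answer)}
```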
8.3 Tone Hints by Query Type
| Type | Tone Guidance |
|---|---|
| `fact_lookup` | Direct, no metaphors, cite per bullet. |
| `penalty_lookup` | Lead with consequence/penalty. |
| `cross_reference` | Explain primary section, then connections. |
| `conflict_detection` | Flag a contradiction ONLY if both sides are in context. |
| `temporal` | Lead with the exact numeric deadline/time. |
9. Validation
9.1 Validator Design
validator_node() treats confidence_score < 0.2 as a hallucination risk.
- Returns `hallucination_flag: True` if the score is below the floor.
- The graph triggers a retry (up to 2 times) with different retrieval parameters if flagged.
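The flag-and-retry decision reduces to a small sketch; the return shape is an assumption about `validator_node()`'s internals:

```python
HALLUCINATION_FLOOR = 0.2  # confidence below this is treated as risk
MAX_RETRIES = 2

def validate(confidence_score: float, retries_done: int) -> dict:
    """Flag low-confidence answers; allow up to two retrieval retries."""
    flagged = confidence_score < HALLUCINATION_FLOOR
    return {
        "hallucination_flag": flagged,
        "should_retry": flagged and retries_done < MAX_RETRIES,
    }
```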
9.2 Output Guardrails
guardrails/output_guard.py:
- Intercepts low-confidence or safe-guard failures.
- Returns `InsufficientInfoResponse` when grounding is weak.
- Appends a legal disclaimer.
10. RAGAS Evaluation Pipeline
10.1 Two-Phase Architecture
- Phase 1: graph invocation -> `eval_phase1_results.json`.
- Phase 2: RAGAS scoring -> `eval_results.json`.
10.2 Dataset & Metrics
- Rows: 31 (Central, 4 States, Multi-Jurisdiction).
- Primary Metrics: Faithfulness, Answer Relevancy, Context Precision.
- Goal: Faithfulness > 0.85; Answer Relevancy > 0.80.
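A hypothetical gate over `eval_results.json` against these goals (the gate function itself is not part of the pipeline described above):

```python
# Target floors from the eval goals: Faithfulness > 0.85, Answer Relevancy > 0.80.
GOALS = {"faithfulness": 0.85, "answer_relevancy": 0.80}

def check_goals(results: dict) -> dict:
    """Compare metric averages from eval_results.json against their floors."""
    return {name: results.get(name, 0.0) > floor for name, floor in GOALS.items()}
```

Against the 2026-04-28 artifact (faithfulness 0.900, answer relevancy 0.858), both gates pass; context precision (0.696) has no stated floor yet.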
11. Known Failure Modes
- Multi-Jurisdiction Retrieval: Reranker often prefers one jurisdiction's terminology, leading to unbalanced context for comparison queries.
- Large Context Noise: 7 chunks sometimes include irrelevant sub-clauses that distract the generator.
12. Implementation Checklist
- Add `DocumentSpec` to the registry.
- Verify PDF text extraction.
- Run `make ingest`.
- Seed the Neo4j graph.
- Run `make eval-smoke` to verify precision.