# CivicSetu β€” Low Level Design (LLD)
**Version:** 2.0.0 — Phase 8 Complete (RAGAS Evaluation + Retrieval Improvements)
**Live:** https://civicsetu-two.vercel.app
**Last Updated:** April 2026
---
## 1. Module Map
```
src/civicsetu/
├── config/
│   ├── settings.py              Pydantic BaseSettings singleton (lru_cache)
│   └── document_registry.py     All document URLs + metadata (single source of truth)
├── models/
│   ├── enums.py                 StrEnum: Jurisdiction, DocType, QueryType, etc.
│   └── schemas.py               Pydantic models: LegalChunk, Citation, RetrievedChunk, CivicSetuResponse
├── ingestion/
│   ├── downloader.py            httpx PDF downloader with MD5 cache check
│   ├── parser.py                PyMuPDF text extractor — max_pages cap, scanned PDF detection
│   ├── chunker.py               Section-boundary regex chunker — 6 format patterns + fallback
│   ├── metadata_extractor.py    Date/Section/Rule reference/amendment regex extraction
│   ├── embedder.py              nomic-embed-text-v1.5 via sentence-transformers — truncate at 4000 chars pre-prefix
│   ├── pipeline.py              Orchestrates ingestion; prepends section_title to embeddings
│   └── graph_seeder.py          Post-ingestion REFERENCES + DERIVED_FROM edge seeding
├── stores/
│   ├── relational_store.py      Async SQLAlchemy — documents + legal_chunks tables
│   ├── vector_store.py          pgvector HNSW cosine search
│   └── graph_store.py           Neo4j Cypher interface — fresh driver per call
├── retrieval/
│   ├── vector_retriever.py      Wraps VectorStore for agent use
│   ├── graph_retriever.py       REFERENCES + DERIVED_FROM traversal, Section/Rule ID extraction
│   └── reranker.py              FlashRank cross-encoder wrapper
├── agent/
│   ├── state.py                 CivicSetuState TypedDict (frozen contract)
│   ├── nodes.py                 Pure functions: classifier, _rrf_retrieve (shared hybrid),
│   │                            vector_retrieval, graph_retrieval, hybrid_retrieval,
│   │                            reranker, generator, validator
│   ├── edges.py                 Conditional routing: route_after_classifier,
│   │                            route_after_validator
│   └── graph.py                 StateGraph assembly + get_compiled_graph()
├── prompts/
│   ├── classifier.py            Query type classification + rewriting prompt
│   ├── generator.py             Cited answer generation prompt
│   └── validator.py             Hallucination + confidence check prompt
├── guardrails/
│   ├── input_guard.py           PII detection + off-topic filter
│   └── output_guard.py          Faithfulness check + disclaimer injection
└── api/
    ├── main.py                  FastAPI app factory + lifespan (graph pre-compiled)
    ├── routes/
    │   ├── health.py            GET /health — DB ping
    │   ├── query.py             POST /api/v1/query — main RAG endpoint
    │   └── ingest.py            POST /api/v1/ingest — admin endpoint
    └── middleware/
        └── logging.py           Request/response structured logging
eval/
├── golden_dataset.jsonl         31-row RAGAS evaluation dataset across 5 jurisdictions
scripts/
├── run_eval.py                  Two-phase RAGAS evaluation: Phase 1 (graph invoke) + Phase 2 (RAGAS scoring)
frontend/                        Next.js 15 App Router — deployed on Vercel
├── src/app/
│   ├── layout.tsx               Root layout: ThemeProvider + dark mode
│   ├── page.tsx                 Main page: wires all components together
│   └── globals.css              Tailwind directives + gradient utilities
├── src/components/
│   ├── Header.tsx               Logo, new chat, theme toggle, GitHub link
│   ├── ChatThread.tsx           Scrollable message list + empty state examples
│   ├── MessageBubble.tsx        User/assistant/error bubbles with badges + citations
│   ├── ConfidenceBadge.tsx      HIGH/MEDIUM/LOW pill
│   ├── CitationsPanel.tsx       Collapsible citation cards
│   └── InputBar.tsx             Auto-resize textarea, jurisdiction select, send
├── src/hooks/
│   └── useChat.ts               Chat state, session_id localStorage, sendMessage
└── src/lib/
    ├── types.ts                 TypeScript interfaces (mirrors backend Pydantic models)
    └── api.ts                   queryRera() fetch wrapper → /api/v1/query
```
---
## 2. Database Schema
### PostgreSQL Tables
```sql
documents (
    doc_id           UUID PRIMARY KEY,
    doc_name         TEXT,
    jurisdiction     TEXT,          -- Jurisdiction enum value
    doc_type         TEXT,          -- DocType enum value (stored uppercase: ACT, RULES, CIRCULAR)
    source_url       TEXT,
    effective_date   DATE,
    gazette_number   TEXT,
    total_chunks     INTEGER,
    ingested_at      TIMESTAMPTZ,
    is_active        BOOLEAN
)

legal_chunks (
    chunk_id           UUID PRIMARY KEY,
    doc_id             UUID → documents.doc_id,
    jurisdiction       TEXT,
    doc_type           TEXT,
    doc_name           TEXT,
    section_id         TEXT,        -- "18", "3(2)", "Para-3"
    section_title      TEXT,
    section_hierarchy  TEXT[],      -- ["RERA Act 2016", "18"]
    text               TEXT,
    effective_date     DATE,
    superseded_by      UUID → legal_chunks.chunk_id,
    status             TEXT,        -- ChunkStatus enum value
    source_url         TEXT,
    page_number        INTEGER,
    embedding          vector(768)  -- HNSW indexed
)
```
### pgvector Index
```sql
CREATE INDEX legal_chunks_embedding_idx
ON legal_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
`m=16` — 16 connections per node. `ef_construction=64` — 64 candidates during index build.
Tuned for recall/speed balance at <10K vectors. Revisit at 100K+.
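A minimal sketch, assuming async SQLAlchemy as in `vector_store.py`, of how the cosine search against this index could be issued; the statement text and bind-parameter names are illustrative, not the repository's actual query:
```python
from sqlalchemy import text

# :query_vec is the 768-dim query embedding serialized as a pgvector literal,
# e.g. "[0.01, -0.02, ...]". pgvector's `<=>` operator returns cosine distance,
# so 1 - distance is the similarity score and the ORDER BY can use the HNSW index above.
COSINE_SEARCH = text("""
    SELECT chunk_id, section_id, doc_name,
           1 - (embedding <=> CAST(:query_vec AS vector)) AS vector_score
    FROM legal_chunks
    WHERE CAST(:jurisdiction AS text) IS NULL OR jurisdiction = :jurisdiction
    ORDER BY embedding <=> CAST(:query_vec AS vector)
    LIMIT :top_k
""")
```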
### Neo4j Graph Schema
```
Nodes:
  (:Document {doc_id, doc_name, jurisdiction, doc_type, effective_date})
  (:Section  {section_id, title, chunk_id, jurisdiction, doc_name, is_active})

Edges:
  (:Document)-[:HAS_SECTION]->(:Section)
  (:Section) -[:REFERENCES]->(:Section)       -- intra + cross-jurisdiction citations
  (:Section) -[:DERIVED_FROM]->(:Section)     -- State Rule N → RERA Act Sec M
  (:Document)-[:DERIVED_FROM]->(:Document)    -- State Rules → RERA Act 2016

Planned (Phase 7+):
  (:Section) -[:SUPERSEDES]->(:Section)
  (:Section) -[:AMENDED_BY]->(:Amendment)
  (:Section) -[:CONFLICTS_WITH]->(:Section)
```
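For illustration, `graph_seeder.py`'s edge creation can be pictured as parameterized Cypher `MERGE` statements along these lines (a sketch; property and parameter names are assumptions, not the seeder's actual queries):
```python
# Document → Section ownership edge (HAS_SECTION), matched by chunk_id.
SEED_HAS_SECTION = """
MATCH (d:Document {doc_id: $doc_id})
MATCH (s:Section  {chunk_id: $chunk_id})
MERGE (d)-[:HAS_SECTION]->(s)
"""

# Section-level DERIVED_FROM: a state Rule section pointing at the central Act section it implements.
SEED_SECTION_DERIVED_FROM = """
MATCH (rule:Section {section_id: $rule_id, jurisdiction: $state_jurisdiction})
MATCH (act:Section  {section_id: $act_section_id, jurisdiction: 'CENTRAL'})
MERGE (rule)-[:DERIVED_FROM]->(act)
"""
```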
**Live graph stats (Phase 6):**
| Metric | Count |
|--------------|-------|
| Documents | 9 |
| Sections | 2090 |
| HAS_SECTION | 1297 |
| REFERENCES | 933 |
| DERIVED_FROM | 91 |
---
## 3. Document Registry
`document_registry.py` — single source of truth for all ingested documents.
```python
@dataclass(frozen=True)
class DocumentSpec:
    name: str
    url: str
    jurisdiction: Jurisdiction
    doc_type: DocType
    effective_date: date | None
    filename: str
    dest_subdir: str
    max_pages: int | None = None  # None = all pages; cap excludes forms/schedule appendices
```
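An illustrative entry for the central Act; the URL is truncated and the filename, subdirectory, and effective date are placeholders, so treat the real values in `document_registry.py` as authoritative:
```python
RERA_ACT_2016 = DocumentSpec(
    name="Real Estate (Regulation and Development) Act, 2016",
    url="https://www.indiacode.nic.in/...",  # placeholder, not the exact path
    jurisdiction=Jurisdiction.CENTRAL,
    doc_type=DocType.ACT,
    effective_date=None,                     # real value set in document_registry.py
    filename="rera_act_2016.pdf",            # assumed naming convention
    dest_subdir="central",                   # assumed subdirectory layout
    max_pages=None,                          # ingest all pages
)
```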
### Ingested Documents (Phase 6)
| Key | Document | Jurisdiction | DocType | Chunks | max_pages |
|---|---|---|---|---|---|
| `rera_act_2016` | RERA Act 2016 | CENTRAL | ACT | ~224 | None |
| `mahrera_rules_2017` | MahaRERA Rules 2017 | MAHARASHTRA | RULES | ~214 | None |
| `up_rera_rules_2016` | UP RERA Rules 2016 | UTTAR_PRADESH | RULES | 170 | 24 |
| `up_rera_general_regulations_2019` | UP RERA General Regulations 2019 | UTTAR_PRADESH | CIRCULAR | 85 | None |
| `karnataka_rera_rules_2017` | Karnataka RERA Rules 2017 | KARNATAKA | RULES | 235 | 37 |
| `tn_rera_rules_2017` | Tamil Nadu RERA Rules 2017 | TAMIL_NADU | RULES | 157 | 15 |
**PDF source notes:**
- Karnataka official PDF (`rera.karnataka.gov.in`) is fully scanned (19MB image) — NAREDCO mirror used
- TN PDF bundles rules + forms (101 pages); `max_pages=15` excludes Forms A–O
- UP Rules PDF bundles rules + forms (52 pages); `max_pages=24` excludes prescribed forms
---
## 4. LangGraph State Machine
### State Contract (`agent/state.py`)
```python
class CivicSetuState(TypedDict):
    # Input
    query: str
    session_id: Optional[str]
    jurisdiction_filter: Optional[Jurisdiction]
    top_k: int

    # Classification
    query_type: Optional[QueryType]
    rewritten_query: Optional[str]

    # Retrieval — Annotated[list, operator.add] enables parallel node merging
    retrieved_chunks: Annotated[list[RetrievedChunk], operator.add]
    reranked_chunks: list[RetrievedChunk]

    # Generation
    raw_response: Optional[str]
    citations: list[Citation]
    confidence_score: float
    conflict_warnings: list[str]
    amendment_notice: Optional[str]

    # Control
    retry_count: int  # max 2 retries
    hallucination_flag: bool
    error: Optional[str]
```
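The `Annotated[list, operator.add]` reducer is what lets more than one retrieval node write to `retrieved_chunks` without overwriting it: LangGraph applies the reducer to concatenate each node's partial update. A toy illustration of the merge semantics (the chunk values are placeholders):
```python
import operator

vector_update = {"retrieved_chunks": ["chunk_from_vector"]}
graph_update = {"retrieved_chunks": ["chunk_from_graph"]}

# LangGraph applies the reducer when both updates target the same annotated key
merged = operator.add(vector_update["retrieved_chunks"], graph_update["retrieved_chunks"])
assert merged == ["chunk_from_vector", "chunk_from_graph"]
```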
### RetrievedChunk Schema (`models/schemas.py`)
```python
class RetrievedChunk(BaseModel):
    chunk: LegalChunk
    vector_score: float | None = None
    rerank_score: float | None = None
    retrieval_source: str = "vector"   # "vector" | "graph"
    graph_path: Optional[str] = None   # e.g. "source:18@CENTRAL"
    is_pinned: bool = False            # True = exact source section, bypasses reranker sort
```
### Node Responsibilities
| Node | Input Keys | Output Keys | LLM Call |
| :-- | :-- | :-- | :-- |
| classifier | query | query_type, rewritten_query | Yes |
| vector_retrieval | rewritten_query, top_k | retrieved_chunks | No |
| graph_retrieval | rewritten_query, top_k | retrieved_chunks | No |
| reranker | retrieved_chunks, query | reranked_chunks | No |
| generator | reranked_chunks, query | raw_response, citations, confidence_score | Yes |
| validator | raw_response, reranked_chunks | hallucination_flag, confidence_score | Yes |
| retry | retry_count | retry_count+1, cleared retrieval fields | No |
### Routing Logic
| query_type (classifier output) | route_after_classifier → retrieval node |
|--------------------------------|------------------------------------------|
| fact_lookup                    | vector_retrieval (RRF hybrid)            |
| cross_reference                | graph_retrieval (→ RRF fallback)         |
| penalty_lookup                 | graph_retrieval (→ RRF fallback)         |
| temporal                       | graph_retrieval (→ RRF fallback)         |
| conflict_detection             | hybrid_retrieval (RRF across jurisdictions) |
```
validator → route_after_validator:
  confidence >= 0.5 AND not hallucinated                  → END
  (confidence < 0.5 OR hallucinated) AND retry_count < 2  → retry → classifier
  (confidence < 0.5 OR hallucinated) AND retry_count >= 2 → END (low-confidence answer)
```
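A hedged sketch of how `agent/graph.py` likely assembles this with LangGraph; the node and router names come from the tables above, but the exact edge map and helper names are assumptions:
```python
from langgraph.graph import END, StateGraph

def build_graph():
    g = StateGraph(CivicSetuState)
    for name, fn in [
        ("classifier", classifier_node),
        ("vector_retrieval", vector_retrieval_node),
        ("graph_retrieval", graph_retrieval_node),
        ("hybrid_retrieval", hybrid_retrieval_node),
        ("reranker", reranker_node),
        ("generator", generator_node),
        ("validator", validator_node),
        ("retry", retry_node),
    ]:
        g.add_node(name, fn)

    g.set_entry_point("classifier")
    g.add_conditional_edges("classifier", route_after_classifier, {
        "vector_retrieval": "vector_retrieval",
        "graph_retrieval": "graph_retrieval",
        "hybrid_retrieval": "hybrid_retrieval",
    })
    for retrieval_node in ("vector_retrieval", "graph_retrieval", "hybrid_retrieval"):
        g.add_edge(retrieval_node, "reranker")
    g.add_edge("reranker", "generator")
    g.add_edge("generator", "validator")
    g.add_conditional_edges("validator", route_after_validator, {
        "retry": "retry",   # low confidence / hallucinated, retries remaining
        END: END,           # accepted, or retries exhausted
    })
    g.add_edge("retry", "classifier")
    return g

compiled_graph = build_graph().compile()  # roughly what get_compiled_graph() would return
```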
---
## 5. Chunking Strategy
### Section Boundary Detection
Six regex patterns for `DocType.RULES` documents, tried in order (first match wins per line):
| # | Pattern | Format | Jurisdiction |
|---|---|---|---|
| 1 | `\n(?P<id>\d{1,2}[A-Z]?)\.\s*\n(?P<title>...)` | Newline-dot-newline | MahaRERA |
| 2 | `^\s*(?P<id>\d{1,2}[A-Z]?)\.\s+(?P<title>...)\.?—` | Same-line em-dash | MahaRERA |
| 3 | `^Rule\s+(?P<id>\d{1,2}[A-Z]?)\s*[.\-–]\s*(?P<title>...)` | Explicit Rule prefix | Generic |
| 4 | `^\s*(?P<id>\d{1,2}[A-Z]?)\.\s+(?P<title>...?)\.–` | ASCII hyphen `.-` | Karnataka, Tamil Nadu |
| 5 | `(?P<id>\d{1,2}[A-Z]?)-\(1\)\s*\n(?P<title>...)` | `N-(1)\nTitle` | UP RERA multi-clause |
| 6 | `(?P<id>\d{1,2}[A-Z]?)-(?!\()\s*\n(?P<title>...)` | `N-\nTitle` | UP RERA single-clause |
`DocType.ACT` uses a separate pattern set. Fallback: paragraph split on double newlines.
Rule IDs are capped at `\d{1,2}` (max 2 digits) — this prevents year strings like `2016` from matching as rule IDs.
Logs `no_section_boundaries_found` + `fallback_paragraph_chunking` when falling back.
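A toy check of pattern 3 (the explicit `Rule` prefix); the title group used here is an assumption, since the real pattern's title expression is elided in the table above:
```python
import re

RULE_PREFIX = re.compile(
    r"^Rule\s+(?P<id>\d{1,2}[A-Z]?)\s*[.\-–]\s*(?P<title>[^\n]+)",  # title group assumed
    re.MULTILINE,
)

m = RULE_PREFIX.search("Rule 4 - Application for registration of project")
assert m is not None and m.group("id") == "4"
assert m.group("title").startswith("Application")
```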
### Chunk Size Limits
```
MIN_CHARS = 100   — discard fragments (headers, page numbers)
MAX_CHARS = 1500  — split large sections at subsection markers (1), (2), (a), (b)
```
### Split Priority for Large Sections
```
1. Subsection markers: \n\s*\((?:\d+|[a-z]{1,3})\)\s+
2. Sentence boundary near MAX_CHARS: rfind('. ')
3. Hard cut at MAX_CHARS (last resort)
```
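A minimal sketch of that priority order (helper name assumed; the real chunker also re-checks each resulting piece against MAX_CHARS and keeps the subsection markers it splits on):
```python
import re

MAX_CHARS = 1500
SUBSECTION_RE = re.compile(r"\n\s*\((?:\d+|[a-z]{1,3})\)\s+")

def split_large_section(text: str) -> list[str]:
    if len(text) <= MAX_CHARS:
        return [text]
    # 1. Prefer subsection markers (1), (2), (a), (b); marker text is dropped in this toy version
    parts = [p.strip() for p in SUBSECTION_RE.split(text) if p.strip()]
    if len(parts) > 1:
        return parts
    # 2. Otherwise cut at the last sentence boundary before MAX_CHARS
    cut = text.rfind(". ", 0, MAX_CHARS)
    if cut == -1:
        cut = MAX_CHARS  # 3. Hard cut as a last resort
    head, tail = text[: cut + 1].strip(), text[cut + 1 :].strip()
    return [head] + (split_large_section(tail) if tail else [])
```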
### parser.py — max_pages cap
```python
@staticmethod
def parse(source: str | Path, max_pages: int | None = None) -> ParsedDocument:
    # ... `doc` is the PyMuPDF document opened from `source` (abridged here) ...
    all_pages = list(doc)
    if max_pages is not None:
        all_pages = all_pages[:max_pages]  # slice before fulltext build
```
---
## 6. Embedding Strategy
**Model:** `nomic-embed-text-v1.5` (via `sentence-transformers`, local — no Ollama required)
**Dimension:** 768
**Asymmetric prefixes** (MTEB/nomic-embed requirement):
```
Ingestion time:  "search_document: {section_title}\n{text}"  → pipeline.py
Query time:      "search_query: {rewritten_query}"           → retrieval/__init__.py
```
**Section title prepend (Phase 8 change):** `pipeline.py` prepends `section_title` to the
embedded text so sub-chunks (e.g. `S.11(2)`) retain their section context.
Without this, sub-chunks embed without "Obligations of promoter" — cosine similarity misses them.
The reranker still receives raw `chunk.text` (no title prefix).
Using the wrong prefix at query time causes ~10–15% recall degradation.
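For reference, the asymmetric prefixing can be reproduced with `sentence-transformers` directly; the model id is the public Nomic checkpoint, while the `trust_remote_code` flag and function names are assumptions about `embedder.py`:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

def embed_document(section_title: str, text: str) -> list[float]:
    # Ingestion side: section title prepended so sub-chunks keep their section context
    return model.encode(f"search_document: {section_title}\n{text}").tolist()

def embed_query(rewritten_query: str) -> list[float]:
    # Query side uses the other prefix; mixing the two degrades recall
    return model.encode(f"search_query: {rewritten_query}").tolist()
```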
### Truncation Guard
```python
MAX_EMBED_CHARS = 4000  # ~1000 tokens — safe ceiling before prefix added

def embed_document(self, text: str) -> list[float]:
    if len(text) > MAX_EMBED_CHARS:
        log.warning("embedding_truncated", original_len=len(text), truncated_to=MAX_EMBED_CHARS)
        text = text[:MAX_EMBED_CHARS]
    prefixed = f"search_document: {text.strip()}"  # prefix AFTER truncation
    return self.embed_one(prefixed)
```
Truncation happens **before** the prefix is added — prevents Ollama 500 errors on Tamil Nadu
and other gazette PDFs where sub-sections exceed 10K chars.
---
## 7. Hybrid Retrieval — `_rrf_retrieve()`
All retrieval nodes share a single async helper `_rrf_retrieve()` in `agent/nodes.py`.
### Reciprocal Rank Fusion (RRF)
```python
RRF_K = 60  # standard constant
# rrf_score(chunk) = 1/(RRF_K + rank_in_vector) + 1/(RRF_K + rank_in_fts)
```
Fetches `top_k × 3` vector results and `top_k × 2` FTS results, deduplicates by `chunk_id`,
merges via RRF, and returns the top `top_k × 2`.
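In pure Python the merge step amounts to the following (a sketch; the real helper works on `RetrievedChunk` objects rather than bare ids):
```python
RRF_K = 60

def rrf_merge(vector_hits: list[str], fts_hits: list[str], top_k: int) -> list[str]:
    """vector_hits / fts_hits are rank-ordered lists of chunk_ids (best first)."""
    scores: dict[str, float] = {}
    for ranked in (vector_hits, fts_hits):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (RRF_K + rank)
    merged = sorted(scores, key=scores.get, reverse=True)
    return merged[: top_k * 2]  # the pool handed to family expansion and the reranker
```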
### Full-Text Search
`VectorStore.full_text_search()` uses `websearch_to_tsquery` in OR mode:
```sql
WHERE to_tsvector('english', text) @@ websearch_to_tsquery('english', :query)
ORDER BY ts_rank(to_tsvector('english', text), websearch_to_tsquery('english', :query)) DESC
```
Changed from `plainto_tsquery` (AND-mode) — AND required all query words to match,
excluding relevant sections that matched most but not all words.
### Section Family Expansion
After RRF merge, top-3 results trigger family expansion:
```python
for rc in merged[:3]:
    base_sid = re.sub(r'\([^)]*\)$', '', section_id).strip()  # "5(4)" → "5"
    family = await VectorStore.get_section_family(section_id=base_sid, jurisdiction=jur)
    # returns all chunks where section_id = '5' OR section_id LIKE '5(%'
```
`get_section_family` guard: skips if `section_id` already contains `(` (base_sid computation
strips this before calling). Hard cap: `_MAX_VECTOR_EXPANDED = 40` chunks before reranker.
**Why top-3 not top-1:** If top-1 RRF result is a sub-section (`S.5(4)`), its parent
family is expanded. But if the truly relevant parent section (`S.11`) appears at RRF rank 2,
only expanding top-1 misses it. Expanding top-3 covers more cases at the cost of a slightly
larger pool.
---
## 7b. Reranker Detail
`reranker_score_threshold = 0.1` — minimum cross-encoder score to enter the candidate pool.
`reranker_score_gap = 0.6` — gap filter cliff threshold.
**Gap filter:**
```python
def _apply_score_gap(chunks, gap=0.6):
    for i in range(1, len(chunks)):
        if chunks[i - 1].rerank_score - chunks[i].rerank_score >= gap:
            return chunks[:i]
    return chunks
```
**Threshold history:** Originally `threshold=0.3, gap=0.35`. A gap of 0.35 was too aggressive —
it cut the list after a 0.36 score drop, leaving only one context chunk for the generator. Raised to 0.6 in Phase 8.
Final context: `pinned_chunks + gap_filtered[:max(0, 5 - len(pinned))]` → max 5 chunks.
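Putting the pieces together, the final context assembly is roughly (names assumed; pinned chunks skip the gap filter and the combined list is capped at 5):
```python
MAX_CONTEXT_CHUNKS = 5

def build_final_context(reranked: list[RetrievedChunk]) -> list[RetrievedChunk]:
    pinned = [c for c in reranked if c.is_pinned]       # exact source sections, max 2
    candidates = [c for c in reranked if not c.is_pinned]
    survivors = _apply_score_gap(candidates, gap=0.6)    # cliff filter from above
    return pinned + survivors[: max(0, MAX_CONTEXT_CHUNKS - len(pinned))]
```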
---
## 8. Graph Retriever
`graph_retriever.py` — called on `cross_reference`, `penalty_lookup`, `temporal` query types.
### Section ID Extraction
```python
section_pattern = re.compile(r'\b(?:section|sec\.?|s\.)\s*(\d+[A-Z]?)\b', re.IGNORECASE)
rule_pattern = re.compile(r'\bRule\s+(\d+[A-Z]?)\b', re.IGNORECASE)
```
### Traversal Strategy (per jurisdiction)
For each jurisdiction (`CENTRAL`, `MAHARASHTRA`, `UTTAR_PRADESH`, `KARNATAKA`, `TAMIL_NADU`):
```
1. Source section chunks  — exact section_id match → is_pinned=True
2. REFERENCES outgoing    — sections the source cites (depth=2)
3. REFERENCES incoming    — sections that cite the source
4. DERIVED_FROM outgoing  — Act sections this Rule derives from
5. DERIVED_FROM incoming  — Rule sections implementing this Act section
```
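A hedged sketch of one traversal step (outgoing REFERENCES, depth up to 2) with the async Neo4j driver; `graph_store.py`'s actual query text and return fields may differ:
```python
from neo4j import AsyncGraphDatabase

REFERENCES_OUT = """
MATCH (src:Section {section_id: $section_id, jurisdiction: $jurisdiction})
MATCH (src)-[:REFERENCES*1..2]->(target:Section)
WHERE target.is_active
RETURN DISTINCT target.chunk_id AS chunk_id, target.section_id AS section_id
"""

async def references_outgoing(uri: str, auth: tuple[str, str],
                              section_id: str, jurisdiction: str) -> list[dict]:
    # A fresh driver per call mirrors graph_store.py's event-loop-safe pattern
    driver = AsyncGraphDatabase.driver(uri, auth=auth)
    try:
        records, _, _ = await driver.execute_query(
            REFERENCES_OUT, section_id=section_id, jurisdiction=jurisdiction
        )
        return [dict(record) for record in records]
    finally:
        await driver.close()
```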
### Pinning Rule
Only the exact `section_id` match gets `is_pinned=True`. Sub-sections are NOT pinned.
Max pinned chunks: 2 (one per jurisdiction). Remaining 3 slots filled by reranker.
---
## 9. Response Contract
```python
CivicSetuResponse:
    answer: str                      # plain English, cites section numbers
    citations: list[Citation]        # min_length=1 — NEVER empty
    confidence_score: float          # 0.0–1.0
    confidence_level: str            # "high" / "medium" / "low"
    query_type_resolved: QueryType
    conflict_warnings: list[str]     # empty until Phase 7
    amendment_notice: Optional[str]
    disclaimer: str                  # always present

Citation:
    section_id: str
    doc_name: str
    jurisdiction: Jurisdiction
    effective_date: Optional[date]
    source_url: str
    chunk_id: UUID
```
---
## 10. Error Handling
| Scenario | Behaviour |
| :-- | :-- |
| LLM provider rate limited | LiteLLM auto-rotates to next provider |
| All LLM providers fail | `RuntimeError` → FastAPI 500 |
| No chunks retrieved | `InsufficientInfoResponse` returned |
| Hallucination detected | retry (max 2x) → low confidence answer |
| DB unreachable | `/health` returns `degraded`, query returns 500 |
| Scanned PDF detected | Warning logged, fallback URL used (Karnataka) |
| Section patterns not matched | Fallback paragraph chunking, warning logged |
| Neo4j event loop mismatch | Prevented — `_get_driver()` creates fresh driver per call |
| Embedding input too long | Truncated at 4000 chars before prefix; warning logged |
| max_pages exceeded | Parser silently caps pages; total_pages reflects capped count |
---
## 11. Neo4j Graph — Phase 6 State (Current)
**Nodes:** 9 Documents, 2090 Sections
**Edges:** 1297 HAS_SECTION, 933 REFERENCES, 91 DERIVED_FROM
### Documents in Graph
| Document | Jurisdiction | DocType | Chunks | Sections | DERIVED_FROM edges |
|---|---|---|---|---|---|
| RERA Act 2016 | CENTRAL | ACT | ~224 | ~224 | — |
| MahaRERA Rules 2017 | MAHARASHTRA | RULES | ~214 | ~214 | 17 sec + 1 doc |
| UP RERA Rules 2016 | UTTAR_PRADESH | RULES | 170 | 33 | 11 sec + 1 doc |
| UP RERA General Regs 2019 | UTTAR_PRADESH | CIRCULAR | 85 | 53 | — |
| Karnataka RERA Rules 2017 | KARNATAKA | RULES | 235 | 45 | 15 sec + 1 doc |
| Tamil Nadu RERA Rules 2017 | TAMIL_NADU | RULES | 157 | 36 | 15 sec + 1 doc |
### Known Open Issues (non-blocking)
| Issue | Affected | Root Cause |
|---|---|---|
| Act §13 missing from graph | UP rule 14, KA rule 11, TN rule 11 | RERA Act ingestion — §13 chunked under different ID |
| Act §66 missing from graph | KA rule 19, TN rule 19 | RERA Act ingestion — §66 not ingested |
### DERIVED_FROM Map Summary
| Jurisdiction | Mapped pairs | Resolved | Unresolved |
|---|---|---|---|
| MAHARASHTRA | 17 | 17 | 0 |
| UTTAR_PRADESH | 15 | 11 | 4 |
| KARNATAKA | 17 | 15 | 2 |
| TAMIL_NADU | 17 | 15 | 2 |
### PDF Source Decisions
| Jurisdiction | Primary URL | Issue | Resolution |
|---|---|---|---|
| CENTRAL | indiacode.nic.in | — | — |
| MAHARASHTRA | naredco.in | — | — |
| UTTAR_PRADESH | up-rera.in/pdf/rera.pdf | pages 25–52 are forms | max_pages=24 |
| KARNATAKA | naredco.in (mirror) | Official PDF fully scanned (19MB) | NAREDCO born-digital |
| TAMIL_NADU | cms.tn.gov.in | pages 16–101 are Forms A–O | max_pages=15 |
## 12. Agent Pipeline — Bug Fixes (2026-03-22)
Three production bugs were fixed after running the 12-case E2E suite. All fixes verified: 0 retries, 0
hallucinations, avg latency 7.6s.
### Fix 1 — `vector_store.py::get_section_family` — Pydantic crash on SELECT *
`SELECT *` returned `embedding` as a raw string; Pydantic `list[float]` validation
failed. Fix: explicit column projection, `embedding=None` on all returned chunks.
Matches every other `VectorStore` method.
### Fix 2 — `nodes.py::vector_retrieval_node` — Reranker blowup on section expansion
Section family expansion ran on all 5 similarity hits → up to 121 chunks → FlashRank
cross-encoder serial scoring → 65s reranker time. Fix (Phase 5): expand top-1 hit only; hard
cap at 25 chunks before reranker.
```python
for rc in results[:1]:
    ...family expansion...
expanded = expanded[:25]  # hard safety cap
```
**Phase 8 update:** Expanded to top-3 after RAGAS eval revealed that when a sub-section
(e.g. `S.5(4)`) ranks #1, its parent `S.5` (with the 30-day rule) was never expanded.
Cap raised to 40 to accommodate larger families.
```python
for rc in merged[:3]:  # top-3 RRF results (was: top-1)
    ...family expansion...
expanded = expanded[:40]  # was: 25
```
### Fix 3 — `nodes.py::validator_node` — False hallucination flag
The validator built its context as a raw `chunk.text` joined string. The generator's answer cites
`"Section 11(1)"` but the raw text has no section number → the validator scores 0.2 →
`hallucinated=True` → spurious retry loops (7 retries across 12 tests).
Fix: mirror the generator's numbered context block `[i] doc — section_id: title\ntext`.
Validator can now match cited section numbers to source context.
### E2E Regression Results (post-fix)
| Metric | Pre-fix | Post-fix |
| :-- | :-- | :-- |
| Avg latency | 19.6s | **7.6s** |
| Max latency | 87.1s | **13.3s** |
| Avg confidence | 0.908 | **0.958** |
| Total retries | 7 | **0** |
| Slow (>20s) | 3 | **0** |
| Low conf (<0.7) | 2 | **0** |
| Pass rate | 12/12 | **12/12** |
---
## 13. Agent Pipeline — RAGAS Eval Fixes (Phase 8, April 2026)
Five changes came out of a RAGAS evaluation that revealed retrieval and faithfulness failures.
### Fix 4 — Reranker thresholds too aggressive (`settings.py`)
The old `score_gap=0.35` cut after any 0.36-point drop → only 1 chunk reached the generator.
New: `score_threshold=0.1`, `score_gap=0.6`. This keeps secondary relevant chunks while still
filtering genuine noise (a 0.98 → 0.20 drop would still be cut at a 0.78 gap).
### Fix 5 — Generator analogy instruction caused hallucination (`generator.py`)
The instruction "Use an analogy or real-world example" produced analogies ("Think of it like selling a
used car") that were not present in the retrieved context → the faithfulness judge scored them as hallucination.
Fix: removed the analogy instruction; replaced with "using only information from the provided context".
### Fix 6 — Generator weak grounding for sparse contexts (`generator.py`)
Generator constructed legal conclusions from reasoning even when context lacked evidence.
Added explicit rules:
- For sparse context: say "Based on the available context: [X]" and note missing elements
- For conflict detection: only assert conflict if BOTH provisions present in context
### Fix 7 — CONFLICT_DETECTION tone hint implied precedence reasoning (`nodes.py`)
The tone hint said "state which jurisdiction takes precedence when context supports it" —
the LLM interpreted "when context supports it" loosely and applied its own legal reasoning.
Rewritten to: "Never infer precedence from legal reasoning — only state precedence if
the context explicitly says so."
### Fix 8 — Temporal query rewrite too generic (`classifier.py`)
The query "What is the timeline for project registration?" produced the rewrite "registration
timeline period" — FTS missed Section 5, which uses "within thirty days" and "deemed registered".
Added rewriting guidance to expand temporal queries with specific legal time-period keywords.
### RAGAS Results (Phase 8 baseline, 5-row smoke, gemma-4-31b-it judge)
| Row | Faith (before) | Faith (after) | Prec (before) | Prec (after) |
|---|---|---|---|---|
| CENTRAL-FACT-001 | 1.00 | 0.50 | 0.00 | 0.00 |
| CENTRAL-FACT-002 | 0.80 | 0.62 | 0.00 | 0.33 |
| CENTRAL-XREF-001 | 0.63 | 0.50 | 1.00 | 1.00 |
| CENTRAL-CONF-001 | 0.00 | 0.62 | 0.00 | 0.00 |
| CENTRAL-TEMP-001 | 0.67 | 1.00 | 1.00 | 0.00 |
| **Overall** | 0.618 | **0.650** | 0.400* | 0.267 |
\* The "before" baseline had inflated precision from duplicate chunks (non-deterministic doc_id).
After Phase 8: deterministic UUID5 chunk IDs prevent duplicates on re-ingest.
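For context, the two-phase flow in `scripts/run_eval.py` can be pictured roughly as follows; the golden-dataset column names, state keys, and RAGAS metric selection here are assumptions, not the script's exact contents:
```python
import json

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, faithfulness

def load_golden(path: str = "eval/golden_dataset.jsonl") -> list[dict]:
    with open(path) as f:
        return [json.loads(line) for line in f]

async def run_phase_1(compiled_graph, rows: list[dict]) -> list[dict]:
    """Phase 1: invoke the compiled LangGraph once per golden row, collecting answers and contexts."""
    records = []
    for row in rows:
        state = await compiled_graph.ainvoke({"query": row["question"], "top_k": 5})
        records.append({
            "question": row["question"],
            "answer": state["raw_response"],
            "contexts": [rc.chunk.text for rc in state["reranked_chunks"]],
            "ground_truth": row["ground_truth"],
        })
    return records

def run_phase_2(records: list[dict]):
    """Phase 2: score the collected records with RAGAS (judge LLM configured separately)."""
    return evaluate(Dataset.from_list(records), metrics=[faithfulness, context_precision])
```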