# CivicSetu β€” Low Level Design (LLD)
**Version:** 2.0.0 — Phase 8 Complete (RAGAS Evaluation + Retrieval Improvements)
**Live:** https://civicsetu-two.vercel.app
**Last Updated:** April 2026
---
## 1. Module Map
```
src/civicsetu/
├── config/
│   ├── settings.py              Pydantic BaseSettings singleton (lru_cache)
│   └── document_registry.py     All document URLs + metadata (single source of truth)
├── models/
│   ├── enums.py                 StrEnum: Jurisdiction, DocType, QueryType, etc.
│   └── schemas.py               Pydantic models: LegalChunk, Citation, RetrievedChunk, CivicSetuResponse
├── ingestion/
│   ├── downloader.py            httpx PDF downloader with MD5 cache check
│   ├── parser.py                PyMuPDF text extractor — max_pages cap, scanned PDF detection
│   ├── chunker.py               Section-boundary regex chunker — 6 format patterns + fallback
│   ├── metadata_extractor.py    Date/Section/Rule reference/amendment regex extraction
│   ├── embedder.py              nomic-embed-text-v1.5 via sentence-transformers — truncate at 4000 chars pre-prefix
│   ├── pipeline.py              Orchestrates ingestion; prepends section_title to embeddings
│   └── graph_seeder.py          Post-ingestion REFERENCES + DERIVED_FROM edge seeding
├── stores/
│   ├── relational_store.py      Async SQLAlchemy — documents + legal_chunks tables
│   ├── vector_store.py          pgvector HNSW cosine search
│   └── graph_store.py           Neo4j Cypher interface — fresh driver per call
├── retrieval/
│   ├── vector_retriever.py      Wraps VectorStore for agent use
│   ├── graph_retriever.py       REFERENCES + DERIVED_FROM traversal, Section/Rule ID extraction
│   └── reranker.py              FlashRank cross-encoder wrapper
├── agent/
│   ├── state.py                 CivicSetuState TypedDict (frozen contract)
│   ├── nodes.py                 Pure functions: classifier, _rrf_retrieve (shared hybrid),
│   │                            vector_retrieval, graph_retrieval, hybrid_retrieval,
│   │                            reranker, generator, validator
│   ├── edges.py                 Conditional routing: route_after_classifier,
│   │                            route_after_validator
│   └── graph.py                 StateGraph assembly + get_compiled_graph()
├── prompts/
│   ├── classifier.py            Query type classification + rewriting prompt
│   ├── generator.py             Cited answer generation prompt
│   └── validator.py             Hallucination + confidence check prompt
├── guardrails/
│   ├── input_guard.py           PII detection + off-topic filter
│   └── output_guard.py          Faithfulness check + disclaimer injection
└── api/
    ├── main.py                  FastAPI app factory + lifespan (graph pre-compiled)
    ├── routes/
    │   ├── health.py            GET /health — DB ping
    │   ├── query.py             POST /api/v1/query — main RAG endpoint
    │   └── ingest.py            POST /api/v1/ingest — admin endpoint
    └── middleware/
        └── logging.py           Request/response structured logging
eval/
├── golden_dataset.jsonl         31-row RAGAS evaluation dataset across 5 jurisdictions
scripts/
├── run_eval.py                  Two-phase RAGAS evaluation: Phase 1 (graph invoke) + Phase 2 (RAGAS scoring)
frontend/                        Next.js 15 App Router — deployed on Vercel
├── src/app/
│   ├── layout.tsx               Root layout: ThemeProvider + dark mode
│   ├── page.tsx                 Main page: wires all components together
│   └── globals.css              Tailwind directives + gradient utilities
├── src/components/
│   ├── Header.tsx               Logo, new chat, theme toggle, GitHub link
│   ├── ChatThread.tsx           Scrollable message list + empty state examples
│   ├── MessageBubble.tsx        User/assistant/error bubbles with badges + citations
│   ├── ConfidenceBadge.tsx      HIGH/MEDIUM/LOW pill
│   ├── CitationsPanel.tsx       Collapsible citation cards
│   └── InputBar.tsx             Auto-resize textarea, jurisdiction select, send
├── src/hooks/
│   └── useChat.ts               Chat state, session_id localStorage, sendMessage
└── src/lib/
    ├── types.ts                 TypeScript interfaces (mirrors backend Pydantic models)
    └── api.ts                   queryRera() fetch wrapper → /api/v1/query
```
---
## 2. Database Schema
### PostgreSQL Tables
```sql
documents (
    doc_id           UUID PRIMARY KEY,
    doc_name         TEXT,
    jurisdiction     TEXT,          -- Jurisdiction enum value
    doc_type         TEXT,          -- DocType enum value (stored uppercase: ACT, RULES, CIRCULAR)
    source_url       TEXT,
    effective_date   DATE,
    gazette_number   TEXT,
    total_chunks     INTEGER,
    ingested_at      TIMESTAMPTZ,
    is_active        BOOLEAN
)

legal_chunks (
    chunk_id           UUID PRIMARY KEY,
    doc_id             UUID → documents.doc_id,
    jurisdiction       TEXT,
    doc_type           TEXT,
    doc_name           TEXT,
    section_id         TEXT,        -- "18", "3(2)", "Para-3"
    section_title      TEXT,
    section_hierarchy  TEXT[],      -- ["RERA Act 2016", "18"]
    text               TEXT,
    effective_date     DATE,
    superseded_by      UUID → legal_chunks.chunk_id,
    status             TEXT,        -- ChunkStatus enum value
    source_url         TEXT,
    page_number        INTEGER,
    embedding          vector(768)  -- HNSW indexed
)
```
### pgvector Index
```sql
CREATE INDEX legal_chunks_embedding_idx
ON legal_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
`m=16` — 16 connections per node. `ef_construction=64` — 64 candidates during index build.
Tuned for recall/speed balance at <10K vectors. Revisit at 100K+.
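A minimal sketch, assuming async SQLAlchemy as in `vector_store.py`, of how the cosine search against this index could be issued; the statement text and bind-parameter names are illustrative, not the repository's actual query:
```python
from sqlalchemy import text

# :query_vec is the 768-dim query embedding serialized as a pgvector literal,
# e.g. "[0.01, -0.02, ...]". pgvector's `<=>` operator returns cosine distance,
# so 1 - distance is the similarity score and the ORDER BY can use the HNSW index above.
COSINE_SEARCH = text("""
    SELECT chunk_id, section_id, doc_name,
           1 - (embedding <=> CAST(:query_vec AS vector)) AS vector_score
    FROM legal_chunks
    WHERE CAST(:jurisdiction AS text) IS NULL OR jurisdiction = :jurisdiction
    ORDER BY embedding <=> CAST(:query_vec AS vector)
    LIMIT :top_k
""")
```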
### Neo4j Graph Schema
```
Nodes:
  (:Document {doc_id, doc_name, jurisdiction, doc_type, effective_date})
  (:Section  {section_id, title, chunk_id, jurisdiction, doc_name, is_active})

Edges:
  (:Document)-[:HAS_SECTION]->(:Section)
  (:Section) -[:REFERENCES]->(:Section)       -- intra + cross-jurisdiction citations
  (:Section) -[:DERIVED_FROM]->(:Section)     -- State Rule N → RERA Act Sec M
  (:Document)-[:DERIVED_FROM]->(:Document)    -- State Rules → RERA Act 2016

Planned (Phase 7+):
  (:Section) -[:SUPERSEDES]->(:Section)
  (:Section) -[:AMENDED_BY]->(:Amendment)
  (:Section) -[:CONFLICTS_WITH]->(:Section)
```
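For illustration, `graph_seeder.py`'s edge creation can be pictured as parameterized Cypher `MERGE` statements along these lines (a sketch; property and parameter names are assumptions, not the seeder's actual queries):
```python
# Document → Section ownership edge (HAS_SECTION), matched by chunk_id.
SEED_HAS_SECTION = """
MATCH (d:Document {doc_id: $doc_id})
MATCH (s:Section  {chunk_id: $chunk_id})
MERGE (d)-[:HAS_SECTION]->(s)
"""

# Section-level DERIVED_FROM: a state Rule section pointing at the central Act section it implements.
SEED_SECTION_DERIVED_FROM = """
MATCH (rule:Section {section_id: $rule_id, jurisdiction: $state_jurisdiction})
MATCH (act:Section  {section_id: $act_section_id, jurisdiction: 'CENTRAL'})
MERGE (rule)-[:DERIVED_FROM]->(act)
"""
```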
**Live graph stats (Phase 6):**
| Metric | Count |
|--------------|-------|
| Documents | 9 |
| Sections | 2090 |
| HAS_SECTION | 1297 |
| REFERENCES | 933 |
| DERIVED_FROM | 91 |
---
## 3. Document Registry
`document_registry.py` — single source of truth for all ingested documents.
```python
@dataclass(frozen=True)
class DocumentSpec:
    name: str
    url: str
    jurisdiction: Jurisdiction
    doc_type: DocType
    effective_date: date | None
    filename: str
    dest_subdir: str
    max_pages: int | None = None  # None = all pages; cap excludes forms/schedule appendices
```
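An illustrative entry for the central Act; the URL is truncated and the filename, subdirectory, and effective date are placeholders, so treat the real values in `document_registry.py` as authoritative:
```python
RERA_ACT_2016 = DocumentSpec(
    name="Real Estate (Regulation and Development) Act, 2016",
    url="https://www.indiacode.nic.in/...",  # placeholder, not the exact path
    jurisdiction=Jurisdiction.CENTRAL,
    doc_type=DocType.ACT,
    effective_date=None,                     # real value set in document_registry.py
    filename="rera_act_2016.pdf",            # assumed naming convention
    dest_subdir="central",                   # assumed subdirectory layout
    max_pages=None,                          # ingest all pages
)
```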
### Ingested Documents (Phase 6)
| Key | Document | Jurisdiction | DocType | Chunks | max_pages |
|---|---|---|---|---|---|
| `rera_act_2016` | RERA Act 2016 | CENTRAL | ACT | ~224 | None |
| `mahrera_rules_2017` | MahaRERA Rules 2017 | MAHARASHTRA | RULES | ~214 | None |
| `up_rera_rules_2016` | UP RERA Rules 2016 | UTTAR_PRADESH | RULES | 170 | 24 |
| `up_rera_general_regulations_2019` | UP RERA General Regulations 2019 | UTTAR_PRADESH | CIRCULAR | 85 | None |
| `karnataka_rera_rules_2017` | Karnataka RERA Rules 2017 | KARNATAKA | RULES | 235 | 37 |
| `tn_rera_rules_2017` | Tamil Nadu RERA Rules 2017 | TAMIL_NADU | RULES | 157 | 15 |
**PDF source notes:**
- Karnataka official PDF (`rera.karnataka.gov.in`) is fully scanned (19MB image) — NAREDCO mirror used
- TN PDF bundles rules + forms (101 pages); `max_pages=15` excludes Forms A–O
- UP Rules PDF bundles rules + forms (52 pages); `max_pages=24` excludes prescribed forms
---
## 4. LangGraph State Machine
### State Contract (`agent/state.py`)
```python
class CivicSetuState(TypedDict):
    # Input
    query: str
    session_id: Optional[str]
    jurisdiction_filter: Optional[Jurisdiction]
    top_k: int

    # Classification
    query_type: Optional[QueryType]
    rewritten_query: Optional[str]

    # Retrieval — Annotated[list, operator.add] enables parallel node merging
    retrieved_chunks: Annotated[list[RetrievedChunk], operator.add]
    reranked_chunks: list[RetrievedChunk]

    # Generation
    raw_response: Optional[str]
    citations: list[Citation]
    confidence_score: float
    conflict_warnings: list[str]
    amendment_notice: Optional[str]

    # Control
    retry_count: int  # max 2 retries
    hallucination_flag: bool
    error: Optional[str]
```
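The `Annotated[list, operator.add]` reducer is what lets more than one retrieval node write to `retrieved_chunks` without overwriting it: LangGraph applies the reducer to concatenate each node's partial update. A toy illustration of the merge semantics (the chunk values are placeholders):
```python
import operator

vector_update = {"retrieved_chunks": ["chunk_from_vector"]}
graph_update = {"retrieved_chunks": ["chunk_from_graph"]}

# LangGraph applies the reducer when both updates target the same annotated key
merged = operator.add(vector_update["retrieved_chunks"], graph_update["retrieved_chunks"])
assert merged == ["chunk_from_vector", "chunk_from_graph"]
```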
### RetrievedChunk Schema (`models/schemas.py`)
```python
class RetrievedChunk(BaseModel):
    chunk: LegalChunk
    vector_score: float | None = None
    rerank_score: float | None = None
    retrieval_source: str = "vector"   # "vector" | "graph"
    graph_path: Optional[str] = None   # e.g. "source:18@CENTRAL"
    is_pinned: bool = False            # True = exact source section, bypasses reranker sort
```
### Node Responsibilities
| Node | Input Keys | Output Keys | LLM Call |
| :-- | :-- | :-- | :-- |
| classifier | query | query_type, rewritten_query | Yes |
| vector_retrieval | rewritten_query, top_k | retrieved_chunks | No |
| graph_retrieval | rewritten_query, top_k | retrieved_chunks | No |
| reranker | retrieved_chunks, query | reranked_chunks | No |
| generator | reranked_chunks, query | raw_response, citations, confidence_score | Yes |
| validator | raw_response, reranked_chunks | hallucination_flag, confidence_score | Yes |
| retry | retry_count | retry_count+1, cleared retrieval fields | No |
### Routing Logic
| query_type (classifier output) | route_after_classifier → retrieval node |
|--------------------------------|------------------------------------------|
| fact_lookup                    | vector_retrieval (RRF hybrid)            |
| cross_reference                | graph_retrieval (→ RRF fallback)         |
| penalty_lookup                 | graph_retrieval (→ RRF fallback)         |
| temporal                       | graph_retrieval (→ RRF fallback)         |
| conflict_detection             | hybrid_retrieval (RRF across jurisdictions) |
```
validator → route_after_validator:
  confidence >= 0.5 AND not hallucinated                  → END
  (confidence < 0.5 OR hallucinated) AND retry_count < 2  → retry → classifier
  (confidence < 0.5 OR hallucinated) AND retry_count >= 2 → END (low-confidence answer)
```
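A hedged sketch of how `agent/graph.py` likely assembles this with LangGraph; the node and router names come from the tables above, but the exact edge map and helper names are assumptions:
```python
from langgraph.graph import END, StateGraph

def build_graph():
    g = StateGraph(CivicSetuState)
    for name, fn in [
        ("classifier", classifier_node),
        ("vector_retrieval", vector_retrieval_node),
        ("graph_retrieval", graph_retrieval_node),
        ("hybrid_retrieval", hybrid_retrieval_node),
        ("reranker", reranker_node),
        ("generator", generator_node),
        ("validator", validator_node),
        ("retry", retry_node),
    ]:
        g.add_node(name, fn)

    g.set_entry_point("classifier")
    g.add_conditional_edges("classifier", route_after_classifier, {
        "vector_retrieval": "vector_retrieval",
        "graph_retrieval": "graph_retrieval",
        "hybrid_retrieval": "hybrid_retrieval",
    })
    for retrieval_node in ("vector_retrieval", "graph_retrieval", "hybrid_retrieval"):
        g.add_edge(retrieval_node, "reranker")
    g.add_edge("reranker", "generator")
    g.add_edge("generator", "validator")
    g.add_conditional_edges("validator", route_after_validator, {
        "retry": "retry",   # low confidence / hallucinated, retries remaining
        END: END,           # accepted, or retries exhausted
    })
    g.add_edge("retry", "classifier")
    return g

compiled_graph = build_graph().compile()  # roughly what get_compiled_graph() would return
```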
---
## 5. Chunking Strategy
### Section Boundary Detection
Six regex patterns for `DocType.RULES` documents, tried in order (first match wins per line):
| # | Pattern | Format | Jurisdiction |
|---|---|---|---|
| 1 | `\n(?P<id>\d{1,2}[A-Z]?)\.\s*\n(?P<title>...)` | Newline-dot-newline | MahaRERA |
| 2 | `^\s*(?P<id>\d{1,2}[A-Z]?)\.\s+(?P<title>...)\.?—` | Same-line em-dash | MahaRERA |
| 3 | `^Rule\s+(?P<id>\d{1,2}[A-Z]?)\s*[.\-–]\s*(?P<title>...)` | Explicit Rule prefix | Generic |
| 4 | `^\s*(?P<id>\d{1,2}[A-Z]?)\.\s+(?P<title>...?)\.–` | ASCII hyphen `.-` | Karnataka, Tamil Nadu |
| 5 | `(?P<id>\d{1,2}[A-Z]?)-\(1\)\s*\n(?P<title>...)` | `N-(1)\nTitle` | UP RERA multi-clause |
| 6 | `(?P<id>\d{1,2}[A-Z]?)-(?!\()\s*\n(?P<title>...)` | `N-\nTitle` | UP RERA single-clause |
`DocType.ACT` uses a separate pattern set. Fallback: paragraph split on double newlines.
Rule IDs are capped at `\d{1,2}` (max 2 digits) — this prevents year strings like `2016` from matching as rule IDs.
Logs `no_section_boundaries_found` + `fallback_paragraph_chunking` when falling back.
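A toy check of pattern 3 (the explicit `Rule` prefix); the title group used here is an assumption, since the real pattern's title expression is elided in the table above:
```python
import re

RULE_PREFIX = re.compile(
    r"^Rule\s+(?P<id>\d{1,2}[A-Z]?)\s*[.\-–]\s*(?P<title>[^\n]+)",  # title group assumed
    re.MULTILINE,
)

m = RULE_PREFIX.search("Rule 4 - Application for registration of project")
assert m is not None and m.group("id") == "4"
assert m.group("title").startswith("Application")
```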
### Chunk Size Limits
```
MIN_CHARS = 100   — discard fragments (headers, page numbers)
MAX_CHARS = 1500  — split large sections at subsection markers (1), (2), (a), (b)
```
### Split Priority for Large Sections
```
1. Subsection markers: \n\s*\((?:\d+|[a-z]{1,3})\)\s+
2. Sentence boundary near MAX_CHARS: rfind('. ')
3. Hard cut at MAX_CHARS (last resort)
```
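A minimal sketch of that priority order (helper name assumed; the real chunker also re-checks each resulting piece against MAX_CHARS and keeps the subsection markers it splits on):
```python
import re

MAX_CHARS = 1500
SUBSECTION_RE = re.compile(r"\n\s*\((?:\d+|[a-z]{1,3})\)\s+")

def split_large_section(text: str) -> list[str]:
    if len(text) <= MAX_CHARS:
        return [text]
    # 1. Prefer subsection markers (1), (2), (a), (b); marker text is dropped in this toy version
    parts = [p.strip() for p in SUBSECTION_RE.split(text) if p.strip()]
    if len(parts) > 1:
        return parts
    # 2. Otherwise cut at the last sentence boundary before MAX_CHARS
    cut = text.rfind(". ", 0, MAX_CHARS)
    if cut == -1:
        cut = MAX_CHARS  # 3. Hard cut as a last resort
    head, tail = text[: cut + 1].strip(), text[cut + 1 :].strip()
    return [head] + (split_large_section(tail) if tail else [])
```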
### parser.py — max_pages cap
```python
@staticmethod
def parse(source: str | Path, max_pages: int | None = None) -> ParsedDocument:
    # ... `doc` is the PyMuPDF document opened from `source` (abridged here) ...
    all_pages = list(doc)
    if max_pages is not None:
        all_pages = all_pages[:max_pages]  # slice before fulltext build
```
---
## 6. Embedding Strategy
**Model:** `nomic-embed-text-v1.5` (via `sentence-transformers`, local — no Ollama required)
**Dimension:** 768
**Asymmetric prefixes** (MTEB/nomic-embed requirement):
```
Ingestion time:  "search_document: {section_title}\n{text}"  → pipeline.py
Query time:      "search_query: {rewritten_query}"           → retrieval/__init__.py
```
**Section title prepend (Phase 8 change):** `pipeline.py` prepends `section_title` to the
embedded text so sub-chunks (e.g. `S.11(2)`) retain their section context.
Without this, sub-chunks embed without "Obligations of promoter" — cosine similarity misses them.
The reranker still receives raw `chunk.text` (no title prefix).
Using the wrong prefix at query time causes ~10–15% recall degradation.
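For reference, the asymmetric prefixing can be reproduced with `sentence-transformers` directly; the model id is the public Nomic checkpoint, while the `trust_remote_code` flag and function names are assumptions about `embedder.py`:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

def embed_document(section_title: str, text: str) -> list[float]:
    # Ingestion side: section title prepended so sub-chunks keep their section context
    return model.encode(f"search_document: {section_title}\n{text}").tolist()

def embed_query(rewritten_query: str) -> list[float]:
    # Query side uses the other prefix; mixing the two degrades recall
    return model.encode(f"search_query: {rewritten_query}").tolist()
```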
### Truncation Guard
```python
MAX_EMBED_CHARS = 4000  # ~1000 tokens — safe ceiling before prefix added

def embed_document(self, text: str) -> list[float]:
    if len(text) > MAX_EMBED_CHARS:
        log.warning("embedding_truncated", original_len=len(text), truncated_to=MAX_EMBED_CHARS)
        text = text[:MAX_EMBED_CHARS]
    prefixed = f"search_document: {text.strip()}"  # prefix AFTER truncation
    return self.embed_one(prefixed)
```
Truncation happens **before** the prefix is added — prevents Ollama 500 errors on Tamil Nadu
and other gazette PDFs where sub-sections exceed 10K chars.
---
## 7. Hybrid Retrieval — `_rrf_retrieve()`
All retrieval nodes share a single async helper `_rrf_retrieve()` in `agent/nodes.py`.
### Reciprocal Rank Fusion (RRF)
```python
RRF_K = 60  # standard constant
# rrf_score(chunk) = 1/(RRF_K + rank_in_vector) + 1/(RRF_K + rank_in_fts)
```
Fetches `top_k × 3` vector results and `top_k × 2` FTS results, deduplicates by `chunk_id`,
merges via RRF, and returns the top `top_k × 2`.
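In pure Python the merge step amounts to the following (a sketch; the real helper works on `RetrievedChunk` objects rather than bare ids):
```python
RRF_K = 60

def rrf_merge(vector_hits: list[str], fts_hits: list[str], top_k: int) -> list[str]:
    """vector_hits / fts_hits are rank-ordered lists of chunk_ids (best first)."""
    scores: dict[str, float] = {}
    for ranked in (vector_hits, fts_hits):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (RRF_K + rank)
    merged = sorted(scores, key=scores.get, reverse=True)
    return merged[: top_k * 2]  # the pool handed to family expansion and the reranker
```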
### Full-Text Search
`VectorStore.full_text_search()` uses `websearch_to_tsquery` in OR mode:
```sql
WHERE to_tsvector('english', text) @@ websearch_to_tsquery('english', :query)
ORDER BY ts_rank(to_tsvector('english', text), websearch_to_tsquery('english', :query)) DESC
```
Changed from `plainto_tsquery` (AND-mode) — AND required all query words to match,
excluding relevant sections that matched most but not all words.
### Section Family Expansion
After RRF merge, top-3 results trigger family expansion:
```python
for rc in merged[:3]:
    base_sid = re.sub(r'\([^)]*\)$', '', section_id).strip()  # "5(4)" → "5"
    family = await VectorStore.get_section_family(section_id=base_sid, jurisdiction=jur)
    # returns all chunks where section_id = '5' OR section_id LIKE '5(%'
```
`get_section_family` guard: skips if `section_id` already contains `(` (base_sid computation
strips this before calling). Hard cap: `_MAX_VECTOR_EXPANDED = 40` chunks before reranker.
**Why top-3 not top-1:** If top-1 RRF result is a sub-section (`S.5(4)`), its parent
family is expanded. But if the truly relevant parent section (`S.11`) appears at RRF rank 2,
only expanding top-1 misses it. Expanding top-3 covers more cases at the cost of a slightly
larger pool.
---
## 7b. Reranker Detail
`reranker_score_threshold = 0.1` — minimum cross-encoder score to enter the candidate pool.
`reranker_score_gap = 0.6` — gap filter cliff threshold.
**Gap filter:**
```python
def _apply_score_gap(chunks, gap=0.6):
    for i in range(1, len(chunks)):
        if chunks[i - 1].rerank_score - chunks[i].rerank_score >= gap:
            return chunks[:i]
    return chunks
```
**Threshold history:** Originally `threshold=0.3, gap=0.35`. A gap of 0.35 was too aggressive —
it cut the list after a 0.36 score drop, leaving only one context chunk for the generator. Raised to 0.6 in Phase 8.
Final context: `pinned_chunks + gap_filtered[:max(0, 5 - len(pinned))]` → max 5 chunks.
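Putting the pieces together, the final context assembly is roughly (names assumed; pinned chunks skip the gap filter and the combined list is capped at 5):
```python
MAX_CONTEXT_CHUNKS = 5

def build_final_context(reranked: list[RetrievedChunk]) -> list[RetrievedChunk]:
    pinned = [c for c in reranked if c.is_pinned]       # exact source sections, max 2
    candidates = [c for c in reranked if not c.is_pinned]
    survivors = _apply_score_gap(candidates, gap=0.6)    # cliff filter from above
    return pinned + survivors[: max(0, MAX_CONTEXT_CHUNKS - len(pinned))]
```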
---
## 8. Graph Retriever
`graph_retriever.py` — called on `cross_reference`, `penalty_lookup`, `temporal` query types.
### Section ID Extraction
```python
section_pattern = re.compile(r'\b(?:section|sec\.?|s\.)\s*(\d+[A-Z]?)\b', re.IGNORECASE)
rule_pattern = re.compile(r'\bRule\s+(\d+[A-Z]?)\b', re.IGNORECASE)
```
### Traversal Strategy (per jurisdiction)
For each jurisdiction (`CENTRAL`, `MAHARASHTRA`, `UTTAR_PRADESH`, `KARNATAKA`, `TAMIL_NADU`):
```
1. Source section chunks  — exact section_id match → is_pinned=True
2. REFERENCES outgoing    — sections the source cites (depth=2)
3. REFERENCES incoming    — sections that cite the source
4. DERIVED_FROM outgoing  — Act sections this Rule derives from
5. DERIVED_FROM incoming  — Rule sections implementing this Act section
```
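A hedged sketch of one traversal step (outgoing REFERENCES, depth up to 2) with the async Neo4j driver; `graph_store.py`'s actual query text and return fields may differ:
```python
from neo4j import AsyncGraphDatabase

REFERENCES_OUT = """
MATCH (src:Section {section_id: $section_id, jurisdiction: $jurisdiction})
MATCH (src)-[:REFERENCES*1..2]->(target:Section)
WHERE target.is_active
RETURN DISTINCT target.chunk_id AS chunk_id, target.section_id AS section_id
"""

async def references_outgoing(uri: str, auth: tuple[str, str],
                              section_id: str, jurisdiction: str) -> list[dict]:
    # A fresh driver per call mirrors graph_store.py's event-loop-safe pattern
    driver = AsyncGraphDatabase.driver(uri, auth=auth)
    try:
        records, _, _ = await driver.execute_query(
            REFERENCES_OUT, section_id=section_id, jurisdiction=jurisdiction
        )
        return [dict(record) for record in records]
    finally:
        await driver.close()
```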
### Pinning Rule
Only the exact `section_id` match gets `is_pinned=True`. Sub-sections are NOT pinned.
Max pinned chunks: 2 (one per jurisdiction). Remaining 3 slots filled by reranker.
---
## 9. Response Contract
```python
CivicSetuResponse:
    answer: str                      # plain English, cites section numbers
    citations: list[Citation]        # min_length=1 — NEVER empty
    confidence_score: float          # 0.0–1.0
    confidence_level: str            # "high" / "medium" / "low"
    query_type_resolved: QueryType
    conflict_warnings: list[str]     # empty until Phase 7
    amendment_notice: Optional[str]
    disclaimer: str                  # always present

Citation:
    section_id: str
    doc_name: str
    jurisdiction: Jurisdiction
    effective_date: Optional[date]
    source_url: str
    chunk_id: UUID
```
---
## 10. Error Handling
| Scenario | Behaviour |
| :-- | :-- |
| LLM provider rate limited | LiteLLM auto-rotates to next provider |
| All LLM providers fail | `RuntimeError` → FastAPI 500 |
| No chunks retrieved | `InsufficientInfoResponse` returned |
| Hallucination detected | retry (max 2x) → low confidence answer |
| DB unreachable | `/health` returns `degraded`, query returns 500 |
| Scanned PDF detected | Warning logged, fallback URL used (Karnataka) |
| Section patterns not matched | Fallback paragraph chunking, warning logged |
| Neo4j event loop mismatch | Prevented — `_get_driver()` creates fresh driver per call |
| Embedding input too long | Truncated at 4000 chars before prefix; warning logged |
| max_pages exceeded | Parser silently caps pages; total_pages reflects capped count |
---
## 11. Neo4j Graph — Phase 6 State (Current)
**Nodes:** 9 Documents, 2090 Sections
**Edges:** 1297 HAS_SECTION, 933 REFERENCES, 91 DERIVED_FROM
### Documents in Graph
| Document | Jurisdiction | DocType | Chunks | Sections | DERIVED_FROM edges |
|---|---|---|---|---|---|
| RERA Act 2016 | CENTRAL | ACT | ~224 | ~224 | — |
| MahaRERA Rules 2017 | MAHARASHTRA | RULES | ~214 | ~214 | 17 sec + 1 doc |
| UP RERA Rules 2016 | UTTAR_PRADESH | RULES | 170 | 33 | 11 sec + 1 doc |
| UP RERA General Regs 2019 | UTTAR_PRADESH | CIRCULAR | 85 | 53 | — |
| Karnataka RERA Rules 2017 | KARNATAKA | RULES | 235 | 45 | 15 sec + 1 doc |
| Tamil Nadu RERA Rules 2017 | TAMIL_NADU | RULES | 157 | 36 | 15 sec + 1 doc |
### Known Open Issues (non-blocking)
| Issue | Affected | Root Cause |
|---|---|---|
| Act §13 missing from graph | UP rule 14, KA rule 11, TN rule 11 | RERA Act ingestion — §13 chunked under different ID |
| Act §66 missing from graph | KA rule 19, TN rule 19 | RERA Act ingestion — §66 not ingested |
### DERIVED_FROM Map Summary
| Jurisdiction | Mapped pairs | Resolved | Unresolved |
|---|---|---|---|
| MAHARASHTRA | 17 | 17 | 0 |
| UTTAR_PRADESH | 15 | 11 | 4 |
| KARNATAKA | 17 | 15 | 2 |
| TAMIL_NADU | 17 | 15 | 2 |
### PDF Source Decisions
| Jurisdiction | Primary URL | Issue | Resolution |
|---|---|---|---|
| CENTRAL | indiacode.nic.in | — | — |
| MAHARASHTRA | naredco.in | — | — |
| UTTAR_PRADESH | up-rera.in/pdf/rera.pdf | pages 25–52 are forms | max_pages=24 |
| KARNATAKA | naredco.in (mirror) | Official PDF fully scanned (19MB) | NAREDCO born-digital |
| TAMIL_NADU | cms.tn.gov.in | pages 16–101 are Forms A–O | max_pages=15 |
## 12. Agent Pipeline — Bug Fixes (2026-03-22)
Three production bugs were fixed after running the 12-case E2E suite. All fixes verified: 0 retries, 0
hallucinations, avg latency 7.6s.
### Fix 1 — `vector_store.py::get_section_family` — Pydantic crash on SELECT *
`SELECT *` returned `embedding` as a raw string; Pydantic `list[float]` validation
failed. Fix: explicit column projection, `embedding=None` on all returned chunks.
Matches every other `VectorStore` method.
### Fix 2 — `nodes.py::vector_retrieval_node` — Reranker blowup on section expansion
Section family expansion ran on all 5 similarity hits → up to 121 chunks → FlashRank
cross-encoder serial scoring → 65s reranker time. Fix (Phase 5): expand top-1 hit only; hard
cap at 25 chunks before reranker.
```python
for rc in results[:1]:
    ...family expansion...
expanded = expanded[:25]  # hard safety cap
```
**Phase 8 update:** Expanded to top-3 after RAGAS eval revealed that when a sub-section
(e.g. `S.5(4)`) ranks #1, its parent `S.5` (with the 30-day rule) was never expanded.
Cap raised to 40 to accommodate larger families.
```python
for rc in merged[:3]:  # top-3 RRF results (was: top-1)
    ...family expansion...
expanded = expanded[:40]  # was: 25
```
### Fix 3 — `nodes.py::validator_node` — False hallucination flag
The validator built its context as a raw `chunk.text` joined string. The generator's answer cites
`"Section 11(1)"` but the raw text has no section number → the validator scores 0.2 →
`hallucinated=True` → spurious retry loops (7 retries across 12 tests).
Fix: mirror the generator's numbered context block `[i] doc — section_id: title\ntext`.
Validator can now match cited section numbers to source context.
### E2E Regression Results (post-fix)
| Metric | Pre-fix | Post-fix |
| :-- | :-- | :-- |
| Avg latency | 19.6s | **7.6s** |
| Max latency | 87.1s | **13.3s** |
| Avg confidence | 0.908 | **0.958** |
| Total retries | 7 | **0** |
| Slow (>20s) | 3 | **0** |
| Low conf (<0.7) | 2 | **0** |
| Pass rate | 12/12 | **12/12** |
---
## 13. Agent Pipeline — RAGAS Eval Fixes (Phase 8, April 2026)
Five changes came out of a RAGAS evaluation that revealed retrieval and faithfulness failures.
### Fix 4 — Reranker thresholds too aggressive (`settings.py`)
The old `score_gap=0.35` cut after any 0.36-point drop → only 1 chunk reached the generator.
New: `score_threshold=0.1`, `score_gap=0.6`. This keeps secondary relevant chunks while still
filtering genuine noise (a 0.98 → 0.20 drop would still be cut at a 0.78 gap).
### Fix 5 — Generator analogy instruction caused hallucination (`generator.py`)
The instruction "Use an analogy or real-world example" produced analogies ("Think of it like selling a
used car") that were not present in the retrieved context → the faithfulness judge scored them as hallucination.
Fix: removed the analogy instruction; replaced with "using only information from the provided context".
### Fix 6 — Generator weak grounding for sparse contexts (`generator.py`)
Generator constructed legal conclusions from reasoning even when context lacked evidence.
Added explicit rules:
- For sparse context: say "Based on the available context: [X]" and note missing elements
- For conflict detection: only assert conflict if BOTH provisions present in context
### Fix 7 — CONFLICT_DETECTION tone hint implied precedence reasoning (`nodes.py`)
The tone hint said "state which jurisdiction takes precedence when context supports it" —
the LLM interpreted "when context supports it" loosely and applied its own legal reasoning.
Rewritten to: "Never infer precedence from legal reasoning — only state precedence if
the context explicitly says so."
### Fix 8 — Temporal query rewrite too generic (`classifier.py`)
The query "What is the timeline for project registration?" produced the rewrite "registration
timeline period" — FTS missed Section 5, which uses "within thirty days" and "deemed registered".
Added rewriting guidance to expand temporal queries with specific legal time-period keywords.
### RAGAS Results (Phase 8 baseline, 5-row smoke, gemma-4-31b-it judge)
| Row | Faith (before) | Faith (after) | Prec (before) | Prec (after) |
|---|---|---|---|---|
| CENTRAL-FACT-001 | 1.00 | 0.50 | 0.00 | 0.00 |
| CENTRAL-FACT-002 | 0.80 | 0.62 | 0.00 | 0.33 |
| CENTRAL-XREF-001 | 0.63 | 0.50 | 1.00 | 1.00 |
| CENTRAL-CONF-001 | 0.00 | 0.62 | 0.00 | 0.00 |
| CENTRAL-TEMP-001 | 0.67 | 1.00 | 1.00 | 0.00 |
| **Overall** | 0.618 | **0.650** | 0.400* | 0.267 |
\* The "before" baseline had inflated precision from duplicate chunks (non-deterministic doc_id).
After Phase 8: deterministic UUID5 chunk IDs prevent duplicates on re-ingest.
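For context, the two-phase flow in `scripts/run_eval.py` can be pictured roughly as follows; the golden-dataset column names, state keys, and RAGAS metric selection here are assumptions, not the script's exact contents:
```python
import json

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, faithfulness

def load_golden(path: str = "eval/golden_dataset.jsonl") -> list[dict]:
    with open(path) as f:
        return [json.loads(line) for line in f]

async def run_phase_1(compiled_graph, rows: list[dict]) -> list[dict]:
    """Phase 1: invoke the compiled LangGraph once per golden row, collecting answers and contexts."""
    records = []
    for row in rows:
        state = await compiled_graph.ainvoke({"query": row["question"], "top_k": 5})
        records.append({
            "question": row["question"],
            "answer": state["raw_response"],
            "contexts": [rc.chunk.text for rc in state["reranked_chunks"]],
            "ground_truth": row["ground_truth"],
        })
    return records

def run_phase_2(records: list[dict]):
    """Phase 2: score the collected records with RAGAS (judge LLM configured separately)."""
    return evaluate(Dataset.from_list(records), metrics=[faithfulness, context_precision])
```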