civicsetu / docs /LLD.md
adeshboudh16
feat(eval): RAGAS evaluation framework + RAG pipeline improvements
f8b04c3

CivicSetu β€” Low Level Design (LLD)

Version: 2.0.0 β€” Phase 8 Complete (RAGAS Evaluation + Retrieval Improvements) Live: https://civicsetu-two.vercel.app Last Updated: April 2026


1. Module Map

src/civicsetu/
β”œβ”€β”€ config/
β”‚   β”œβ”€β”€ settings.py           Pydantic BaseSettings singleton (lru_cache)
β”‚   └── document_registry.py  All document URLs + metadata (single source of truth)
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ enums.py              StrEnum: Jurisdiction, DocType, QueryType, etc.
β”‚   └── schemas.py            Pydantic models: LegalChunk, Citation, RetrievedChunk, CivicSetuResponse
β”œβ”€β”€ ingestion/
β”‚   β”œβ”€β”€ downloader.py         httpx PDF downloader with MD5 cache check
β”‚   β”œβ”€β”€ parser.py             PyMuPDF text extractor β€” max_pages cap, scanned PDF detection
β”‚   β”œβ”€β”€ chunker.py            Section-boundary regex chunker β€” 6 format patterns + fallback
β”‚   β”œβ”€β”€ metadata_extractor.py Date/Section/Rule reference/amendment regex extraction
β”‚   β”œβ”€β”€ embedder.py           nomic-embed-text-v1.5 via sentence-transformers β€” truncate at 4000 chars pre-prefix
β”‚   β”œβ”€β”€ pipeline.py           Orchestrates ingestion; prepends section_title to embeddings
β”‚   └── graph_seeder.py       Post-ingestion REFERENCES + DERIVED_FROM edge seeding
β”œβ”€β”€ stores/
β”‚   β”œβ”€β”€ relational_store.py   Async SQLAlchemy β€” documents + legal_chunks tables
β”‚   β”œβ”€β”€ vector_store.py       pgvector HNSW cosine search
β”‚   └── graph_store.py        Neo4j Cypher interface β€” fresh driver per call
β”œβ”€β”€ retrieval/
β”‚   β”œβ”€β”€ vector_retriever.py   Wraps VectorStore for agent use
β”‚   β”œβ”€β”€ graph_retriever.py    REFERENCES + DERIVED_FROM traversal, Section/Rule ID extraction
β”‚   └── reranker.py           FlashRank cross-encoder wrapper
β”œβ”€β”€ agent/
β”‚   β”œβ”€β”€ state.py              CivicSetuState TypedDict (frozen contract)
β”‚   β”œβ”€β”€ nodes.py              Pure functions: classifier, _rrf_retrieve (shared hybrid),
β”‚   β”‚                         vector_retrieval, graph_retrieval, hybrid_retrieval,
β”‚   β”‚                         reranker, generator, validator
β”‚   β”œβ”€β”€ edges.py              Conditional routing: route_after_classifier,
β”‚   β”‚                         route_after_validator
β”‚   └── graph.py              StateGraph assembly + get_compiled_graph()
β”œβ”€β”€ prompts/
β”‚   β”œβ”€β”€ classifier.py         Query type classification + rewriting prompt
β”‚   β”œβ”€β”€ generator.py          Cited answer generation prompt
β”‚   └── validator.py          Hallucination + confidence check prompt
β”œβ”€β”€ guardrails/
β”‚   β”œβ”€β”€ input_guard.py        PII detection + off-topic filter
β”‚   └── output_guard.py       Faithfulness check + disclaimer injection
└── api/
    β”œβ”€β”€ main.py               FastAPI app factory + lifespan (graph pre-compiled)
    β”œβ”€β”€ routes/
    β”‚   β”œβ”€β”€ health.py         GET /health β€” DB ping
    β”‚   β”œβ”€β”€ query.py          POST /api/v1/query β€” main RAG endpoint
    β”‚   └── ingest.py         POST /api/v1/ingest β€” admin endpoint
    └── middleware/
        └── logging.py        Request/response structured logging

eval/
β”œβ”€β”€ golden_dataset.jsonl      31-row RAGAS evaluation dataset across 5 jurisdictions
scripts/
β”œβ”€β”€ run_eval.py               Two-phase RAGAS evaluation: Phase 1 (graph invoke) + Phase 2 (RAGAS scoring)

frontend/                     Next.js 15 App Router β€” deployed on Vercel
β”œβ”€β”€ src/app/
β”‚   β”œβ”€β”€ layout.tsx            Root layout: ThemeProvider + dark mode
β”‚   β”œβ”€β”€ page.tsx              Main page: wires all components together
β”‚   └── globals.css           Tailwind directives + gradient utilities
β”œβ”€β”€ src/components/
β”‚   β”œβ”€β”€ Header.tsx            Logo, new chat, theme toggle, GitHub link
β”‚   β”œβ”€β”€ ChatThread.tsx        Scrollable message list + empty state examples
β”‚   β”œβ”€β”€ MessageBubble.tsx     User/assistant/error bubbles with badges + citations
β”‚   β”œβ”€β”€ ConfidenceBadge.tsx   HIGH/MEDIUM/LOW pill
β”‚   β”œβ”€β”€ CitationsPanel.tsx    Collapsible citation cards
β”‚   └── InputBar.tsx          Auto-resize textarea, jurisdiction select, send
β”œβ”€β”€ src/hooks/
β”‚   └── useChat.ts            Chat state, session_id localStorage, sendMessage
└── src/lib/
    β”œβ”€β”€ types.ts              TypeScript interfaces (mirrors backend Pydantic models)
    └── api.ts                queryRera() fetch wrapper β†’ /api/v1/query

2. Database Schema

PostgreSQL Tables

documents (
    doc_id          UUID PRIMARY KEY,
    doc_name        TEXT,
    jurisdiction    TEXT,   -- Jurisdiction enum value
    doc_type        TEXT,   -- DocType enum value  (stored uppercase: ACT, RULES, CIRCULAR)
    source_url      TEXT,
    effective_date  DATE,
    gazette_number  TEXT,
    total_chunks    INTEGER,
    ingested_at     TIMESTAMPTZ,
    is_active       BOOLEAN
)

legal_chunks (
    chunk_id            UUID PRIMARY KEY,
    doc_id              UUID β†’ documents.doc_id,
    jurisdiction        TEXT,
    doc_type            TEXT,
    doc_name            TEXT,
    section_id          TEXT,   -- "18", "3(2)", "Para-3"
    section_title       TEXT,
    section_hierarchy   TEXT[], -- ["RERA Act 2016", "18"]
    text                TEXT,
    effective_date      DATE,
    superseded_by       UUID β†’ legal_chunks.chunk_id,
    status              TEXT,   -- ChunkStatus enum value
    source_url          TEXT,
    page_number         INTEGER,
    embedding           vector(768)  -- HNSW indexed
)

pgvector Index

CREATE INDEX legal_chunks_embedding_idx
    ON legal_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

m=16 β€” 16 connections per node. ef_construction=64 β€” 64 candidates during index build. Tuned for recall/speed balance at <10K vectors. Revisit at 100K+.

Neo4j Graph Schema

Nodes:
  (:Document {doc_id, doc_name, jurisdiction, doc_type, effective_date})
  (:Section  {section_id, title, chunk_id, jurisdiction, doc_name, is_active})

Edges:
  (:Document)-[:HAS_SECTION]->(:Section)
  (:Section) -[:REFERENCES]->(:Section)       -- intra + cross-jurisdiction citations
  (:Section) -[:DERIVED_FROM]->(:Section)     -- State Rule N β†’ RERA Act Sec M
  (:Document)-[:DERIVED_FROM]->(:Document)    -- State Rules β†’ RERA Act 2016

Planned (Phase 7+):
  (:Section) -[:SUPERSEDES]->(:Section)
  (:Section) -[:AMENDED_BY]->(:Amendment)
  (:Section) -[:CONFLICTS_WITH]->(:Section)

Live graph stats (Phase 6):

Metric Count
Documents 9
Sections 2090
HAS_SECTION 1297
REFERENCES 933
DERIVED_FROM 91

3. Document Registry

document_registry.py β€” single source of truth for all ingested documents.

@dataclass(frozen=True)
class DocumentSpec:
    name: str
    url: str
    jurisdiction: Jurisdiction
    doc_type: DocType
    effective_date: date | None
    filename: str
    dest_subdir: str
    max_pages: int | None = None  # None = all pages; cap excludes forms/schedules appendices

Ingested Documents (Phase 6)

Key Document Jurisdiction DocType Chunks max_pages
rera_act_2016 RERA Act 2016 CENTRAL ACT ~224 None
mahrera_rules_2017 MahaRERA Rules 2017 MAHARASHTRA RULES ~214 None
up_rera_rules_2016 UP RERA Rules 2016 UTTAR_PRADESH RULES 170 24
up_rera_general_regulations_2019 UP RERA General Regulations 2019 UTTAR_PRADESH CIRCULAR 85 None
karnataka_rera_rules_2017 Karnataka RERA Rules 2017 KARNATAKA RULES 235 37
tn_rera_rules_2017 Tamil Nadu RERA Rules 2017 TAMIL_NADU RULES 157 15

PDF source notes:

  • Karnataka official PDF (rera.karnataka.gov.in) is fully scanned (19MB image) β€” NAREDCO mirror used
  • TN PDF bundles rules + forms (101 pages); max_pages=15 excludes Forms A–O
  • UP Rules PDF bundles rules + forms (52 pages); max_pages=24 excludes prescribed forms

4. LangGraph State Machine

State Contract (agent/state.py)

class CivicSetuState(TypedDict):
    # Input
    query: str
    session_id: Optional[str]
    jurisdiction_filter: Optional[Jurisdiction]
    top_k: int

    # Classification
    query_type: Optional[QueryType]
    rewritten_query: Optional[str]

    # Retrieval β€” Annotated[list, operator.add] enables parallel node merging
    retrieved_chunks: Annotated[list[RetrievedChunk], operator.add]
    reranked_chunks: list[RetrievedChunk]

    # Generation
    raw_response: Optional[str]
    citations: list[Citation]
    confidence_score: float
    conflict_warnings: list[str]
    amendment_notice: Optional[str]

    # Control
    retry_count: int          # max 2 retries
    hallucination_flag: bool
    error: Optional[str]

RetrievedChunk Schema (models/schemas.py)

class RetrievedChunk(BaseModel):
    chunk: LegalChunk
    vector_score: float | None = None
    rerank_score: float | None = None
    retrieval_source: str = "vector"   # "vector" | "graph"
    graph_path: Optional[str] = None   # e.g. "source:18@CENTRAL"
    is_pinned: bool = False            # True = exact source section, bypasses reranker sort

Node Responsibilities

Node Input Keys Output Keys LLM Call
classifier query query_type, rewritten_query Yes
vector_retrieval rewritten_query, top_k retrieved_chunks No
graph_retrieval rewritten_query, top_k retrieved_chunks No
reranker retrieved_chunks, query reranked_chunks No
generator reranked_chunks, query raw_response, citations, confidence_score Yes
validator raw_response, reranked_chunks hallucination_flag, confidence_score Yes
retry retry_count retry_count+1, cleared retrieval fields No

Routing Logic

classifier β†’ route_after_classifier
fact_lookup vector_retrieval (RRF hybrid)
cross_reference graph_retrieval (β†’ RRF fallback)
penalty_lookup graph_retrieval (β†’ RRF fallback)
temporal graph_retrieval (β†’ RRF fallback)
conflict_detection hybrid_retrieval (RRF across jur.)
validator β†’ route_after_validator:
    confidence >= 0.5 AND not hallucinated β†’ END
    (confidence < 0.5 OR hallucinated) AND retry_count < 2 β†’ retry β†’ classifier
    (confidence < 0.5 OR hallucinated) AND retry_count >= 2 β†’ END (low confidence answer)

5. Chunking Strategy

Section Boundary Detection

Six regex patterns across DocType.RULES, tried in order (first match wins per line):

# Pattern Format Jurisdiction
1 \n(?P<id>\d{1,2}[A-Z]?)\.\s*\n(?P<title>...) Newline-dot-newline MahaRERA
2 ^\s*(?P<id>\d{1,2}[A-Z]?)\.\s+(?P<title>...)\.?β€” Same-line em-dash MahaRERA
3 ^Rule\s+(?P<id>\d{1,2}[A-Z]?)\s*[.\-–]\s*(?P<title>...) Explicit Rule prefix Generic
4 ^\s*(?P<id>\d{1,2}[A-Z]?)\.\s+(?P<title>...?)\.– ASCII hyphen .- Karnataka, Tamil Nadu
5 (?P<id>\d{1,2}[A-Z]?)-\(1\)\s*\n(?P<title>...) N-(1)\nTitle UP RERA multi-clause
6 (?P<id>\d{1,2}[A-Z]?)-(?!\()\s*\n(?P<title>...) N-\nTitle UP RERA single-clause

DocType.ACT uses a separate pattern set. Fallback: paragraph split on double newlines. Rule IDs capped at \d{1,2} (max 2 digits) β€” prevents year strings like 2016 matching as rule IDs. Logs no_section_boundaries_found + fallback_paragraph_chunking when falling back.

Chunk Size Limits

MIN_CHARS = 100   β€” discard fragments (headers, page numbers)
MAX_CHARS = 1500  β€” split large sections at subsection markers (1), (2), (a), (b)

Split Priority for Large Sections

1. Subsection markers: \n\s*\((?:\d+|[a-z]{1,3})\)\s+
2. Sentence boundary near MAX_CHARS: rfind('. ')
3. Hard cut at MAX_CHARS (last resort)

parser.py β€” max_pages cap

@staticmethod
def parse(source: str | Path, max_pages: int | None = None) -> ParsedDocument:
    all_pages = list(doc)
    if max_pages is not None:
        all_pages = all_pages[:max_pages]   # slice before fulltext build

6. Embedding Strategy

Model: nomic-embed-text-v1.5 (via sentence-transformers, local β€” no Ollama required) Dimension: 768 Asymmetric prefixes (MTEB/nomic-embed requirement):

Ingestion time:  "search_document: {section_title}\n{text}"  β†’ pipeline.py
Query time:      "search_query: {rewritten_query}"            β†’ retrieval/__init__.py

Section title prepend (Phase 8 change): pipeline.py prepends section_title to the embedded text so sub-chunks (e.g. S.11(2)) retain their section context. Without this, sub-chunks embed without "Obligations of promoter" β€” cosine similarity misses them. The reranker still receives raw chunk.text (no title prefix).

Using wrong prefix at query time causes ~10–15% recall degradation.

Truncation Guard

MAX_EMBED_CHARS = 4000   # ~1000 tokens β€” safe ceiling before prefix added

def embed_document(self, text: str) -> list[float]:
    if len(text) > MAX_EMBED_CHARS:
        log.warning("embedding_truncated", original_len=len(text), truncated_to=MAX_EMBED_CHARS)
        text = text[:MAX_EMBED_CHARS]
    prefixed = f"search_document: {text.strip()}"  # prefix AFTER truncation
    return self.embed_one(prefixed)

Truncation happens before prefix is added β€” prevents Ollama 500 errors on Tamil Nadu and other gazette PDFs where sub-sections exceed 10K chars.


7. Hybrid Retrieval β€” _rrf_retrieve()

All retrieval nodes share a single async helper _rrf_retrieve() in agent/nodes.py.

Reciprocal Rank Fusion (RRF)

RRF_K = 60   # standard constant

rrf_score(chunk) = 1/(K + rank_in_vector) + 1/(K + rank_in_fts)

Fetches top_k Γ— 3 vector results and top_k Γ— 2 FTS results, deduplicates by chunk_id, merges via RRF, returns top top_k Γ— 2.

Full-Text Search

VectorStore.full_text_search() uses websearch_to_tsquery in OR mode:

WHERE to_tsvector('english', text) @@ websearch_to_tsquery('english', :query)
ORDER BY ts_rank(to_tsvector('english', text), websearch_to_tsquery('english', :query)) DESC

Changed from plainto_tsquery (AND-mode) β€” AND required all query words to match, excluding relevant sections that matched most but not all words.

Section Family Expansion

After RRF merge, top-3 results trigger family expansion:

for rc in merged[:3]:
    base_sid = re.sub(r'\([^)]*\)$', '', section_id).strip()  # "5(4)" β†’ "5"
    family = await VectorStore.get_section_family(section_id=base_sid, jurisdiction=jur)
    # returns all chunks where section_id = '5' OR section_id LIKE '5(%'

get_section_family guard: skips if section_id already contains ( (base_sid computation strips this before calling). Hard cap: _MAX_VECTOR_EXPANDED = 40 chunks before reranker.

Why top-3 not top-1: If top-1 RRF result is a sub-section (S.5(4)), its parent family is expanded. But if the truly relevant parent section (S.11) appears at RRF rank 2, only expanding top-1 misses it. Expanding top-3 covers more cases at the cost of a slightly larger pool.


7b. Reranker Detail

reranker_score_threshold = 0.1 β€” minimum cross-encoder score to enter candidate pool. reranker_score_gap = 0.6 β€” gap filter cliff threshold.

Gap filter:

def _apply_score_gap(chunks, gap=0.6):
    for i in range(1, len(chunks)):
        if chunks[i-1].rerank_score - chunks[i].rerank_score >= gap:
            return chunks[:i]
    return chunks

Threshold history: Originally threshold=0.3, gap=0.35. Gap=0.35 was too aggressive β€” cut chunks with 0.36 score drop, leaving only 1 context for generator. Raised to 0.6 (Phase 8).

Final context: pinned_chunks + gap_filtered[:max(0, 5 - len(pinned))] β†’ max 5 chunks.


8. Graph Retriever

graph_retriever.py β€” called on cross_reference, penalty_lookup, temporal query types.

Section ID Extraction

section_pattern = re.compile(r'\b(?:section|sec\.?|s\.)\s*(\d+[A-Z]?)\b', re.IGNORECASE)
rule_pattern    = re.compile(r'\bRule\s+(\d+[A-Z]?)\b', re.IGNORECASE)

Traversal Strategy (per jurisdiction)

For each jurisdiction (CENTRAL, MAHARASHTRA, UTTAR_PRADESH, KARNATAKA, TAMIL_NADU):

1. Source section chunks    β€” exact section_id match β†’ is_pinned=True
2. REFERENCES outgoing      β€” sections source cites (depth=2)
3. REFERENCES incoming      β€” sections that cite source
4. DERIVED_FROM outgoing    β€” Act sections this Rule derives from
5. DERIVED_FROM incoming    β€” Rule sections implementing this Act section

Pinning Rule

Only the exact section_id match gets is_pinned=True. Sub-sections are NOT pinned. Max pinned chunks: 2 (one per jurisdiction). Remaining 3 slots filled by reranker.


9. Response Contract

CivicSetuResponse:
    answer: str                    # plain English, cites section numbers
    citations: list[Citation]      # min_length=1 β€” NEVER empty
    confidence_score: float        # 0.0–1.0
    confidence_level: str          # "high"/"medium"/"low"
    query_type_resolved: QueryType
    conflict_warnings: list[str]   # empty until Phase 7
    amendment_notice: Optional[str]
    disclaimer: str                # always present

Citation:
    section_id: str
    doc_name: str
    jurisdiction: Jurisdiction
    effective_date: Optional[date]
    source_url: str
    chunk_id: UUID

9. Error Handling

Scenario Behaviour
LLM provider rate limited LiteLLM auto-rotates to next provider
All LLM providers fail RuntimeError β†’ FastAPI 500
No chunks retrieved InsufficientInfoResponse returned
Hallucination detected retry (max 2x) β†’ low confidence answer
DB unreachable /health returns degraded, query returns 500
Scanned PDF detected Warning logged, fallback URL used (Karnataka)
Section patterns not matched Fallback paragraph chunking, warning logged
Neo4j event loop mismatch Prevented β€” _get_driver() creates fresh driver per call
Embedding input too long Truncated at 4000 chars before prefix; warning logged
max_pages exceeded Parser silently caps pages; total_pages reflects capped count

10. Neo4j Graph β€” Phase 6 State (Current)

Nodes: 9 Documents, 2090 Sections Edges: 1297 HAS_SECTION, 933 REFERENCES, 91 DERIVED_FROM

Documents in Graph

Document Jurisdiction DocType Chunks Sections DERIVED_FROM edges
RERA Act 2016 CENTRAL ACT ~224 ~224 β€”
MahaRERA Rules 2017 MAHARASHTRA RULES ~214 ~214 17 sec + 1 doc
UP RERA Rules 2016 UTTAR_PRADESH RULES 170 33 11 sec + 1 doc
UP RERA General Regs 2019 UTTAR_PRADESH CIRCULAR 85 53 β€”
Karnataka RERA Rules 2017 KARNATAKA RULES 235 45 15 sec + 1 doc
Tamil Nadu RERA Rules 2017 TAMIL_NADU RULES 157 36 15 sec + 1 doc

Known Open Issues (non-blocking)

Issue Affected Root Cause
Act Β§13 missing from graph UP rule 14, KA rule 11, TN rule 11 RERA Act ingestion β€” Β§13 chunked under different ID
Act Β§66 missing from graph KA rule 19, TN rule 19 RERA Act ingestion β€” Β§66 not ingested

DERIVED_FROM Map Summary

Jurisdiction Mapped pairs Resolved Unresolved
MAHARASHTRA 17 17 0
UTTAR_PRADESH 15 11 4
KARNATAKA 17 15 2
TAMIL_NADU 17 15 2

PDF Source Decisions

Jurisdiction Primary URL Issue Resolution
CENTRAL indiacode.nic.in β€” β€”
MAHARASHTRA naredco.in β€” β€”
UTTAR_PRADESH up-rera.in/pdf/rera.pdf pages 25–52 are forms max_pages=24
KARNATAKA naredco.in (mirror) Official PDF fully scanned (19MB) NAREDCO born-digital
TAMIL_NADU cms.tn.gov.in pages 16–101 are Forms A–O max_pages=15

11. Agent Pipeline β€” Bug Fixes (2026-03-22)

Three production bugs fixed after 12-case E2E suite. All verified: 0 retries, 0 hallucinations, avg latency 7.6s.

Fix 1 β€” vector_store.py::get_section_family β€” Pydantic crash on SELECT *

SELECT * returned embedding as a raw string; Pydantic list[float] validation failed. Fix: explicit column projection, embedding=None on all returned chunks. Matches every other VectorStore method.

Fix 2 β€” nodes.py::vector_retrieval_node β€” Reranker blowup on section expansion

Section family expansion ran on all 5 similarity hits β†’ up to 121 chunks β†’ FlashRank cross-encoder serial scoring β†’ 65s reranker time. Fix (Phase 5): expand top-1 hit only; hard cap at 25 chunks before reranker.

for rc in results[:1]:
    ...family expansion...
expanded = expanded[:25]  # hard safety cap

Phase 8 update: Expanded to top-3 after RAGAS eval revealed that when a sub-section (e.g. S.5(4)) ranks #1, its parent S.5 (with the 30-day rule) was never expanded. Cap raised to 40 to accommodate larger families.

for rc in merged[:3]:    # top-3 RRF results (was: top-1)
    ...family expansion...
expanded = expanded[:40]  # was: 25

Fix 3 β€” nodes.py::validator_node β€” False hallucination flag

Validator built context as raw chunk.text joined string. Generator answer cites "Section 11(1)" but raw text has no section number β†’ validator scores 0.2 β†’ hallucinated=True β†’ spurious retry loops (7 retries across 12 tests).

Fix: mirror generator's numbered context block [i] doc β€” section_id: title\ntext. Validator can now match cited section numbers to source context.

E2E Regression Results (post-fix)

Metric Pre-fix Post-fix
Avg latency 19.6s 7.6s
Max latency 87.1s 13.3s
Avg confidence 0.908 0.958
Total retries 7 0
Slow (>20s) 3 0
Low conf (<0.7) 2 0
Pass rate 12/12 12/12

12. Agent Pipeline β€” RAGAS Eval Fixes (Phase 8, April 2026)

Five changes from RAGAS evaluation revealing retrieval and faithfulness failures.

Fix 4 β€” Reranker thresholds too aggressive (settings.py)

Old score_gap=0.35 cut after any 0.36 point drop β†’ only 1 chunk reached generator. New: score_threshold=0.1, score_gap=0.6. Keeps secondary relevant chunks while still filtering genuine noise (0.98 β†’ 0.20 drop would still cut at 0.78 gap).

Fix 5 β€” Generator analogy instruction caused hallucination (generator.py)

"Use an analogy or real-world example" produced analogies ("Think of it like selling a used car") not present in retrieved context β†’ faithfulness judge scored as hallucination. Fix: removed analogy instruction; replaced with "using only information from the provided context".

Fix 6 β€” Generator weak grounding for sparse contexts (generator.py)

Generator constructed legal conclusions from reasoning even when context lacked evidence. Added explicit rules:

  • For sparse context: say "Based on the available context: [X]" and note missing elements
  • For conflict detection: only assert conflict if BOTH provisions present in context

Fix 7 β€” CONFLICT_DETECTION tone hint implied precedence reasoning (nodes.py)

Tone hint said "state which jurisdiction takes precedence when context supports it" β€” LLM interpreted "when context supports it" loosely and applied legal reasoning. Rewritten to: "Never infer precedence from legal reasoning β€” only state precedence if the context explicitly says so."

Fix 8 β€” Temporal query rewrite too generic (classifier.py)

Query "What is the timeline for project registration?" produced rewrite "registration timeline period" β€” FTS missed Section 5 which uses "within thirty days" and "deemed registered". Added rewriting guidance to expand temporal queries with specific legal time-period keywords.

RAGAS Results (Phase 8 baseline, 5-row smoke, gemma-4-31b-it judge)

Row Faith (before) Faith (after) Prec (before) Prec (after)
CENTRAL-FACT-001 1.00 0.50 0.00 0.00
CENTRAL-FACT-002 0.80 0.62 0.00 0.33
CENTRAL-XREF-001 0.63 0.50 1.00 1.00
CENTRAL-CONF-001 0.00 0.62 0.00 0.00
CENTRAL-TEMP-001 0.67 1.00 1.00 0.00
Overall 0.618 0.650 0.400* 0.267

* Before baseline had inflated precision from duplicate chunks (non-deterministic doc_id). After Phase 8: deterministic UUID5 chunk IDs prevent duplicates on re-ingest.