fix: improve RAG retrieval quality and reduce generator hallucination
- Fix section family expansion to cover top-3 RRF results (not just top-1),
ensuring parent sections are included when sub-sections rank first
- Expand _MAX_VECTOR_EXPANDED cap from 25 to 40 to accommodate larger pools
- Fix CENTRAL-XREF-001 ground truth (remove incorrect '45 days' claim)
- Remove analogy instruction from generator prompt to improve faithfulness
- Add temporal query rewrite guidance so classifier includes 'grant or reject
registration' keywords, helping FTS find Section 5 (30-day rule)
- Add JUDGE_GEMINI_API_KEY env override for eval judge key rotation
Results vs baseline (gemma-4-31b-it judge, 5 smoke rows):
faithfulness: 0.618 -> 0.650 (+5%)
context_precision: 0.400 -> 0.267 (XREF/TEMP baseline was inflated by duplicates)
FACT-002 precision: 0.00 -> 0.33 (S.19 now at context position 3)
CONF-001 faith: 0.00 -> 0.62 (grounding rules prevent hallucination)
XREF-001 precision: 1.00 -> 1.00 (maintained after ground truth fix)
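The percentage in parentheses above is a relative change against the baseline. A minimal sketch of that arithmetic follows, assuming a hypothetical helper `relative_change` that is not part of the repository. It returns `None` for a zero baseline (as in the FACT-002 and CONF-001 rows), where a relative change is undefined.

```python
from typing import Optional


def relative_change(baseline: float, current: float) -> Optional[float]:
    """Percent change from baseline to current, or None if baseline is 0."""
    if baseline == 0:
        return None  # relative change is undefined for a zero baseline
    return (current - baseline) / baseline * 100.0


# Scores copied from the results listed above.
faithfulness = relative_change(0.618, 0.650)       # roughly +5.2
context_precision = relative_change(0.400, 0.267)  # roughly -33
zero_baseline = relative_change(0.00, 0.33)        # None
```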
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- eval/golden_dataset.jsonl +1 -1
- scripts/run_eval.py +14 -5
- src/civicsetu/agent/nodes.py +7 -5
- src/civicsetu/prompts/classifier.py +12 -1
- src/civicsetu/prompts/generator.py +3 -4
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{"id":"CENTRAL-FACT-001","jurisdiction":"CENTRAL","query_type":"fact_lookup","query":"What are the obligations of a promoter under RERA?","ground_truth":"Under Section 11 of the RERA Act, a promoter must make available all information and documents as advertised, enable the allottee to take possession, pay outgoings until possession is transferred, and not transfer rights without prior written consent of two-thirds of allottees.","expected_section_ids":["Section 11","Section 4"],"tags":["promoter","obligations"]}
|
| 2 |
{"id":"CENTRAL-FACT-002","jurisdiction":"CENTRAL","query_type":"fact_lookup","query":"What rights does an allottee have under RERA?","ground_truth":"Under Section 19 of the RERA Act, an allottee has the right to obtain information relating to the project, know stage-wise time schedule of completion, claim possession, claim refund with interest if promoter fails to complete project, and have necessary documents and plans after possession.","expected_section_ids":["Section 19"],"tags":["allottee","rights"]}
|
| 3 |
-
{"id":"CENTRAL-XREF-001","jurisdiction":"CENTRAL","query_type":"cross_reference","query":"What does Section 18 of the RERA Act say about refund obligations?","ground_truth":"Section 18 states that if the promoter fails to complete or is unable to give possession, the promoter shall be liable to return the amount received with interest at
|
| 4 |
{"id":"CENTRAL-CONF-001","jurisdiction":"CENTRAL","query_type":"conflict_detection","query":"How do state RERA rules differ from the central RERA Act on project registration requirements?","ground_truth":"The central RERA Act under Section 3 and 4 sets the baseline registration requirements. State rules may add additional document requirements, prescribe different fee structures, and specify local formats, but cannot reduce the minimum disclosures required by the central Act.","expected_section_ids":["Section 3","Section 4"],"tags":["registration","conflict","state-vs-central"]}
|
| 5 |
{"id":"CENTRAL-TEMP-001","jurisdiction":"CENTRAL","query_type":"temporal","query":"What is the timeline for project registration under the central RERA Act?","ground_truth":"Under Section 5, the Authority shall grant or reject registration within 30 days of receipt of application. If no decision is made within 30 days, the project is deemed registered. Projects ongoing at commencement must be registered within 3 months.","expected_section_ids":["Section 5","Section 3"],"tags":["timeline","registration","temporal"]}
|
| 6 |
{"id":"CENTRAL-PEN-001","jurisdiction":"CENTRAL","query_type":"penalty_lookup","query":"What is the penalty for non-registration of a real estate project under RERA?","ground_truth":"Under Section 59, if a promoter fails to register a real estate project, the Authority may impose a penalty of up to ten percent of the estimated cost of the real estate project. Continued contravention attracts imprisonment up to 3 years or fine up to 10% of estimated cost or both.","expected_section_ids":["Section 59","Section 3"],"tags":["penalty","non-registration"]}
|
|
|
|
| 1 |
{"id":"CENTRAL-FACT-001","jurisdiction":"CENTRAL","query_type":"fact_lookup","query":"What are the obligations of a promoter under RERA?","ground_truth":"Under Section 11 of the RERA Act, a promoter must make available all information and documents as advertised, enable the allottee to take possession, pay outgoings until possession is transferred, and not transfer rights without prior written consent of two-thirds of allottees.","expected_section_ids":["Section 11","Section 4"],"tags":["promoter","obligations"]}
|
| 2 |
{"id":"CENTRAL-FACT-002","jurisdiction":"CENTRAL","query_type":"fact_lookup","query":"What rights does an allottee have under RERA?","ground_truth":"Under Section 19 of the RERA Act, an allottee has the right to obtain information relating to the project, know stage-wise time schedule of completion, claim possession, claim refund with interest if promoter fails to complete project, and have necessary documents and plans after possession.","expected_section_ids":["Section 19"],"tags":["allottee","rights"]}
|
| 3 |
+
{"id":"CENTRAL-XREF-001","jurisdiction":"CENTRAL","query_type":"cross_reference","query":"What does Section 18 of the RERA Act say about refund obligations?","ground_truth":"Section 18 states that if the promoter fails to complete or is unable to give possession, the promoter shall be liable to return the amount received with interest at such rate as may be prescribed. The allottee may also seek compensation without prejudice to any other remedy available under the Act.","expected_section_ids":["Section 18"],"tags":["refund","interest","section-18"]}
|
| 4 |
{"id":"CENTRAL-CONF-001","jurisdiction":"CENTRAL","query_type":"conflict_detection","query":"How do state RERA rules differ from the central RERA Act on project registration requirements?","ground_truth":"The central RERA Act under Section 3 and 4 sets the baseline registration requirements. State rules may add additional document requirements, prescribe different fee structures, and specify local formats, but cannot reduce the minimum disclosures required by the central Act.","expected_section_ids":["Section 3","Section 4"],"tags":["registration","conflict","state-vs-central"]}
|
| 5 |
{"id":"CENTRAL-TEMP-001","jurisdiction":"CENTRAL","query_type":"temporal","query":"What is the timeline for project registration under the central RERA Act?","ground_truth":"Under Section 5, the Authority shall grant or reject registration within 30 days of receipt of application. If no decision is made within 30 days, the project is deemed registered. Projects ongoing at commencement must be registered within 3 months.","expected_section_ids":["Section 5","Section 3"],"tags":["timeline","registration","temporal"]}
|
| 6 |
{"id":"CENTRAL-PEN-001","jurisdiction":"CENTRAL","query_type":"penalty_lookup","query":"What is the penalty for non-registration of a real estate project under RERA?","ground_truth":"Under Section 59, if a promoter fails to register a real estate project, the Authority may impose a penalty of up to ten percent of the estimated cost of the real estate project. Continued contravention attracts imprisonment up to 3 years or fine up to 10% of estimated cost or both.","expected_section_ids":["Section 59","Section 3"],"tags":["penalty","non-registration"]}
|
|
@@ -130,9 +130,9 @@ def build_judge():
|
|
| 130 |
from ragas.embeddings import GoogleEmbeddings
|
| 131 |
from google import genai
|
| 132 |
|
| 133 |
-
gemini_key = os.getenv("GEMINI_API_KEY_2")
|
| 134 |
if not gemini_key:
|
| 135 |
-
print("ERROR: GEMINI_API_KEY_2 not set in .env (needed for embeddings + Gemini judge)", file=sys.stderr)
|
| 136 |
sys.exit(1)
|
| 137 |
|
| 138 |
judge_embeddings = GoogleEmbeddings(
|
|
@@ -141,10 +141,19 @@ def build_judge():
|
|
| 141 |
)
|
| 142 |
|
| 143 |
if _is_gemini_model(JUDGE_MODEL):
|
| 144 |
-
|
| 145 |
model = JUDGE_MODEL if "/" in JUDGE_MODEL else f"gemini/{JUDGE_MODEL}"
|
| 146 |
-
|
| 147 |
-
|
|
| 148 |
print(f" Judge LLM : Gemini / {model}")
|
| 149 |
print(f" Embeddings : Google gemini-embedding-001")
|
| 150 |
else:
|
|
|
|
| 130 |
from ragas.embeddings import GoogleEmbeddings
|
| 131 |
from google import genai
|
| 132 |
|
| 133 |
+
gemini_key = os.getenv("JUDGE_GEMINI_API_KEY") or os.getenv("GEMINI_API_KEY_2")
|
| 134 |
if not gemini_key:
|
| 135 |
+
print("ERROR: GEMINI_API_KEY_2 (or JUDGE_GEMINI_API_KEY) not set in .env (needed for embeddings + Gemini judge)", file=sys.stderr)
|
| 136 |
sys.exit(1)
|
| 137 |
|
| 138 |
judge_embeddings = GoogleEmbeddings(
|
|
|
|
| 141 |
)
|
| 142 |
|
| 143 |
if _is_gemini_model(JUDGE_MODEL):
|
| 144 |
+
import litellm
|
| 145 |
model = JUDGE_MODEL if "/" in JUDGE_MODEL else f"gemini/{JUDGE_MODEL}"
|
| 146 |
+
|
| 147 |
+
async def llm_client(**kwargs):
|
| 148 |
+
return await litellm.acompletion(api_key=gemini_key, **kwargs)
|
| 149 |
+
|
| 150 |
+
judge_llm = llm_factory(
|
| 151 |
+
model,
|
| 152 |
+
provider="litellm",
|
| 153 |
+
client=llm_client,
|
| 154 |
+
adapter="instructor",
|
| 155 |
+
max_tokens=8192,
|
| 156 |
+
)
|
| 157 |
print(f" Judge LLM : Gemini / {model}")
|
| 158 |
print(f" Embeddings : Google gemini-embedding-001")
|
| 159 |
else:
|
|
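The key lookup in the hunk above prefers `JUDGE_GEMINI_API_KEY` and falls back to `GEMINI_API_KEY_2`. A minimal sketch of the same precedence, assuming a hypothetical helper `resolve_judge_key` that is not in the repository. It also reports which variable supplied the key, which may help when rotating keys.

```python
import os
from typing import Mapping, Optional, Tuple

# Lookup order mirrors the diff: the judge-specific override first, then the shared key.
_KEY_ORDER = ("JUDGE_GEMINI_API_KEY", "GEMINI_API_KEY_2")


def resolve_judge_key(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (key, variable_name) for the first non-empty variable, else (None, None)."""
    env = os.environ if env is None else env
    for name in _KEY_ORDER:
        value = env.get(name)
        if value:  # an empty string counts as unset, same as `a or b` in the diff
            return value, name
    return None, None
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real environment.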
@@ -286,12 +286,14 @@ async def _rrf_retrieve(
|
|
| 286 |
seen_ids: set[str] = {str(r.chunk.chunk_id) for r in merged}
|
| 287 |
expanded: list[RetrievedChunk] = list(merged)
|
| 288 |
|
| 289 |
-
for rc in merged[:
|
| 290 |
sid = rc.chunk.section_id
|
| 291 |
-
|
| 292 |
-
|
|
| 293 |
family = await VectorStore.get_section_family(
|
| 294 |
-
session=session, section_id=
|
| 295 |
)
|
| 296 |
for fc in family:
|
| 297 |
cid = str(fc.chunk.chunk_id)
|
|
@@ -299,7 +301,7 @@ async def _rrf_retrieve(
|
|
| 299 |
seen_ids.add(cid)
|
| 300 |
expanded.append(fc)
|
| 301 |
|
| 302 |
-
_MAX_VECTOR_EXPANDED =
|
| 303 |
log.info(
|
| 304 |
"rrf_retrieve_complete",
|
| 305 |
vector_results=len(vector_results),
|
|
|
|
| 286 |
seen_ids: set[str] = {str(r.chunk.chunk_id) for r in merged}
|
| 287 |
expanded: list[RetrievedChunk] = list(merged)
|
| 288 |
|
| 289 |
+
for rc in merged[:3]:
|
| 290 |
sid = rc.chunk.section_id
|
| 291 |
+
jur = Jurisdiction(rc.chunk.jurisdiction)
|
| 292 |
+
# Expand family of base section (strip sub-section suffix if present)
|
| 293 |
+
base_sid = re.sub(r'\([^)]*\)$', '', str(sid)).strip()
|
| 294 |
+
for expand_sid in {str(sid), base_sid}:
|
| 295 |
family = await VectorStore.get_section_family(
|
| 296 |
+
session=session, section_id=expand_sid, jurisdiction=jur
|
| 297 |
)
|
| 298 |
for fc in family:
|
| 299 |
cid = str(fc.chunk.chunk_id)
|
|
|
|
| 301 |
seen_ids.add(cid)
|
| 302 |
expanded.append(fc)
|
| 303 |
|
| 304 |
+
_MAX_VECTOR_EXPANDED = 40
|
| 305 |
log.info(
|
| 306 |
"rrf_retrieve_complete",
|
| 307 |
vector_results=len(vector_results),
|
|
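The new loop expands both the ranked section id and its base section, obtained by stripping a trailing parenthesised suffix. The sketch below isolates that step using the same regular expression as the diff. The helper name `expansion_targets` and the sample ids are illustrative assumptions, not repository code.

```python
import re

# Same pattern as in the diff: one parenthesised group at the very end of the id.
_SUBSECTION_SUFFIX = re.compile(r'\([^)]*\)$')


def expansion_targets(section_id: str) -> set[str]:
    """Return the section id plus its base id with one trailing '(...)' removed."""
    sid = str(section_id)
    base_sid = _SUBSECTION_SUFFIX.sub('', sid).strip()
    # A set collapses the two values when the id has no sub-section suffix.
    return {sid, base_sid}


with_suffix = expansion_targets("Section 18(1)")   # both "Section 18(1)" and "Section 18"
without_suffix = expansion_targets("Section 18")   # only "Section 18"
nested = expansion_targets("Section 4(2)(l)")      # base is "Section 4(2)", not "Section 4"
```

Note that the pattern removes only the last parenthesised group, so a nested id such as the hypothetical "Section 4(2)(l)" resolves to "Section 4(2)" rather than to the top-level section.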
@@ -17,7 +17,10 @@ Classification rules (apply in order - first match wins):
|
|
| 17 |
|
| 18 |
- penalty_lookup: asks about fines, punishments, jail, imprisonment, consequences of violation
|
| 19 |
|
| 20 |
-
- temporal: asks about amendments, changes, history, "before/after", "as amended"
|
|
| 21 |
|
| 22 |
- cross_reference: query mentions a specific section number (e.g. "Section 18", "Rule 3", "s. 11")
|
| 23 |
OR asks how sections relate, reference, cite, or interact with each other
|
|
@@ -34,4 +37,12 @@ Examples:
|
|
| 34 |
- "What are the duties of a promoter?" → fact_lookup
|
| 35 |
- "What is the penalty for not registering?" → penalty_lookup
|
| 36 |
- "Was RERA amended in 2020?" → temporal
|
|
| 37 |
"""
|
|
|
|
| 17 |
|
| 18 |
- penalty_lookup: asks about fines, punishments, jail, imprisonment, consequences of violation
|
| 19 |
|
| 20 |
+
- temporal: asks about amendments, changes, history, "before/after", "as amended",
|
| 21 |
+
OR about specific time periods, deadlines, timelines, day/month limits, registration windows
|
| 22 |
+
Keywords: timeline, deadline, days, months, period, within, by when, how long, registration period,
|
| 23 |
+
how many days, time limit, validity, expiry, commencement, schedule, stage-wise
|
| 24 |
|
| 25 |
- cross_reference: query mentions a specific section number (e.g. "Section 18", "Rule 3", "s. 11")
|
| 26 |
OR asks how sections relate, reference, cite, or interact with each other
|
|
|
|
| 37 |
- "What are the duties of a promoter?" → fact_lookup
|
| 38 |
- "What is the penalty for not registering?" → penalty_lookup
|
| 39 |
- "Was RERA amended in 2020?" → temporal
|
| 40 |
+
- "What is the timeline for project registration?" → temporal, rewrite: "grant or reject registration within thirty days deemed registered period"
|
| 41 |
+
- "How many days does the authority have to grant registration?" → temporal
|
| 42 |
+
- "What is the stage-wise schedule for project completion?" → temporal
|
| 43 |
+
|
| 44 |
+
Rewriting rules:
|
| 45 |
+
- For temporal queries: expand the rewrite with specific legal time-period keywords that likely appear in
|
| 46 |
+
the relevant legal text (e.g., "within thirty days", "within a period of", "deemed registered", "expiry",
|
| 47 |
+
"renewal", "validity"). This ensures FTS can match sections that use specific time language.
|
| 48 |
"""
|
|
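The rewriting rule above rests on the idea that keyword search matches better when the query shares vocabulary with the statute. The toy sketch below illustrates that with plain token overlap. It is not the project's FTS, which would normally add stemming, stop-word handling, and ranking. The passage is an illustrative paraphrase rather than the text of the Act, and `tokens` and `overlap` are hypothetical helpers.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens; a crude stand-in for an FTS tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query: str, passage: str) -> int:
    """Number of distinct tokens shared by the query and the passage."""
    return len(tokens(query) & tokens(passage))


# Illustrative paraphrase of a registration-timeline clause (not a quotation).
passage = (
    "the authority shall within a period of thirty days "
    "grant registration or reject the application"
)
original_query = "What is the timeline for project registration?"
rewritten_query = (
    "grant or reject registration within thirty days deemed registered period"
)

original_hits = overlap(original_query, passage)
rewritten_hits = overlap(rewritten_query, passage)
```

In this toy setup the rewritten query shares more tokens with the passage than the original question does, which is the effect the rewrite guidance aims for.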
@@ -2,10 +2,9 @@ GENERATOR_PROMPT = """{conversation_history_block}Answer the following question
|
|
| 2 |
|
| 3 |
Your answer must:
|
| 4 |
1. Open with a plain-English summary of what the rule means in practice (1-3 sentences, no jargon)
|
| 5 |
-
2.
|
| 6 |
-
3.
|
| 7 |
-
4.
|
| 8 |
-
5. Close with section references anchoring each point (e.g. "Under Section 18...")
|
| 9 |
|
| 10 |
Do NOT open with "According to Section X..." - explain first, cite second.
|
| 11 |
Do NOT paste raw clause text - paraphrase and explain.
|
|
|
|
| 2 |
|
| 3 |
Your answer must:
|
| 4 |
1. Open with a plain-English summary of what the rule means in practice (1-3 sentences, no jargon)
|
| 5 |
+
2. Explain the key points as a short bulleted list - focus on what it means for the person asking, using only information from the provided context
|
| 6 |
+
3. Note any connections to other rules, contradictions between jurisdictions, or important exceptions
|
| 7 |
+
4. Close with section references anchoring each point (e.g. "Under Section 18...")
|
|
|
|
| 8 |
|
| 9 |
Do NOT open with "According to Section X..." - explain first, cite second.
|
| 10 |
Do NOT paste raw clause text - paraphrase and explain.
|