Expand eval set to 89 questions; add semantic-fusion weight knob
Overnight evaluation and precision work (retrieval behaviour unchanged):
- data/eval/questions.json: 47 -> 89 gold questions, adding case-law,
D-memorandum, collective-agreement and NJC-directive coverage so
retrieval quality is measured across every source type.
- canlex/index.py: add the W_SEM fusion-weight constant (default 1.0 =
equal weight = unchanged behaviour). Diagnosis: for several eval
misses the semantic retriever ranks the gold #1-3 but BM25 ranks it
51-58, and equal-weight RRF drags it down. precision-findings.md
has the measured sweep -- W_SEM=2.0 lifts the eval Hit@1 0.57->0.65,
Hit@5 0.88->0.90, MRR 0.70->0.75 with no regression.
- pending-cases.md: 18 leading SCC/FCA/FC cases curated for ingestion
once the Lexum 403 block clears (Phase 4 FPSLREB/CIRB is blocked by
the same intermittent block -- two ingest attempts both failed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- canlex/index.py +2 -1
- data/eval/questions.json +43 -1
- pending-cases.md +59 -0
- precision-findings.md +79 -0

canlex/index.py

    @@ -13,6 +13,7 @@ from .synonyms import expand_query
     K1 = 1.5
     B = 0.75
     RRF_K = 60 # reciprocal-rank-fusion damping constant
    +W_SEM = 1.0 # weight on the semantic retriever in the fusion (1.0 = equal)
     CANDIDATES = 80 # hits each retriever contributes to the fusion
     RERANK_POOL = 50 # top fused candidates the cross-encoder rescores
     SOURCE_CAP = 2 # max chunks one case/memo/agreement/directive may contribute
    @@ -289,7 +290,7 @@ class LegislationIndex:
             if self.semantic:
                 sem_order, confidence = self._semantic_ranking(expanded)
                 for rank, idx in enumerate(sem_order):
    -                fused[idx] +=
    +                fused[idx] += W_SEM / (RRF_K + rank)
     
             # Ensure explicitly-referenced sections are retrieved even if recall missed them.
             refs = _section_refs(query)

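
To show what the new constant does, here is a minimal standalone sketch of weighted reciprocal-rank fusion modelled on the loop above. It assumes, hypothetically, that each retriever yields document indices best-first. It also assumes the BM25 loop (not shown in the diff) adds the same `1 / (RRF_K + rank)` term with weight 1.0. `weighted_rrf` and its arguments are illustrative names, not CanLex APIs.

```python
from collections import defaultdict

RRF_K = 60  # reciprocal-rank-fusion damping constant, as in canlex/index.py


def weighted_rrf(rankings: dict[str, list[int]], weights: dict[str, float]) -> list[int]:
    """Fuse best-first rankings of document indices with weighted RRF.

    Each retriever adds weight / (RRF_K + rank) to a document's score, with a
    0-based rank as produced by enumerate(). Retrievers absent from `weights`
    get weight 1.0, so an empty `weights` dict is plain, unweighted RRF.
    """
    fused: dict[int, float] = defaultdict(float)
    for name, order in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, idx in enumerate(order):
            fused[idx] += weight / (RRF_K + rank)
    # Highest fused score first; ties broken by document index so output is deterministic.
    return sorted(fused, key=lambda idx: (-fused[idx], idx))


# Hypothetical rankings over document indices 1, 3, 7 and 9.
rankings = {
    "bm25": [7, 3, 9, 1],
    "semantic": [1, 9, 3, 7],
}
print(weighted_rrf(rankings, {"semantic": 1.0}))  # [1, 7, 3, 9]
print(weighted_rrf(rankings, {"semantic": 2.0}))  # [1, 9, 3, 7]
```

The two toy lists are mirror images, so at equal weight the documents tie in pairs and the index tie-break decides the order. At 2.0 the fused order follows the semantic list exactly. This is the lever `W_SEM` exposes.
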

data/eval/questions.json

    @@ -45,5 +45,47 @@
       {"query": "What are the standard hours of work for an employee?", "answers": [["Canada Labour Code", "169"]]},
       {"query": "What is the standard of review of an administrative decision on judicial review?", "answers": [["Vavilov", ""]]},
       {"query": "How does the Refugee Appeal Division review a decision of the Refugee Protection Division?", "answers": [["Huruglica", ""]]},
    -  {"query": "To get back currency seized at the border, what must the claimant show about the money?", "answers": [["Sellathurai", ""]]}
    +  {"query": "To get back currency seized at the border, what must the claimant show about the money?", "answers": [["Sellathurai", ""]]},
    +  {"query": "What factors shape the content of the duty of procedural fairness in an administrative decision?", "answers": [["Baker", ""]]},
    +  {"query": "Can Canada remove a person to a country where they face a substantial risk of torture?", "answers": [["Suresh", ""]]},
    +  {"query": "Is the security certificate regime for detaining and removing non-citizens consistent with the Charter?", "answers": [["Charkaoui", ""]]},
    +  {"query": "Do refugee claimants have a right to an oral hearing under the Charter?", "answers": [["Singh", ""]]},
    +  {"query": "What is a particular social group in the definition of a Convention refugee?", "answers": [["Ward", ""]]},
    +  {"query": "When is a person excluded from refugee protection for acts contrary to the purposes of the United Nations?", "answers": [["Pushpanathan", ""]]},
    +  {"query": "What degree of involvement makes a person complicit in international crimes and excluded from refugee protection?", "answers": [["Ezokola", ""]]},
    +  {"query": "Can a person be denied refugee protection for a serious crime committed abroad before claiming asylum?", "answers": [["Febles", ""]]},
    +  {"query": "How must a decision-maker weigh the best interests of a child in a humanitarian and compassionate application?", "answers": [["Kanthasamy", ""]]},
    +  {"query": "What does the national interest mean when the Minister grants relief from security inadmissibility?", "answers": [["Agraira", ""]]},
    +  {"query": "Does helping fellow asylum seekers enter a country illegally amount to people smuggling for inadmissibility?", "answers": [["B010", ""]]},
    +  {"query": "Is the offence of human smuggling unconstitutionally overbroad if it captures humanitarian aid?", "answers": [["Appulonappa", ""]]},
    +  {"query": "Does a conditional sentence count as a term of imprisonment for serious criminality?", "answers": [["Tran", ""]]},
    +  {"query": "Can an immigration detainee challenge their detention through habeas corpus?", "answers": [["Chhina", ""]]},
    +  {"query": "What kinds of border search engage the Charter protection against unreasonable search and seizure?", "answers": [["Simmons", ""]]},
    +  {"query": "Does inadmissibility for membership in a terrorist organization require a complicity analysis?", "answers": [["Kanagendren", ""]]},
    +  {"query": "How broadly is a criminal organization interpreted for organized criminality inadmissibility?", "answers": [["Sittampalam", ""]]},
    +  {"query": "What principles govern a finding of inadmissibility for misrepresentation?", "answers": [["Goburdhun", ""]]},
    +  {"query": "At an immigration detention review, who bears the onus and how are earlier detention rulings treated?", "answers": [["Thanabalasingham", ""]]},
    +  {"query": "What is the test for admitting new evidence in a pre-removal risk assessment?", "answers": [["Raza", ""]]},
    +  {"query": "Are gold coins currency or monetary instruments that must be reported when imported?", "answers": [["Hociung", ""]]},
    +  {"query": "How do the courts review a customs tariff classification decision?", "answers": [["Best Buy", ""]]},
    +  {"query": "What does CBSA policy say about how the value for duty of imported goods is established?", "answers": [["D-Memo", "D13-1-1"]]},
    +  {"query": "What are CBSA's requirements for marking imported goods with their country of origin?", "answers": [["D-Memo", "D11-3-1"]]},
    +  {"query": "What is CBSA's guidance on importing or exporting cannabis and controlled substances?", "answers": [["D-Memo", "D19-9-2"]]},
    +  {"query": "What personal exemptions can a resident claim when returning to Canada, per CBSA guidance?", "answers": [["D-Memo", "D2-3-1"]]},
    +  {"query": "What is CBSA's guidance on cross-border currency and monetary instruments reporting?", "answers": [["D-Memo", "D19-14-1"]]},
    +  {"query": "How does CBSA decide whether imported material is obscene?", "answers": [["D-Memo", "D9-1-1"]]},
    +  {"query": "What proof of origin does CBSA require for imported goods?", "answers": [["D-Memo", "D11-4-2"]]},
    +  {"query": "What is the Canadian Goods Abroad Program for goods sent outside Canada for repair?", "answers": [["D-Memo", "D8-2-1"]]},
    +  {"query": "How does the FB Border Services collective agreement deal with discipline of an employee?", "answers": [["FB Agreement", "17"]]},
    +  {"query": "What is the grievance procedure under the FB collective agreement?", "answers": [["FB Agreement", "18"]]},
    +  {"query": "What are the hours of work under the FB Border Services collective agreement?", "answers": [["FB Agreement", "25"]]},
    +  {"query": "How is overtime compensated under the FB collective agreement?", "answers": [["FB Agreement", "28"]]},
    +  {"query": "How much vacation leave with pay do FB-group employees earn?", "answers": [["FB Agreement", "34"]]},
    +  {"query": "What is the bilingualism bonus and who is eligible to receive it?", "answers": [["Bilingualism Bonus Directive", ""]]},
    +  {"query": "What assistance is available to a federal employee who faces unusual daily commuting costs?", "answers": [["Commuting Assistance Directive", ""]]},
    +  {"query": "When may the Immigration Division order the release of a detained person?", "answers": [["IRPA", "58"]]},
    +  {"query": "When is there no right to appeal a removal order to the Immigration Appeal Division?", "answers": [["IRPA", "64"]]},
    +  {"query": "Is distributing cannabis an offence?", "answers": [["Cannabis Act", "9"]]},
    +  {"query": "What is the offence of smuggling goods into Canada under the Customs Act?", "answers": [["Customs Act", "159"]]},
    +  {"query": "Is an employee entitled to medical leave under the Canada Labour Code?", "answers": [["Canada Labour Code", "239"]]}
     ]

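
Each entry pairs a query with one or more `[source, section]` gold answers. The case-law and directive entries leave the section empty. The sketch below assumes, as a hypothetical reading of that format, that an empty section means any chunk from the named source counts as a hit. `is_gold_hit`, `first_gold_rank` and the `(source, section)` hit tuple are illustrative, not the actual `canlex.eval` logic.

```python
import json
from typing import Optional

# Two entries in the same shape as data/eval/questions.json.
SAMPLE = """[
  {"query": "What is the standard of review of an administrative decision on judicial review?", "answers": [["Vavilov", ""]]},
  {"query": "Is distributing cannabis an offence?", "answers": [["Cannabis Act", "9"]]}
]"""


def is_gold_hit(hit: tuple[str, str], answers: list[list[str]]) -> bool:
    """True if a retrieved (source, section) pair matches any gold answer.

    Assumption: an empty gold section (as in the case-law entries) matches
    any section of the named source.
    """
    source, section = hit
    return any(
        source == gold_source and (gold_section == "" or section == gold_section)
        for gold_source, gold_section in answers
    )


def first_gold_rank(hits: list[tuple[str, str]], answers: list[list[str]]) -> Optional[int]:
    """1-based rank of the first gold hit in a ranked list, or None on a miss."""
    for rank, hit in enumerate(hits, start=1):
        if is_gold_hit(hit, answers):
            return rank
    return None


questions = json.loads(SAMPLE)
cannabis_gold = questions[1]["answers"]
hits = [("Cannabis Act", "10"), ("Cannabis Act", "9")]  # hypothetical ranked hits
print(first_gold_rank(hits, cannabis_gold))  # 2
```
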

pending-cases.md (new file)

# Leading cases to add when the Lexum block clears

**Status (2026-05-21):** Phase 4 (FPSLREB/CIRB) and this case-law expansion are
both blocked. `canlex.caselaw` fetches return **HTTP 403** from the Lexum
decision hosts: in two full ingest attempts, every FPSLREB/CIRB decision failed.
A single-shot probe of the same URLs returns 200, so the block is either
intermittent or specific to the fetch pattern (likely the `?iframe=true`
decision endpoint).

**To ingest when the block clears:** for each case below, open the decision on
its court's `decisions.*.gc.ca` site, take the numeric **item id** from the
`.../item/{id}/index.do` URL, and add an entry to `CASES` in
`canlex/caselaw.py` (`{"court": ..., "id": ..., "short": ..., "topic": ...}`).
Then re-run `py -m canlex.caselaw` -> `py -m canlex.embed` -> redeploy. These
citations were compiled from memory, so verify each one against the decision
page. Cases already in the corpus are excluded.
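
Extracting the item id and building the entry can be scripted. The sketch below rests on two assumptions. First, the id is the digit run in the `.../item/{id}/index.do` path, possibly followed by a query string such as `?iframe=true`. Second, and hypothetically, `CASES` stores the id as an int. `case_entry` is an illustrative helper, and the URL and id in the example are made up.

```python
import re

# Numeric item id in a decision URL of the form ".../item/{id}/index.do",
# optionally followed by a query string such as "?iframe=true".
ITEM_ID = re.compile(r"/item/(\d+)/index\.do")


def case_entry(url: str, court: str, short: str, topic: str) -> dict:
    """Build a CASES-style entry from a decision URL.

    Assumption: CASES stores the id as an int; adjust if canlex/caselaw.py uses str.
    """
    match = ITEM_ID.search(url)
    if match is None:
        raise ValueError(f"no item id found in {url!r}")
    return {"court": court, "id": int(match.group(1)), "short": short, "topic": topic}


# Hypothetical URL and id, for illustration only; take the real ones from the decision page.
entry = case_entry(
    "https://decisions.example.gc.ca/fca-caf/decisions/en/item/123456/index.do",
    court="fca",
    short="Poshteh",
    topic="membership in a terrorist organization; relevance of age",
)
print(entry)
```
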

## Supreme Court of Canada (court: "scc")

- **Chiarelli**, Canada (MEI) v Chiarelli, [1992] 1 SCR 711: s. 7 fundamental
  justice and the deportation of permanent residents; the constitutional
  baseline for non-citizens.
- **Chieu**, Chieu v Canada (MCI), 2002 SCC 3: Immigration Appeal Division
  removal-order appeals; foreign hardship is a relevant consideration.
- **Medovarski**, Medovarski v Canada (MCI), 2005 SCC 51: IRPA's objectives;
  no Charter s. 7 right for a non-citizen to remain in Canada.
- **Mugesera**, Mugesera v Canada (MCI), 2005 SCC 40: inadmissibility for
  crimes against humanity; incitement to genocide.
- **Harkat**, Canada (Citizenship and Immigration) v Harkat, 2014 SCC 37:
  constitutionality of the security-certificate regime and special advocates.
- **Németh**, Németh v Canada (Justice), 2010 SCC 56: extradition and the
  refugee principle of non-refoulement.
- **Mavi**, Canada (AG) v Mavi, 2011 SCC 30: sponsorship-undertaking debt and
  the duty of procedural fairness in enforcing it.
- **Pham**, R v Pham, 2013 SCC 15: collateral immigration consequences as a
  factor in criminal sentencing.
- **Chan**, Chan v Canada (MEI), [1995] 3 SCR 593: refugee protection; a
  particular social group and well-founded fear (verify the citation).
- **Jacques**, R v Jacques, [1996] 3 SCR 312: border searches; a vehicle stop
  near the border and customs officers' powers.
- **Martineau**, Martineau v MNR, 2004 SCC 81: whether an ascertained-forfeiture
  notice under the Customs Act is a penal proceeding.

## Federal Court of Appeal (court: "fca")

- **Poshteh**, Poshteh v Canada (MCI), 2005 FCA 85: membership in a terrorist
  organization for inadmissibility; relevance of the claimant's age.
- **Hinzman**, Hinzman v Canada (MCI), 2007 FCA 171: refugee and H&C claims by
  US military deserters.
- **Thamotharem**, Canada (MCI) v Thamotharem, 2007 FCA 198: Refugee
  Protection Division hearing procedure; order of questioning; fettering by
  guidelines.
- **Toussaint**, Toussaint v Canada (AG), 2011 FCA 213: interim federal health
  coverage and access for indigent applicants.
- **Rahaman**, Rahaman v Canada (MCI), 2002 FCA 89: refugee claims and the "no
  credible basis" finding.

## Federal Court (court: "fc")

- **Almrei (Re)**, 2009 FC 1263: reasonableness of a security certificate
  after Charkaoui.
- **Sahin**, Sahin v Canada (MCI), [1995] 1 FC 214: the foundational factors
  for the length of immigration detention.


precision-findings.md (new file)

# CanLex retrieval: precision investigation (2026-05-21)

Investigation of the persistent eval misses, with a tested, recommended fix.
**No retrieval-algorithm change has been deployed**; this is for review.

## The question

The eval had a handful of persistent misses where the correct provision ranked
outside the top 5. Why, and what fixes it?

## Diagnosis

Stage-by-stage trace of each miss, showing the gold provision's rank in each
retriever and after fusion:

| Query | Gold | BM25 rank | Semantic rank | Fused rank |
|---|---|---|---|---|
| pre-removal risk assessment | IRPA s.112 | 45 | 35 | 35 |
| report to a customs officer on arrival | Customs Act s.11 | 51 | **1** | 6 |
| duty to report imported goods | Customs Act s.12 | 58 | **1** | 6 |
| report large amounts of currency | PCMLTFA s.12 | 82 | 32 | 63 |
| seize unreported currency | PCMLTFA s.18 | 51 | **3** | 14 |

Two distinct causes:

**1. BM25 dilutes strong semantic hits.** For Customs Act s.11 and s.12 and
PCMLTFA s.18 the *semantic* retriever ranks the gold #1, #1 and #3, which is
essentially perfect. But BM25 ranks the same provisions #51, #58 and #51, because
the query keywords ("report", "currency", "arriving") are common words with no
distinctive term to latch onto. Reciprocal-rank fusion with equal weights adds
the two retrievers' reciprocal-rank contributions, so a #1 semantic hit fused
with a #51 BM25 hit lands only around #6. The strong signal is diluted by the weak one.
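
The dilution can be checked with a few lines of arithmetic. This sketch makes three assumptions, all labelled. First, both retrievers add `weight / (RRF_K + rank)` with a 0-based rank and `RRF_K = 60`; the `index.py` diff shows this form only for the semantic side. Second, BM25 keeps weight 1.0. Third, the "competitor" below is hypothetical: a keyword-heavy chunk ranked BM25 #1 and semantic #40. The gold follows the Customs Act s.11 pattern from the table, semantic #1 and BM25 #51.

```python
RRF_K = 60  # reciprocal-rank-fusion damping constant, as in canlex/index.py


def rrf_score(sem_pos: int, bm25_pos: int, w_sem: float = 1.0) -> float:
    """Fused RRF score of one document from its 1-based position in each retriever.

    Assumption: semantic adds w_sem / (RRF_K + rank) and BM25 adds 1 / (RRF_K + rank),
    where rank = position - 1 (the 0-based value enumerate() yields).
    """
    return w_sem / (RRF_K + sem_pos - 1) + 1.0 / (RRF_K + bm25_pos - 1)


for w_sem in (1.0, 2.0):
    gold = rrf_score(sem_pos=1, bm25_pos=51, w_sem=w_sem)        # semantic #1, BM25 #51
    competitor = rrf_score(sem_pos=40, bm25_pos=1, w_sem=w_sem)  # semantic #40, BM25 #1
    print(f"W_SEM={w_sem}: gold={gold:.4f} competitor={competitor:.4f}")
```

At equal weight the competitor outscores the gold (about 0.0268 vs 0.0258). At W_SEM=2.0 the order flips (about 0.0424 vs 0.0369).
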

**2. The enacting statute is out-competed by elaborating material.** IRPA s.112
(PRRA) ranks only middling in *both* retrievers (BM25 #45, semantic #35). The
IRPR regulations (s.160 "Application for protection", s.161, s.165, s.232)
elaborate the PRRA process across many focused sections, and the
currency-forfeiture case law (Dokaj, Williams, Hociung) crowds out PCMLTFA s.12.
One enacting section cannot out-rank a dozen elaborating chunks on a topical query.
The `_ensure_legislation` guarantee added in this batch mitigates this at the
production default `top_k=6` (PCMLTFA s.18 reaches #2 there, vs #11 at the
eval's `top_k=20`), but does not fully fix cause #2.

## Tested fix: up-weight the semantic retriever

`canlex/index.py` now has a `W_SEM` constant: the weight on the semantic
retriever's contribution to the RRF fusion (default **1.0** = equal weight =
current, unchanged behaviour). Sweep on the 89-question eval set:

| W_SEM | Hit@1 | Hit@3 | Hit@5 | Hit@10 | MRR |
|---|---|---|---|---|---|
| 1.0 (current) | 0.573 | 0.787 | 0.876 | 0.921 | 0.701 |
| 1.5 | 0.629 | 0.798 | 0.888 | 0.933 | 0.737 |
| 2.0 | 0.652 | 0.809 | 0.899 | 0.933 | 0.752 |
| 3.0 | 0.652 | 0.820 | 0.910 | 0.933 | 0.754 |
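
This sketch assumes the sweep metrics follow the usual definitions; `canlex.eval` may differ in details such as its `top_k` cutoff:

- Hit@k is the fraction of questions whose first gold hit ranks within the top k.
- MRR averages 1/rank of that first hit, with a miss counting as 0.

With 89 questions, each Hit@k value moves in steps of 1/89 (about 0.011). The Hit@1 rise from 0.573 to 0.652 is therefore 51/89 to 58/89: seven more questions answered at rank 1. The ranks in the example are hypothetical.

```python
from typing import Optional, Sequence


def hit_at_k(first_ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of questions whose first gold hit is at 1-based rank <= k (None = miss)."""
    return sum(1 for r in first_ranks if r is not None and r <= k) / len(first_ranks)


def mrr(first_ranks: Sequence[Optional[int]]) -> float:
    """Mean reciprocal rank of the first gold hit; a miss (None) contributes 0."""
    return sum(1.0 / r for r in first_ranks if r is not None) / len(first_ranks)


# Hypothetical first-gold ranks for five questions (None = gold not retrieved).
ranks = [1, 3, None, 2, 1]
for k in (1, 3, 5):
    print(f"Hit@{k} = {hit_at_k(ranks, k):.3f}")  # 0.400, 0.800, 0.800
print(f"MRR = {mrr(ranks):.3f}")                  # 0.567
```
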

Up-weighting the semantic retriever never lowers a metric as W_SEM rises. Hit@1
and Hit@10 plateau while the others keep climbing, so there is no regression. At
W_SEM=2.0 the gain is largest exactly where the diagnosis predicted
(Hit@1 +0.08, MRR +0.05).
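
The no-regression claim can be checked mechanically against the sweep table. The sketch below is a hypothetical helper, not part of `canlex.eval`. It copies the table values verbatim and checks that no metric drops between adjacent weights. It also reports each metric's change from the 1.0 baseline.

```python
SWEEP = {  # W_SEM -> metrics, copied from the sweep table
    1.0: {"Hit@1": 0.573, "Hit@3": 0.787, "Hit@5": 0.876, "Hit@10": 0.921, "MRR": 0.701},
    1.5: {"Hit@1": 0.629, "Hit@3": 0.798, "Hit@5": 0.888, "Hit@10": 0.933, "MRR": 0.737},
    2.0: {"Hit@1": 0.652, "Hit@3": 0.809, "Hit@5": 0.899, "Hit@10": 0.933, "MRR": 0.752},
    3.0: {"Hit@1": 0.652, "Hit@3": 0.820, "Hit@5": 0.910, "Hit@10": 0.933, "MRR": 0.754},
}


def regressions(sweep: dict[float, dict[str, float]]) -> list[tuple[str, float, float]]:
    """(metric, from_weight, to_weight) for every drop between adjacent weights."""
    weights = sorted(sweep)
    return [
        (metric, lo, hi)
        for lo, hi in zip(weights, weights[1:])
        for metric in sweep[lo]
        if sweep[hi][metric] < sweep[lo][metric]
    ]


def gains(sweep: dict[float, dict[str, float]], baseline: float, target: float) -> dict[str, float]:
    """Metric-by-metric change from the baseline weight to the target weight, rounded to 3 places."""
    return {m: round(sweep[target][m] - sweep[baseline][m], 3) for m in sweep[baseline]}


print(regressions(SWEEP))  # []
print(gains(SWEEP, 1.0, 2.0))
```

The gains at 2.0 come out at roughly +0.08 for Hit@1 and +0.05 for MRR, with smaller moves elsewhere.
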

## Recommendation

**Set `W_SEM = 2.0`** in `canlex/index.py`. It captures most of the gain
(Hit@1 0.57 -> 0.65, Hit@5 0.88 -> 0.90, MRR 0.70 -> 0.75) while keeping a
meaningful BM25 contribution. W_SEM=3.0 squeezes out slightly more (Hit@3, Hit@5,
MRR) but tilts the fusion heavily toward semantic; 2.0 is the balanced choice.

To apply: change the one constant, run `py -m canlex.eval` to confirm, and redeploy.

Caveat: these figures come from the 89-question eval only. Semantic up-weighting
is principled (the diagnostic shows the semantic retriever genuinely ranks these
golds well), but keep an
eye on exact-keyword and section-number lookups after adopting it.
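
One way to act on this caveat is to report metrics separately for queries that name a section explicitly. The sketch assumes a hypothetical regex for section references, and the real `_section_refs` helper in `canlex/index.py` may parse them differently. It computes a plain Hit@k over 1-based first-gold ranks. The queries and ranks are made up for illustration.

```python
import re
from typing import Optional, Sequence

# Hypothetical pattern for explicit section references: "s. 12", "s.159", "section 112".
SECTION_REF = re.compile(r"\b(?:s\.|section)\s*\d+", re.IGNORECASE)


def split_by_section_ref(queries: Sequence[str]) -> tuple[list[int], list[int]]:
    """Indices of queries that do and do not name a section explicitly."""
    with_ref = [i for i, q in enumerate(queries) if SECTION_REF.search(q)]
    with_ref_set = set(with_ref)
    without = [i for i in range(len(queries)) if i not in with_ref_set]
    return with_ref, without


def subset_hit_at_k(first_ranks: Sequence[Optional[int]], indices: Sequence[int], k: int) -> Optional[float]:
    """Hit@k restricted to the given question indices; None if the subset is empty."""
    if not indices:
        return None
    hits = sum(1 for i in indices if first_ranks[i] is not None and first_ranks[i] <= k)
    return hits / len(indices)


# Hypothetical queries and first-gold ranks, for illustration only.
queries = [
    "What does Customs Act s.12 require?",
    "Is distributing cannabis an offence?",
    "Summarize section 112 of IRPA",
]
ranks = [1, None, 7]
with_ref, without = split_by_section_ref(queries)
print(with_ref, without)                      # [0, 2] [1]
print(subset_hit_at_k(ranks, with_ref, k=5))  # 0.5
```

Running the W_SEM sweep once per subset would show whether section-number lookups lose ground even while the overall numbers rise.
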

## Still hard after W_SEM=2.0

IRPA s.112 (PRRA) remains a miss because of cause #2 above. W_SEM does not fix
it, because the semantic retriever itself ranks s.112 only #35. Options for
later: an Act-over-its-own-regulation tie-break, or accepting that the IRPR PRRA
regulations are themselves a reasonable answer and broadening that gold.
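
The Act-over-its-own-regulation tie-break floated above could look roughly like the sketch below. Everything in it is hypothetical and is not existing CanLex code:

- the `Hit` fields `kind` and `act`;
- the tolerance `TIE_TOL`;
- the single adjacent-swap pass;
- the scores in the example.

Given the #35 semantic rank, a tie-break alone would not rescue s.112 here. It only settles near-ties once the Act is already close to the top.

```python
from dataclasses import dataclass

TIE_TOL = 1e-3  # hypothetical: fused scores closer than this count as a tie


@dataclass
class Hit:
    label: str
    score: float
    kind: str  # "act" or "regulation" (hypothetical field)
    act: str   # the enacting Act; for a regulation, the Act it is made under


def act_over_regulation(hits: list[Hit], tol: float = TIE_TOL) -> list[Hit]:
    """Sort by fused score, then let an Act move above a near-tied regulation made under it.

    A single forward pass, so each Act moves up at most one slot.
    """
    ranked = sorted(hits, key=lambda h: -h.score)
    for i in range(len(ranked) - 1):
        upper, lower = ranked[i], ranked[i + 1]
        if (
            upper.kind == "regulation"
            and lower.kind == "act"
            and upper.act == lower.act
            and upper.score - lower.score <= tol
        ):
            ranked[i], ranked[i + 1] = lower, upper
    return ranked


# Hypothetical fused scores, for illustration only.
hits = [
    Hit("IRPR s.161", 0.0300, "regulation", "IRPA"),
    Hit("IRPA s.112", 0.0415, "act", "IRPA"),
    Hit("IRPR s.160", 0.0420, "regulation", "IRPA"),
]
print([h.label for h in act_over_regulation(hits)])  # ['IRPA s.112', 'IRPR s.160', 'IRPR s.161']
```
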