Beemer and Claude Opus 4.7 committed 5 days ago

Commit d72272a

1 parent: 1e58371

Expand eval set to 89 questions; add semantic-fusion weight knob

Overnight evaluation and precision work (retrieval behaviour unchanged):

- data/eval/questions.json: 47 -> 89 gold questions, adding case-law,
  D-memorandum, collective-agreement and NJC-directive coverage so
  retrieval quality is measured across every source type.
- canlex/index.py: add the W_SEM fusion-weight constant (default 1.0 =
  equal weight = unchanged behaviour). Diagnosis: for several eval
  misses the semantic retriever ranks the gold #1-3 but BM25 ranks it
  45-82, and equal-weight RRF averages it down. precision-findings.md
  has the measured sweep -- W_SEM=2.0 lifts the eval Hit@1 0.57->0.65,
  Hit@5 0.88->0.90, MRR 0.70->0.75 with no regression.
- pending-cases.md: 18 leading SCC/FCA/FC cases curated for ingestion
  once the Lexum 403 block clears (Phase 4 FPSLREB/CIRB is blocked by
  the same intermittent block -- two ingest attempts both failed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (4)

canlex/index.py +2 -1
data/eval/questions.json +43 -1
pending-cases.md +59 -0
precision-findings.md +79 -0

canlex/index.py CHANGED

@@ -13,6 +13,7 @@ from .synonyms import expand_query
 K1 = 1.5
 B = 0.75
 RRF_K = 60          # reciprocal-rank-fusion damping constant
+W_SEM = 1.0         # weight on the semantic retriever in the fusion (1.0 = equal)
 CANDIDATES = 80     # hits each retriever contributes to the fusion
 RERANK_POOL = 50    # top fused candidates the cross-encoder rescores
 SOURCE_CAP = 2      # max chunks one case/memo/agreement/directive may contribute
@@ -289,7 +290,7 @@ class LegislationIndex:
         if self.semantic:
             sem_order, confidence = self._semantic_ranking(expanded)
             for rank, idx in enumerate(sem_order):
-                fused[idx] += 1.0 / (RRF_K + rank)
+                fused[idx] += W_SEM / (RRF_K + rank)
         # Ensure explicitly-referenced sections are retrieved even if recall missed them.
         refs = _section_refs(query)
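
Reviewer sketch, not part of the commit: a minimal illustration of how a weight on the semantic side changes a reciprocal-rank fusion. Only `RRF_K = 60`, the 0-based rank from `enumerate`, and the `W_SEM / (RRF_K + rank)` term come from the diff above. The function name `fuse`, the document labels, and the toy rankings are hypothetical, and the sketch assumes the BM25 side contributes `1.0 / (RRF_K + rank)`, which the diff does not show.

```python
from collections import defaultdict

RRF_K = 60  # damping constant, same value as in canlex/index.py


def fuse(bm25_order, sem_order, w_sem=1.0, rrf_k=RRF_K):
    """Weighted reciprocal-rank fusion over two ranked lists of doc ids.

    Ranks are 0-based (enumerate), matching the loop in the diff.
    Assumption: the BM25 side keeps a fixed weight of 1.0.
    """
    fused = defaultdict(float)
    for rank, doc in enumerate(bm25_order):
        fused[doc] += 1.0 / (rrf_k + rank)
    for rank, doc in enumerate(sem_order):
        fused[doc] += w_sem / (rrf_k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy rankings (hypothetical): "gold" is 1st semantically but 51st for BM25,
# while "rival" is 1st for BM25 and 21st semantically.
bm25_order = ["rival"] + [f"b{i}" for i in range(49)] + ["gold"]
sem_order = ["gold"] + [f"s{i}" for i in range(19)] + ["rival"]

equal = fuse(bm25_order, sem_order, w_sem=1.0)
boosted = fuse(bm25_order, sem_order, w_sem=2.0)

print(equal[:2])    # ['rival', 'gold']
print(boosted[:2])  # ['gold', 'rival']
```

In this toy case the gold scores 1/60 + 1/110 ≈ 0.0258 at equal weight against the rival's 1/60 + 1/80 ≈ 0.0292, so the rival leads. At `w_sem=2.0` the gold's 2/60 + 1/110 ≈ 0.0424 edges past the rival's 1/60 + 2/80 ≈ 0.0417.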

data/eval/questions.json CHANGED

@@ -45,5 +45,47 @@
   {"query": "What are the standard hours of work for an employee?", "answers": [["Canada Labour Code", "169"]]},
   {"query": "What is the standard of review of an administrative decision on judicial review?", "answers": [["Vavilov", ""]]},
   {"query": "How does the Refugee Appeal Division review a decision of the Refugee Protection Division?", "answers": [["Huruglica", ""]]},
-  {"query": "To get back currency seized at the border, what must the claimant show about the money?", "answers": [["Sellathurai", ""]]}
+  {"query": "To get back currency seized at the border, what must the claimant show about the money?", "answers": [["Sellathurai", ""]]},
+  {"query": "What factors shape the content of the duty of procedural fairness in an administrative decision?", "answers": [["Baker", ""]]},
+  {"query": "Can Canada remove a person to a country where they face a substantial risk of torture?", "answers": [["Suresh", ""]]},
+  {"query": "Is the security certificate regime for detaining and removing non-citizens consistent with the Charter?", "answers": [["Charkaoui", ""]]},
+  {"query": "Do refugee claimants have a right to an oral hearing under the Charter?", "answers": [["Singh", ""]]},
+  {"query": "What is a particular social group in the definition of a Convention refugee?", "answers": [["Ward", ""]]},
+  {"query": "When is a person excluded from refugee protection for acts contrary to the purposes of the United Nations?", "answers": [["Pushpanathan", ""]]},
+  {"query": "What degree of involvement makes a person complicit in international crimes and excluded from refugee protection?", "answers": [["Ezokola", ""]]},
+  {"query": "Can a person be denied refugee protection for a serious crime committed abroad before claiming asylum?", "answers": [["Febles", ""]]},
+  {"query": "How must a decision-maker weigh the best interests of a child in a humanitarian and compassionate application?", "answers": [["Kanthasamy", ""]]},
+  {"query": "What does the national interest mean when the Minister grants relief from security inadmissibility?", "answers": [["Agraira", ""]]},
+  {"query": "Does helping fellow asylum seekers enter a country illegally amount to people smuggling for inadmissibility?", "answers": [["B010", ""]]},
+  {"query": "Is the offence of human smuggling unconstitutionally overbroad if it captures humanitarian aid?", "answers": [["Appulonappa", ""]]},
+  {"query": "Does a conditional sentence count as a term of imprisonment for serious criminality?", "answers": [["Tran", ""]]},
+  {"query": "Can an immigration detainee challenge their detention through habeas corpus?", "answers": [["Chhina", ""]]},
+  {"query": "What kinds of border search engage the Charter protection against unreasonable search and seizure?", "answers": [["Simmons", ""]]},
+  {"query": "Does inadmissibility for membership in a terrorist organization require a complicity analysis?", "answers": [["Kanagendren", ""]]},
+  {"query": "How broadly is a criminal organization interpreted for organized criminality inadmissibility?", "answers": [["Sittampalam", ""]]},
+  {"query": "What principles govern a finding of inadmissibility for misrepresentation?", "answers": [["Goburdhun", ""]]},
+  {"query": "At an immigration detention review, who bears the onus and how are earlier detention rulings treated?", "answers": [["Thanabalasingham", ""]]},
+  {"query": "What is the test for admitting new evidence in a pre-removal risk assessment?", "answers": [["Raza", ""]]},
+  {"query": "Are gold coins currency or monetary instruments that must be reported when imported?", "answers": [["Hociung", ""]]},
+  {"query": "How do the courts review a customs tariff classification decision?", "answers": [["Best Buy", ""]]},
+  {"query": "What does CBSA policy say about how the value for duty of imported goods is established?", "answers": [["D-Memo", "D13-1-1"]]},
+  {"query": "What are CBSA's requirements for marking imported goods with their country of origin?", "answers": [["D-Memo", "D11-3-1"]]},
+  {"query": "What is CBSA's guidance on importing or exporting cannabis and controlled substances?", "answers": [["D-Memo", "D19-9-2"]]},
+  {"query": "What personal exemptions can a resident claim when returning to Canada, per CBSA guidance?", "answers": [["D-Memo", "D2-3-1"]]},
+  {"query": "What is CBSA's guidance on cross-border currency and monetary instruments reporting?", "answers": [["D-Memo", "D19-14-1"]]},
+  {"query": "How does CBSA decide whether imported material is obscene?", "answers": [["D-Memo", "D9-1-1"]]},
+  {"query": "What proof of origin does CBSA require for imported goods?", "answers": [["D-Memo", "D11-4-2"]]},
+  {"query": "What is the Canadian Goods Abroad Program for goods sent outside Canada for repair?", "answers": [["D-Memo", "D8-2-1"]]},
+  {"query": "How does the FB Border Services collective agreement deal with discipline of an employee?", "answers": [["FB Agreement", "17"]]},
+  {"query": "What is the grievance procedure under the FB collective agreement?", "answers": [["FB Agreement", "18"]]},
+  {"query": "What are the hours of work under the FB Border Services collective agreement?", "answers": [["FB Agreement", "25"]]},
+  {"query": "How is overtime compensated under the FB collective agreement?", "answers": [["FB Agreement", "28"]]},
+  {"query": "How much vacation leave with pay do FB-group employees earn?", "answers": [["FB Agreement", "34"]]},
+  {"query": "What is the bilingualism bonus and who is eligible to receive it?", "answers": [["Bilingualism Bonus Directive", ""]]},
+  {"query": "What assistance is available to a federal employee who faces unusual daily commuting costs?", "answers": [["Commuting Assistance Directive", ""]]},
+  {"query": "When may the Immigration Division order the release of a detained person?", "answers": [["IRPA", "58"]]},
+  {"query": "When is there no right to appeal a removal order to the Immigration Appeal Division?", "answers": [["IRPA", "64"]]},
+  {"query": "Is distributing cannabis an offence?", "answers": [["Cannabis Act", "9"]]},
+  {"query": "What is the offence of smuggling goods into Canada under the Customs Act?", "answers": [["Customs Act", "159"]]},
+  {"query": "Is an employee entitled to medical leave under the Canada Labour Code?", "answers": [["Canada Labour Code", "239"]]}
 ]
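
Reviewer sketch, not part of the commit: one way the Hit@k and MRR figures quoted in the commit message could be computed from gold entries shaped like the ones above. The `canlex.eval` module is not shown in this diff, so `is_hit`, `first_hit_rank`, and `score` are hypothetical names. Two things are assumed rather than known: that retrieved chunks can be reduced to `(source, section)` pairs compared by exact string equality, and that an empty gold section means any chunk from that source counts.

```python
def is_hit(result, answers):
    """True if a retrieved (source, section) pair matches any gold answer.

    Assumption: an empty gold section means any chunk of that source counts.
    """
    source, section = result
    return any(
        source == gold_source and (gold_section == "" or section == gold_section)
        for gold_source, gold_section in answers
    )


def first_hit_rank(results, answers):
    """1-based rank of the first matching result, or None if nothing matches."""
    for rank, result in enumerate(results, start=1):
        if is_hit(result, answers):
            return rank
    return None


def score(runs, ks=(1, 3, 5, 10)):
    """Hit@k and MRR over a list of (results, answers) pairs."""
    ranks = [first_hit_rank(results, answers) for results, answers in runs]
    n = len(ranks)
    metrics = {
        f"Hit@{k}": sum(r is not None and r <= k for r in ranks) / n for k in ks
    }
    # A query with no hit contributes a reciprocal rank of 0.
    metrics["MRR"] = sum(1.0 / r for r in ranks if r is not None) / n
    return metrics


# Two toy runs: the retrieved lists are made up, the gold answers are
# copied from questions.json.
runs = [
    ([("IRPA", "58"), ("IRPA", "64")], [["IRPA", "64"]]),      # first hit at rank 2
    ([("Baker", "21"), ("Vavilov", "")], [["Baker", ""]]),     # first hit at rank 1
]
metrics = score(runs)
print(metrics["Hit@1"], metrics["MRR"])  # 0.5 0.75
```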

pending-cases.md ADDED

@@ -0,0 +1,59 @@
+# Leading cases to add when the Lexum block clears
+**Status (2026-05-21):** Phase 4 (FPSLREB/CIRB) and this case-law expansion are
+both blocked. `canlex.caselaw` fetches return **HTTP 403** from the Lexum
+decision hosts — two full ingest attempts, every FPSLREB/CIRB decision failed.
+A single-shot probe of the same URLs returns 200, so the block is intermittent
+or specific to the fetch pattern (likely the `?iframe=true` decision endpoint).
+**To ingest when the block clears:** for each case below, open the decision on
+its court's `decisions.*.gc.ca` site, take the numeric **item id** from the
+`.../item/{id}/index.do` URL, and add an entry to `CASES` in
+`canlex/caselaw.py` (`{"court": ..., "id": ..., "short": ..., "topic": ...}`),
+then re-run `py -m canlex.caselaw` → `py -m canlex.embed` → redeploy. Verify the
+citation against the page (these are from memory). Cases already in the corpus
+are excluded.
+## Supreme Court of Canada  (court: "scc")
+- **Chiarelli** — Canada (MEI) v Chiarelli, [1992] 1 SCR 711 — s. 7 fundamental
+  justice and the deportation of permanent residents; the constitutional
+  baseline for non-citizens.
+- **Chieu** — Chieu v Canada (MCI), 2002 SCC 3 — Immigration Appeal Division
+  removal-order appeals; foreign hardship is a relevant consideration.
+- **Medovarski** — Medovarski v Canada (MCI), 2005 SCC 51 — IRPA's objectives;
+  no Charter s. 7 right for a non-citizen to remain in Canada.
+- **Mugesera** — Mugesera v Canada (MCI), 2005 SCC 40 — inadmissibility for
+  crimes against humanity; incitement to genocide.
+- **Harkat** — Canada (Citizenship and Immigration) v Harkat, 2014 SCC 37 —
+  constitutionality of the security-certificate regime and special advocates.
+- **Németh** — Németh v Canada (Justice), 2010 SCC 56 — extradition and the
+  refugee principle of non-refoulement.
+- **Mavi** — Canada (AG) v Mavi, 2011 SCC 30 — sponsorship-undertaking debt and
+  the duty of procedural fairness in enforcing it.
+- **Pham** — R v Pham, 2013 SCC 15 — collateral immigration consequences as a
+  factor in criminal sentencing.
+- **Chan** — Chan v Canada (MEI), [1995] 3 SCR 593 — refugee protection; a
+  particular social group and well-founded fear (verify the citation).
+- **Jacques** — R v Jacques, [1996] 3 SCR 312 — border searches; a vehicle stop
+  near the border and customs officers' powers.
+- **Martineau** — Martineau v MNR, 2004 SCC 81 — whether an ascertained-
+  forfeiture notice under the Customs Act is a penal proceeding.
+## Federal Court of Appeal  (court: "fca")
+- **Poshteh** — Poshteh v Canada (MCI), 2005 FCA 85 — membership in a terrorist
+  organization for inadmissibility; relevance of the claimant's age.
+- **Hinzman** — Hinzman v Canada (MCI), 2007 FCA 171 — refugee and H&C claims by
+  US military deserters.
+- **Thamotharem** — Canada (MCI) v Thamotharem, 2007 FCA 198 — Refugee
+  Protection Division hearing procedure; order of questioning; fettering by
+  guidelines.
+- **Toussaint** — Toussaint v Canada (AG), 2011 FCA 213 — interim federal health
+  coverage and access for indigent applicants.
+- **Rahaman** — Rahaman v Canada (MCI), 2002 FCA 89 — refugee claims and the "no
+  credible basis" finding.
+## Federal Court  (court: "fc")
+- **Almrei (Re)** — Almrei (Re), 2009 FC 1263 — reasonableness of a security
+  certificate after Charkaoui.
+- **Sahin** — Sahin v Canada (MCI), [1995] 1 FC 214 — the foundational factors
+  for the length of immigration detention.
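
Reviewer sketch, not part of the commit: the manual step described in pending-cases.md (take the numeric item id from the `.../item/{id}/index.do` URL and build a `CASES` entry) could be scripted roughly as follows. The helper `case_entry`, the URL, and the id are placeholders, not a real decision. Storing `id` as an integer is an assumption, since the file only shows the entry's keys.

```python
import re

# Matches the numeric id in a ".../item/{id}/index.do" decision URL.
ITEM_ID = re.compile(r"/item/(\d+)/index\.do")


def case_entry(url, court, short, topic):
    """Build a CASES-style dict from a decision URL (hypothetical helper)."""
    match = ITEM_ID.search(url)
    if match is None:
        raise ValueError(f"no item id in {url!r}")
    # Assumption: the id is stored as an int; the real CASES list may use str.
    return {"court": court, "id": int(match.group(1)), "short": short, "topic": topic}


# Placeholder URL and id for illustration only.
entry = case_entry(
    "https://decisions.example.gc.ca/xx/item/12345/index.do",
    court="fc",
    short="Example",
    topic="placeholder topic",
)
print(entry["id"])  # 12345
```

The citation still needs checking against the decision page, as the file itself notes.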

precision-findings.md ADDED

@@ -0,0 +1,79 @@
+# CanLex retrieval — precision investigation (2026-05-21)
+Investigation of the persistent eval misses, with a tested, recommended fix.
+**No retrieval-algorithm change has been deployed** — this is for review.
+## The question
+The eval had a handful of persistent misses where the correct provision ranked
+outside the top 5. Why, and what fixes it?
+## Diagnosis
+Stage-by-stage trace of each miss — the gold provision's rank out of each
+retriever, and after fusion:
+| Query | Gold | BM25 rank | Semantic rank | Fused rank |
+|---|---|---|---|---|
+| pre-removal risk assessment | IRPA s.112 | 45 | 35 | 35 |
+| report to a customs officer on arrival | Customs Act s.11 | 51 | **1** | 6 |
+| duty to report imported goods | Customs Act s.12 | 58 | **1** | 6 |
+| report large amounts of currency | PCMLTFA s.12 | 82 | 32 | 63 |
+| seize unreported currency | PCMLTFA s.18 | 51 | **3** | 14 |
+Two distinct causes:
+**1. BM25 dilutes strong semantic hits.** For Customs Act s.11 and s.12 and
+PCMLTFA s.18 the *semantic* retriever ranks the gold #1, #1, #3 — essentially
+perfect. But BM25 ranks the same provision #51, #58, #51, because the query
+keywords ("report", "currency", "arriving") are common words with no
+distinctive term to latch onto. Reciprocal-rank fusion with equal weight
+averages the two rankings, so a #1 semantic hit fused with a #51 BM25 hit lands
+around #6. The strong signal is diluted by the weak one.
+**2. The enacting statute is out-competed by elaborating material.** IRPA s.112
+(PRRA) is ranked only mediocre by *both* retrievers (BM25 #45, semantic #35):
+the IRPR regulations (s.160 "Application for protection", s.161, s.165, s.232)
+elaborate the PRRA process across many focused sections, and the
+currency-forfeiture case law (Dokaj, Williams, Hociung) crowds PCMLTFA s.12. One
+enacting section cannot out-rank a dozen elaborating chunks on a topical query.
+The `_ensure_legislation` guarantee added this batch mitigates this at the
+production default `top_k=6` (PCMLTFA s.18 reaches #2 there, vs #11 at the
+eval's `top_k=20`), but does not fix cause #2 fully.
+## Tested fix — up-weight the semantic retriever
+`canlex/index.py` now has a `W_SEM` constant: the weight on the semantic
+retriever's contribution to the RRF fusion (default **1.0** = equal weight =
+current, unchanged behaviour). Sweep on the 89-question eval set:
+| W_SEM | Hit@1 | Hit@3 | Hit@5 | Hit@10 | MRR |
+|---|---|---|---|---|---|
+| 1.0 (current) | 0.573 | 0.787 | 0.876 | 0.921 | 0.701 |
+| 1.5 | 0.629 | 0.798 | 0.888 | 0.933 | 0.737 |
+| 2.0 | 0.652 | 0.809 | 0.899 | 0.933 | 0.752 |
+| 3.0 | 0.652 | 0.820 | 0.910 | 0.933 | 0.754 |
+Up-weighting the semantic retriever improves every metric monotonically, with no
+regression — the gain is largest exactly where the diagnosis predicted
+(Hit@1 +0.08, MRR +0.05).
+## Recommendation
+**Set `W_SEM = 2.0`** in `canlex/index.py`. It captures most of the gain
+(Hit@1 0.57 -> 0.65, Hit@5 0.88 -> 0.90, MRR 0.70 -> 0.75) while keeping a
+meaningful BM25 contribution. W_SEM=3.0 squeezes slightly more but tilts the
+fusion heavily toward semantic; 2.0 is the balanced choice.
+To apply: change the one constant, run `py -m canlex.eval` to confirm, redeploy.
+Caveat: measured on the 89-question eval. Semantic up-weighting is principled
+(the diagnostic shows semantic genuinely ranks these golds well), but keep an
+eye on exact-keyword and section-number lookups after adopting it.
+## Still hard after W_SEM=2.0
+IRPA s.112 (PRRA) — cause #2 above; W_SEM does not fix it, because semantic
+itself ranks s.112 only #35. A later option: an Act-over-its-own-regulation
+tie-break, or accepting that the IRPR PRRA regulations are themselves a
+reasonable answer and broadening that gold.