Beemer committed on
Commit 5527c63 · 1 Parent(s): 21626e7

Add Phase 1 case law: 20 leading Supreme Court of Canada decisions

Files changed (3)
  1. canlex/caselaw.py +304 -0
  2. canlex/server.py +46 -16
  3. data/processed/caselaw.json +0 -0
canlex/caselaw.py ADDED
@@ -0,0 +1,304 @@
```python
"""Ingest leading Supreme Court of Canada decisions as section-style chunks.

Source: the SCC's official decisions database (decisions.scc-csc.ca, the Lexum
"Norma" platform). A decision's text sits inside an iframe, so each item is
fetched by appending ?iframe=true to its URL. This ingests a *curated* set of
leading cases -- it is deliberately not a comprehensive scrape.

    py -m canlex.caselaw
"""
import json
import re
import time
import urllib.request

from bs4 import BeautifulSoup

from .config import PROCESSED_DIR, RAW_DIR

ITEM_URL = "https://decisions.scc-csc.ca/scc-csc/scc-csc/en/item/{id}/index.do"
_RAW = RAW_DIR / "scc"
OUT = PROCESSED_DIR / "caselaw.json"

# A normal browser User-Agent: the SCC site denylists a few crawler UAs, while
# its robots.txt otherwise permits the decision pages. Politeness comes from the
# throttle below and from caching every fetched page on disk.
_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
       "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
_THROTTLE = 2.0       # seconds between live fetches
_CHUNK_CHARS = 1800   # target characters per chunk

# Marks the post-reasons apparatus (appended legislation, solicitors list),
# which is not part of the judgment's reasons.
_APPARATUS = re.compile(r"^\s*(APPENDIX\b|Solicitors?\s+for\b)", re.I)

# Curated leading SCC cases on border / immigration / customs / Charter law.
# 'id' is the verified decisions.scc-csc.ca item ID; 'short' and 'topic' are
# curated. The case name, citation and date are parsed from the page itself.
SCC_CASES = [
    {"id": 18078, "short": "Vavilov",
     "topic": "Standard of review on judicial review; the reasonableness "
              "standard for administrative decisions"},
    {"id": 20081, "short": "Mason",
     "topic": "Inadmissibility under IRPA s. 34(1)(e) for acts of violence "
              "endangering safety in Canada; reasonableness review"},
    {"id": 16803, "short": "Tran",
     "topic": "Serious criminality and inadmissibility under IRPA s. 36; the "
              "meaning of a term of imprisonment and an offence punishable by"},
    {"id": 15647, "short": "B010",
     "topic": "Inadmissibility for people smuggling under IRPA s. 37(1)(b); "
              "organized criminality"},
    {"id": 15648, "short": "Appulonappa",
     "topic": "The human smuggling offence in IRPA s. 117; constitutional "
              "overbreadth and humanitarian aid to asylum seekers"},
    {"id": 14419, "short": "Febles",
     "topic": "Exclusion from refugee protection for a serious non-political "
              "crime under Article 1F(b) of the Refugee Convention"},
    {"id": 13184, "short": "Ezokola",
     "topic": "Complicity and exclusion from refugee protection for "
              "international crimes under Article 1F(a)"},
    {"id": 15665, "short": "Kanthasamy",
     "topic": "Humanitarian and compassionate relief under IRPA s. 25; the "
              "best interests of a child"},
    {"id": 13137, "short": "Agraira",
     "topic": "Ministerial relief from inadmissibility on security grounds; "
              "the national interest under IRPA"},
    {"id": 6901, "short": "Khosa",
     "topic": "Standard of review of immigration decisions; judicial review "
              "of a removal order"},
    {"id": 2345, "short": "Charkaoui",
     "topic": "Security certificates and immigration detention; the Charter "
              "and procedural fairness"},
    {"id": 1937, "short": "Suresh",
     "topic": "Deportation to a risk of torture; Charter s. 7 and removal on "
              "security grounds"},
    {"id": 17759, "short": "Chhina",
     "topic": "Habeas corpus as a remedy for immigration detention; review of "
              "lengthy detention"},
    {"id": 1717, "short": "Baker",
     "topic": "Procedural fairness in administrative decisions; humanitarian "
              "and compassionate review; the duty to give reasons"},
    {"id": 39, "short": "Singh",
     "topic": "Charter s. 7 rights of refugee claimants; the right to an oral "
              "hearing"},
    {"id": 377, "short": "Simmons",
     "topic": "Customs searches at the border; Charter s. 8 and the reasonable "
              "expectation of privacy on entry to Canada"},
    {"id": 1694, "short": "Monney",
     "topic": "Border detention for a customs search; reasonable suspicion "
              "and the Customs Act"},
    {"id": 986, "short": "Dehghani",
     "topic": "Charter rights at a port of entry; secondary examination and "
              "the right to counsel"},
    {"id": 1627, "short": "Pushpanathan",
     "topic": "Exclusion from refugee protection under Article 1F(c) for acts "
              "contrary to the purposes of the United Nations"},
    {"id": 1023, "short": "Ward",
     "topic": "The refugee definition; a particular social group; the "
              "availability of state protection"},
]
```
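The `_APPARATUS` pattern is anchored at the start of a block and matched case-insensitively, so it fires on an appendix heading or a "Solicitors for ..." line but not on a passing mention later in a sentence, and the `\b` keeps it from matching a longer word that merely starts with "Appendix". A small check of that behaviour, copying the pattern verbatim; the sample strings are invented for illustration:

```python
import re

# Copied verbatim from _APPARATUS in canlex/caselaw.py.
APPARATUS = re.compile(r"^\s*(APPENDIX\b|Solicitors?\s+for\b)", re.I)

# Invented block texts -> whether they should mark the start of the apparatus.
SAMPLES = {
    "APPENDIX": True,
    "  Appendix A: Relevant Statutory Provisions": True,   # case-insensitive
    "Solicitors for the appellant: ...": True,
    "Solicitor for the respondent: ...": True,              # "s?" is optional
    "The solicitors for the parties agreed on the facts.": False,  # not leading
    "Appendixes to the factum were not considered.": False,        # \b blocks it
}

for text, expected in SAMPLES.items():
    hit = APPARATUS.match(text) is not None
    print(f"{hit!s:5}  {text}")
    assert hit == expected
```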
```python
def _fetch(item_id):
    """Return a decision's iframe HTML, caching the raw page under data/raw."""
    cache = _RAW / f"{item_id}.html"
    if cache.exists():
        return cache.read_text(encoding="utf-8")
    url = ITEM_URL.format(id=item_id) + "?iframe=true"
    req = urllib.request.Request(url, headers={"User-Agent": _UA})
    time.sleep(_THROTTLE)
    with urllib.request.urlopen(req, timeout=60) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    _RAW.mkdir(parents=True, exist_ok=True)
    cache.write_text(text, encoding="utf-8")
    return text
```
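`_fetch` checks the on-disk cache before touching the network and sleeps for `_THROTTLE` only ahead of a live request, so re-running the build re-parses cached pages without downloading them again. A minimal sketch of that cache-first order of operations, with the urllib call replaced by a hypothetical `fetcher` callable so it runs offline (`fetch_cached` and `fake_fetcher` are illustrative names, not part of CanLex):

```python
import tempfile
import time
from pathlib import Path


def fetch_cached(item_id, cache_dir, fetcher, throttle=0.0):
    """Return the page for item_id, calling fetcher only on a cache miss.

    Same order as _fetch: cache lookup, throttle, live request, write-back.
    `fetcher` is a hypothetical stand-in for the urllib request.
    """
    cache = Path(cache_dir) / f"{item_id}.html"
    if cache.exists():
        return cache.read_text(encoding="utf-8")
    time.sleep(throttle)  # only live requests pay the delay
    text = fetcher(item_id)
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(text, encoding="utf-8")
    return text


calls = []


def fake_fetcher(item_id):
    """Record each 'network' call and return a dummy page."""
    calls.append(item_id)
    return f"<html>item {item_id}</html>"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached(18078, Path(tmp) / "scc", fake_fetcher)
    second = fetch_cached(18078, Path(tmp) / "scc", fake_fetcher)  # cache hit
    print(first == second, calls)  # True [18078]
```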
```python
def _norm(text):
    """Collapse all whitespace, including non-breaking spaces."""
    return re.sub(r"\s+", " ", text.replace("\xa0", " ")).strip()


def _metadata(soup):
    """Return (case_name, {label: value}) from the decision's metadata block."""
    box = soup.find("div", class_="metadata")
    if not box:
        return "", {}
    title = box.find("h3", class_="title")
    name = _norm(title.get_text()) if title else ""
    fields = {}
    for row in box.find_all("tr"):
        label = row.find("td", class_="label")
        value = row.find("td", class_="metadata")
        if label and value:
            fields[_norm(label.get_text()).lower()] = _norm(value.get_text())
    return name, fields


def _body(soup):
    """Locate the container holding the judgment text."""
    return (soup.find(id="document-content")
            or soup.find("div", class_="documentcontent")
            or soup.find("div", class_="WordSection1")
            or soup.body or soup)
```
```python
def _paragraphs(soup):
    """Return (is_numbered, [(label, text), ...]) for the judgment body.

    Modern SCC judgments open each paragraph with a bracketed number "[N]".
    They are detected by content -- a run of sequentially numbered <p> blocks --
    so the parser does not depend on Word style names, which vary by era. Every
    <p> between one numbered opener and the next belongs to that paragraph.
    Older, unnumbered decisions fall back to taking every <p> in document order.
    """
    blocks = [p for p in _body(soup).find_all("p")
              if "MsoFootnoteText" not in (p.get("class") or [])]
    texts = [p.get_text() for p in blocks]

    # Drop the post-reasons apparatus (appended legislation, solicitors list);
    # it is not part of the reasons and would otherwise swell the last paragraph.
    for i, raw in enumerate(texts):
        if _APPARATUS.match(raw):
            texts = texts[:i]
            break

    # A paragraph opens with its number: "[12]" (most decisions) or a bare "12"
    # followed by wide tab spacing (pre-2009 decisions). The brackets are
    # self-identifying; a bare number must have 2+ trailing spaces, which
    # rejects quoted enumerations ("2. ..."). The sequential check rejects
    # stray bracketed years like "[1998]".
    openers = {}  # block index -> paragraph number
    expected = 1
    for i, raw in enumerate(texts):
        match = (re.match(r"\s*\[\s*(\d+)\s*\]", raw)
                 or re.match(r"\s*(\d+)\s{2,}\S", raw))
        if match:
            n = int(match.group(1))
            if expected <= n <= expected + 2:  # sequential, small-gap tolerant
                openers[i] = n
                expected = n + 1

    if len(openers) < 5:  # unnumbered: take every <p>
        paras = [(str(j), _norm(t)) for j, t in enumerate(texts, start=1)]
        return False, [(n, t) for n, t in paras if len(t) > 1]

    paras, num, buf = [], None, []
    for i, raw in enumerate(texts):
        if i in openers:
            if num is not None:
                paras.append((str(num), _norm(" ".join(buf))))
            num = openers[i]
            buf = [re.sub(r"^\s*\[?\s*\d+\s*\]?\s*", "", raw)]
        elif num is not None:
            buf.append(raw)
    if num is not None:
        paras.append((str(num), _norm(" ".join(buf))))
    return True, [(n, t) for n, t in paras if t]
```
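`_paragraphs` only trusts a number as a paragraph opener when it continues the running sequence (tolerating up to two missing numbers), which is what stops a bracketed year or a quoted "2." list item from starting a new paragraph. A plain-string sketch of the detection and grouping steps, using the same regular expressions but skipping BeautifulSoup and `_norm`; the helper names and sample blocks are invented for illustration:

```python
import re

BRACKETED = r"\s*\[\s*(\d+)\s*\]"  # "[12]"
BARE = r"\s*(\d+)\s{2,}\S"         # "12" followed by wide spacing


def find_openers(texts):
    """Map block index -> paragraph number for a sequential run of openers."""
    openers, expected = {}, 1
    for i, raw in enumerate(texts):
        match = re.match(BRACKETED, raw) or re.match(BARE, raw)
        if match:
            n = int(match.group(1))
            if expected <= n <= expected + 2:  # up to two missing numbers
                openers[i] = n
                expected = n + 1
    return openers


def group_blocks(texts, openers):
    """Attach every non-opener block to the paragraph opened before it."""
    paras, num, buf = [], None, []
    for i, raw in enumerate(texts):
        if i in openers:
            if num is not None:
                paras.append((num, " ".join(buf)))
            num = openers[i]
            buf = [re.sub(r"^\s*\[?\s*\d+\s*\]?\s*", "", raw)]  # strip the number
        elif num is not None:
            buf.append(raw)
    if num is not None:
        paras.append((num, " ".join(buf)))
    return paras


blocks = [
    "[1] The appellant was found inadmissible.",
    "A continuation block with no number.",
    "[1998] appears here as a bracketed year.",       # out of sequence: ignored
    "[2] The officer reviewed the file.",
    "2. A quoted list item.",                         # "." follows the digit: ignored
    "[4] Paragraph 3 is missing from this excerpt.",  # small gap: accepted
    "5    An older decision's bare number.",          # wide spacing: accepted
]
openers = find_openers(blocks)
print(openers)  # {0: 1, 3: 2, 5: 4, 6: 5}
for num, text in group_blocks(blocks, openers):
    print(num, text)
```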
```python
def _split_text(text, limit):
    """Split text longer than `limit` into pieces, breaking on a sentence or
    word boundary so no single chunk blows the retrieval/reranker budget."""
    if len(text) <= limit:
        return [text]
    pieces, start = [], 0
    while start < len(text):
        if len(text) - start <= limit:
            pieces.append(text[start:])
            break
        window = text[start:start + limit]
        cut = window.rfind(". ")
        cut = cut + 1 if cut > limit // 2 else window.rfind(" ")
        if cut <= 0:
            cut = limit
        pieces.append(text[start:start + cut])
        start += cut
    return [p.strip() for p in pieces if p.strip()]


def _chunk(paras):
    """Group consecutive paragraphs into ~_CHUNK_CHARS-sized chunks, first
    splitting any single paragraph that on its own exceeds the budget."""
    units = []
    for label, text in paras:
        for piece in _split_text(text, _CHUNK_CHARS):
            units.append((label, piece))
    chunks, buf, size = [], [], 0
    for label, text in units:
        if buf and size + len(text) > _CHUNK_CHARS:
            chunks.append(buf)
            buf, size = [], 0
        buf.append((label, text))
        size += len(text)
    if buf:
        chunks.append(buf)
    return chunks
```
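The two-stage packing is easiest to see with a tiny budget. The sketch below copies the logic of `_split_text` and `_chunk` but passes the limit in explicitly (40 characters instead of `_CHUNK_CHARS = 1800`) so the effect shows up on short, made-up paragraphs:

```python
def split_text(text, limit):
    """Same algorithm as _split_text: sentence break, else word break, else hard cut."""
    if len(text) <= limit:
        return [text]
    pieces, start = [], 0
    while start < len(text):
        if len(text) - start <= limit:
            pieces.append(text[start:])
            break
        window = text[start:start + limit]
        cut = window.rfind(". ")
        # Prefer a sentence end in the back half of the window, else the last space.
        cut = cut + 1 if cut > limit // 2 else window.rfind(" ")
        if cut <= 0:
            cut = limit  # no usable boundary: hard cut
        pieces.append(text[start:start + cut])
        start += cut
    return [p.strip() for p in pieces if p.strip()]


def chunk(paras, limit):
    """Same greedy packing as _chunk, with the budget passed in."""
    units = [(label, piece) for label, text in paras
             for piece in split_text(text, limit)]
    chunks, buf, size = [], [], 0
    for label, text in units:
        if buf and size + len(text) > limit:
            chunks.append(buf)
            buf, size = [], 0
        buf.append((label, text))
        size += len(text)
    if buf:
        chunks.append(buf)
    return chunks


paras = [
    ("1", "Short opening."),
    ("2", "Another short one."),
    ("3", "A much longer paragraph. It runs past the limit and must be split."),
]
print(split_text(paras[2][1], 40))
# ['A much longer paragraph.', 'It runs past the limit and must be', 'split.']
chunks = chunk(paras, 40)
print([[label for label, _ in c] for c in chunks])
# [['1', '2'], ['3'], ['3', '3']]
```

The pieces of a split paragraph keep that paragraph's label, and the summed piece lengths in each chunk stay within the limit. The stored chunk `text` can run slightly longer, because `_decision_chunks` joins the pieces with blank lines.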
```python
def _decision_chunks(case, soup):
    """Build CanLex chunk dicts for one decision."""
    name, fields = _metadata(soup)
    name = name or case["short"]
    cite = fields.get("neutral citation") or fields.get("report") or ""
    report = fields.get("report", "")
    date = fields.get("date", "")
    citation = f"{name}, {cite}" if cite else name
    item_url = ITEM_URL.format(id=case["id"])
    modern, paras = _paragraphs(soup)
    chunks = []
    for i, group in enumerate(_chunk(paras), start=1):
        if modern:
            first, last = group[0][0], group[-1][0]
            locator = (f"para {first}" if first == last
                       else f"paras {first}–{last}")
        else:
            locator = f"excerpt {i}"
        chunks.append({
            "id": f"scc-{case['id']}-{i}",
            "doc_type": "caselaw",
            "act_code": cite or f"SCC item {case['id']}",
            "act_short": case["short"],
            "act_name": name,
            "section": "",
            "citation": citation,
            "marginal_note": locator,
            "heading": case["topic"],
            "part": "Supreme Court of Canada",
            "division": "",
            "text": "\n\n".join(t for _, t in group),
            "current_to": date,
            "last_amended": "",
            "history": report if report and report != cite else "",
            "source_url": item_url,
        })
    return chunks, citation, len(paras)
```
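`_decision_chunks` gives every chunk a unique `id` of the form `scc-{item id}-{n}`, but its human-readable locator can repeat: when one long paragraph is split across chunks, each of those chunks carries the same paragraph number. A sketch of just the locator and id logic on hand-built chunk groups (`label_chunks` is an illustrative name; the item id is Vavilov's from `SCC_CASES`, the texts are placeholders):

```python
def label_chunks(item_id, groups, modern):
    """Return (id, locator) pairs the way _decision_chunks builds them."""
    labels = []
    for i, group in enumerate(groups, start=1):
        if modern:
            first, last = group[0][0], group[-1][0]
            locator = (f"para {first}" if first == last
                       else f"paras {first}–{last}")
        else:
            locator = f"excerpt {i}"  # unnumbered decisions: running counter
        labels.append((f"scc-{item_id}-{i}", locator))
    return labels


groups = [
    [("1", "..."), ("2", "...")],
    [("3", "first half of a long paragraph")],
    [("3", "second half of the same paragraph")],
]
for chunk_id, locator in label_chunks(18078, groups, modern=True):
    print(chunk_id, locator)
# scc-18078-1 paras 1–2
# scc-18078-2 para 3
# scc-18078-3 para 3
```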
```python
def build():
    """Fetch, parse and chunk every curated SCC decision into caselaw.json."""
    all_chunks = []
    for case in SCC_CASES:
        try:
            soup = BeautifulSoup(_fetch(case["id"]), "html.parser")
        except Exception as exc:
            print(f" !! {case['short']}: fetch failed -- "
                  f"{type(exc).__name__}: {exc}")
            continue
        chunks, citation, n_paras = _decision_chunks(case, soup)
        if not chunks:
            print(f" !! {case['short']} (item {case['id']}): "
                  f"0 chunks -- check parsing")
            continue
        all_chunks.extend(chunks)
        print(f" {case['short']:13s} {n_paras:4d} paras -> "
              f"{len(chunks):3d} chunks {citation}")
    PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
    OUT.write_text(json.dumps(all_chunks, ensure_ascii=False, indent=1),
                   encoding="utf-8")
    print(f"\n{len(all_chunks)} case-law chunks from "
          f"{len(SCC_CASES)} SCC decisions -> {OUT}")


if __name__ == "__main__":
    build()
```
canlex/server.py CHANGED
```diff
@@ -35,7 +35,10 @@ GROUNDING_NOTE = (
     "CBSA D-Memoranda are administrative guidance -- persuasive only, not binding, "
     "and a court may disagree with them; collective agreements and the National "
     "Joint Council directives they incorporate are binding employment-terms "
-    "instruments for a bargaining unit. State the "
+    "instruments for a bargaining unit; court decisions interpret and apply the "
+    "law and are binding precedent depending on the court and jurisdiction -- "
+    "name the deciding court and the date, and do not assume a decision is still "
+    "good law if it may have been overtaken. State the "
     "'current to', modified, or in-force date when stating the law. If the material "
     "below does not fully resolve the question -- including where it turns on case "
     "law or facts not present here -- say so explicitly. This is legal information, "
@@ -72,6 +75,13 @@ def _format_section(c: dict) -> str:
         lines.append("_National Joint Council directive — forms part of collective "
                      "agreements; binding for the matters it covers._")
         lines.append(f"(effective {c['current_to'] or 'n/a'})")
+    elif doc_type == "caselaw":
+        lines.append("_Court decision — binding precedent depending on the court "
+                     "and jurisdiction; confirm it has not been overturned on "
+                     "appeal or overtaken by later authority._")
+        lines.append(f"(decided {c['current_to'] or 'n/a'})")
+        if c["heading"]:
+            lines.append(f"Subject: {c['heading']}")
     else:
         meta = [f"current to {c['current_to'] or 'n/a'}"]
         if c["last_amended"]:
@@ -81,7 +91,12 @@ def _format_section(c: dict) -> str:
     lines.append(c["text"])
     lines.append("")
     if c["history"]:
-        lines.append(f"Amendment history: {c['history']}")
+        if doc_type == "caselaw":
+            lines.append(f"Also reported: {c['history']}")
+        elif doc_type == "legislation":
+            lines.append(f"Amendment history: {c['history']}")
+        else:
+            lines.append(f"History: {c['history']}")
     lines.append(f"Source: {c['source_url']}")
     return "\n".join(lines)
 
@@ -110,7 +125,8 @@ class SearchInput(BaseModel):
         default=None,
         description="Optional filter by source type: 'legislation' (Acts and "
                     "regulations), 'memorandum' (CBSA D-Memoranda), 'agreement' (collective "
-                    "agreements), or 'directive' (NJC directives). Omit to search all.",
+                    "agreements), 'directive' (NJC directives), or 'caselaw' (Supreme Court "
+                    "of Canada decisions). Omit to search all.",
     )
 
 
@@ -127,25 +143,28 @@ class GetSectionInput(BaseModel):
 @mcp.tool(name="canlex_search_legislation",
           annotations={"title": "Search Canadian Legislation", **_READONLY})
 def canlex_search_legislation(params: SearchInput) -> str:
-    """Search Canadian federal law, CBSA D-Memoranda, agreements, and NJC directives.
+    """Search Canadian federal law, CBSA D-Memoranda, agreements, NJC directives,
+    and leading Supreme Court of Canada cases.
 
-    The CanLex corpus has four kinds of source: 31 federal Acts and regulations
+    The CanLex corpus has five kinds of source: 31 federal Acts and regulations
     (immigration, customs, criminal, drugs, food/health, labour, privacy and more);
     CBSA D-Memoranda (the Canada Border Services Agency's administrative guidance on
     how it applies customs and border law); Treasury Board collective agreements
-    (currently the FB / Border Services group); and National Joint Council directives
-    (travel, relocation, isolated posts and more). Use this for ANY question about
-    that material. It ranks results by relevance and returns their full text so the
-    answer can cite the actual wording; an explicit section reference (e.g. "section
-    34") is always surfaced. Each result is marked with its source type.
+    (currently the FB / Border Services group); National Joint Council directives
+    (travel, relocation, isolated posts and more); and leading Supreme Court of
+    Canada decisions on immigration, customs and Charter-at-the-border law. Use this
+    for ANY question about that material. It ranks results by relevance and returns
+    their full text so the answer can cite the actual wording; an explicit section
+    reference (e.g. "section 34") is always surfaced. Each result is marked with its
+    source type.
 
     Args:
         params (SearchInput): Validated input containing:
             - query (str): Legal question or keywords to search for.
             - top_k (int): How many sections to return, 1-20 (default 6).
             - act (Optional[str]): Restrict to one Act by short name/code, or omit for all.
-            - doc_type (Optional[str]): 'legislation', 'memorandum', 'agreement', or
-              'directive' to restrict to one source type; omit to search all.
+            - doc_type (Optional[str]): 'legislation', 'memorandum', 'agreement',
+              'directive', or 'caselaw' to restrict to one source type; omit for all.
 
     Returns:
         str: Markdown with answering instructions followed by the matching sections.
@@ -213,7 +232,7 @@ def canlex_get_section(params: GetSectionInput) -> str:
           annotations={"title": "List Loaded Legislation", **_READONLY})
 def canlex_list_acts() -> str:
     """List what the CanLex corpus contains -- Acts and regulations, CBSA
-    D-Memoranda, collective agreements, and NJC directives.
+    D-Memoranda, collective agreements, NJC directives, and leading cases.
 
     Use this to learn the scope and currency of the corpus before searching, or to
     report it to the user.
@@ -225,6 +244,7 @@ def canlex_list_acts() -> str:
     acts: dict[str, dict] = {}
     agreements: dict[str, dict] = {}
     directives: dict[str, dict] = {}
+    cases: dict[str, dict] = {}
     memo_numbers: set[str] = set()
     memo_chunks = 0
     memo_date = ""
@@ -245,6 +265,11 @@ def canlex_list_acts() -> str:
                 "short": c["act_short"], "current_to": c["current_to"], "count": 0,
             })
             entry["count"] += 1
+        elif doc_type == "caselaw":
+            entry = cases.setdefault(c["act_code"], {
+                "name": c["act_name"], "decided": c["current_to"], "count": 0,
+            })
+            entry["count"] += 1
         else:
             entry = acts.setdefault(c["act_code"], {
                 "short": c["act_short"], "name": c["act_name"],
@@ -271,10 +296,15 @@ def canlex_list_acts() -> str:
         for a in sorted(directives.values(), key=lambda x: x["short"]):
             lines.append(f"- **{a['short']}**: {a['count']} sections, "
                          f"effective {a['current_to'] or 'n/a'}")
+    if cases:
+        lines += ["", "## Case law (Supreme Court of Canada)"]
+        for cite, a in sorted(cases.items(), key=lambda kv: kv[1]["decided"]):
+            lines.append(f"- **{a['name']}**, {cite}: {a['count']} excerpts, "
+                         f"decided {a['decided'] or 'n/a'}")
     lines += ["", "Search with canlex_search_legislation; filter by doc_type "
-              "(legislation / memorandum / agreement / directive). Fetch a known "
-              "provision with canlex_get_section, or a case's citations with "
-              "canlex_case."]
+              "(legislation / memorandum / agreement / directive / caselaw). Fetch "
+              "a known provision with canlex_get_section, or a case's citations "
+              "with canlex_case."]
     return "\n".join(lines)
 
 
```
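With this change the trailing history line in `_format_section` depends on the source type, and for case law it only appears when the decision has a reported citation that differs from its primary citation (pre-neutral-citation decisions fall back to the report, so they get no "Also reported" line). A sketch combining the `history` value `_decision_chunks` stores with the label `_format_section` now prints; `history_field` and `history_line` are illustrative names, and the citations are invented:

```python
def history_field(fields):
    """Mirror of the 'history' value _decision_chunks stores for a decision."""
    cite = fields.get("neutral citation") or fields.get("report") or ""
    report = fields.get("report", "")
    return report if report and report != cite else ""


def history_line(doc_type, history):
    """The trailing history line _format_section emits per doc_type, or None."""
    if not history:
        return None
    if doc_type == "caselaw":
        return f"Also reported: {history}"
    if doc_type == "legislation":
        return f"Amendment history: {history}"
    return f"History: {history}"


# Invented citations, for illustration only.
modern = {"neutral citation": "2099 SCC 1", "report": "[2099] 1 S.C.R. 1"}
older = {"report": "[1899] 1 S.C.R. 1"}  # no neutral citation: report is the cite

print(history_line("caselaw", history_field(modern)))  # Also reported: [2099] 1 S.C.R. 1
print(history_line("caselaw", history_field(older)))   # None
```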
 
data/processed/caselaw.json ADDED
The diff for this file is too large to render.
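The new `cases` branch in `canlex_list_acts` keys case-law chunks by `act_code` (the decision's citation), counts the excerpts per decision and sorts the listing by the `decided` string. A standalone sketch of that grouping over a JSON list shaped like `caselaw.json`; `summarize_cases` is an illustrative name and the records are invented (the real file holds the full chunk dicts). Sorting the raw string is chronological only if the dates are ISO-style `YYYY-MM-DD`, which is an assumption about the SCC metadata rather than something this commit checks:

```python
import json


def summarize_cases(chunks):
    """Group caselaw chunks by act_code and render one line per decision."""
    cases = {}
    for c in chunks:
        if c.get("doc_type") != "caselaw":
            continue
        entry = cases.setdefault(c["act_code"], {
            "name": c["act_name"], "decided": c["current_to"], "count": 0,
        })
        entry["count"] += 1
    # Plain string sort: chronological only for ISO-style YYYY-MM-DD dates.
    ordered = sorted(cases.items(), key=lambda kv: kv[1]["decided"])
    return [f"- **{a['name']}**, {cite}: {a['count']} excerpts, "
            f"decided {a['decided'] or 'n/a'}" for cite, a in ordered]


# Invented records carrying only the keys used above.
raw = json.dumps([
    {"doc_type": "caselaw", "act_code": "2099 SCC 2", "act_name": "Case B",
     "current_to": "2099-06-01"},
    {"doc_type": "caselaw", "act_code": "2099 SCC 1", "act_name": "Case A",
     "current_to": "2099-01-15"},
    {"doc_type": "caselaw", "act_code": "2099 SCC 1", "act_name": "Case A",
     "current_to": "2099-01-15"},
    {"doc_type": "legislation", "act_code": "X-1", "act_name": "Some Act",
     "current_to": "2099-01-01"},
])
for line in summarize_cases(json.loads(raw)):
    print(line)
# - **Case A**, 2099 SCC 1: 2 excerpts, decided 2099-01-15
# - **Case B**, 2099 SCC 2: 1 excerpts, decided 2099-06-01
```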