# Symio-ai/legal-statute-parser

## Model Description
Legal Statute Parser performs structured extraction from raw statute text, identifying and labeling section numbers, subsections, definitions, operative clauses, penalty provisions, exceptions, effective dates, amendment history, and cross-references.
Designed to power the GLACIER pipeline's statute verification and element-mapping stages.
## Intended Use
- Primary: Parse statute text into structured components for element mapping
- Secondary: Identify penalty ranges, damage caps, and procedural requirements from statute text
- Integration: Feeds into the `legal-element-mapper` and `legal-damages-calculator` models
## Task Type

`token-classification` -- named entity recognition (NER) for statutory components
## Base Model

`nlpaueb/legal-bert-base-uncased` -- pre-trained on legal corpora (EU legislation, US court opinions, contracts), a strong baseline for legal NER tasks
## Training Data
| Source | Records | Description |
|---|---|---|
| US Code (all titles) | ~60K sections | Full federal statutory text with structure labels |
| Florida Statutes | ~12K sections | F.S. chapters relevant to litigation (chs. 768, 95, 92, 48, etc.) |
| Mississippi Code | ~8K sections | MS Code sections for active jurisdictions |
| State Legislature Archives | ~200K sections | Historical versions for amendment tracking |
## Entity Labels

- `SECTION_NUM` -- Statute section identifier (e.g., 768.72)
- `SUBSECTION` -- Subsection reference (e.g., (1)(a))
- `DEFINITION` -- Defined term and its definition text
- `OPERATIVE` -- The core operative language ("shall", "must", "is liable")
- `PENALTY` -- Penalty or damages provision
- `EXCEPTION` -- Exception or exclusion clause
- `EFFECTIVE_DATE` -- Effective date of the provision
- `XREF` -- Cross-reference to another statute
- `ELEMENT` -- Required element for a cause of action
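A token-classification model typically emits BIO tags over these labels, which downstream stages then merge into entity spans. The sketch below shows one way that grouping could work; the `group_entities` helper and the sample sentence are illustrative assumptions, not part of the released model's API.

```python
# Sketch: merging BIO-tagged token predictions into entity spans.
# The tag names mirror the labels above; this helper is illustrative.

def group_entities(tokens, tags):
    """Merge (token, BIO-tag) pairs into (label, text) entity spans."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(tuple(current))
            current = [tag[2:], token]
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1] += " " + token
        else:  # "O" tag or a stray I- tag ends the current span
            if current:
                entities.append(tuple(current))
            current = None
    if current:
        entities.append(tuple(current))
    return entities

tokens = ["Section", "768.72", "limits", "punitive", "damages"]
tags = ["O", "B-SECTION_NUM", "O", "B-PENALTY", "I-PENALTY"]
print(group_entities(tokens, tags))
# → [('SECTION_NUM', '768.72'), ('PENALTY', 'punitive damages')]
```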
## Benchmark Criteria (90%+ Target)
| Metric | Target | Description |
|---|---|---|
| Entity F1 | >= 92% | Macro F1 across all entity types |
| ELEMENT Recall | >= 95% | Must capture all cause-of-action elements |
| PENALTY Precision | >= 93% | Penalty provisions must be correctly identified |
| XREF Accuracy | >= 90% | Cross-references must resolve correctly |
| Latency | < 500ms | Per-section parse time |
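The headline metric above is macro F1, i.e., the unweighted mean of per-label F1 scores. A minimal sketch of that computation from per-label true-positive/false-positive/false-negative counts (the counts shown are placeholder values, not evaluation results):

```python
# Sketch: macro F1 from per-label (tp, fp, fn) counts.
# Counts below are illustrative placeholders, not real results.

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall for one label."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(counts):
    """counts: {label: (tp, fp, fn)} -> unweighted mean of per-label F1."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores)

counts = {
    "SECTION_NUM": (98, 1, 1),
    "PENALTY": (90, 5, 6),
    "XREF": (88, 7, 9),
}
print(round(macro_f1(counts), 3))
```

Macro (rather than micro) averaging keeps rare labels such as `EFFECTIVE_DATE` from being drowned out by frequent ones like `XREF`.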
## GLACIER Pipeline Integration
```
STAGE 2 (Research) --> statute-parser extracts elements from relevant statutes
STAGE 3 (WDC #1)   --> parsed elements fed to element-mapper for theory validation
STAGE 4 (Draft)    --> parsed penalty provisions inform damages calculator
STAGE 5 (WDC #2)   --> verify all statute references in draft are current
```
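The Stage 5 check amounts to resolving each extracted `XREF` against an index of currently effective sections. A minimal sketch, where `current_sections` stands in for a real statute index:

```python
# Sketch of the Stage 5 currency check: flag cross-references that do
# not resolve to a currently effective section. `current_sections` is
# an illustrative stand-in for a real statute index.

current_sections = {"768.72", "768.73", "95.11"}

def stale_xrefs(xrefs):
    """Return the extracted cross-references missing from the index."""
    return [x for x in xrefs if x not in current_sections]

print(stale_xrefs(["768.72", "768.737", "95.11"]))
# → ['768.737']
```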
## Training Configuration
- Epochs: 10
- Learning rate: 3e-5 with cosine schedule
- Batch size: 16
- Max sequence length: 512
- Hardware: AWS SageMaker ml.g5.2xlarge
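The hyperparameters above, collected into a single config dict. The key names follow a typical Hugging Face `TrainingArguments` setup, which is an assumption about the actual training script:

```python
# The training hyperparameters listed above, as a config dict.
# Key names assume a typical Hugging Face TrainingArguments setup;
# this is a sketch, not the actual training script.
training_config = {
    "num_train_epochs": 10,
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": 16,
    "max_seq_length": 512,
}
print(training_config)
```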
## Limitations
- Focused on FL, MS, and federal statutes; other state statutes may have lower accuracy
- Archaic statutory language (pre-1970) may parse with lower confidence
- Does not perform constitutional analysis or preemption checks
- Assumes standard US statutory formatting conventions
## Version History
| Version | Date | Notes |
|---|---|---|
| v0.1 | 2026-04-10 | Initial model card, repo created |