Symio-ai/legal-statute-parser

Model Description

Legal Statute Parser performs structured extraction from statute text. Given raw statute text, it identifies and labels: section numbers, subsections, definitions, operative clauses, penalty provisions, exceptions, effective dates, amendment history, and cross-references.

It is designed to power the statute verification and element-mapping stages of the GLACIER pipeline.

Intended Use

  • Primary: Parse statute text into structured components for element mapping
  • Secondary: Identify penalty ranges, damage caps, and procedural requirements from statute text
  • Integration: Feeds into legal-element-mapper and legal-damages-calculator models

Task Type

token-classification -- Named entity recognition for statutory components

Base Model

nlpaueb/legal-bert-base-uncased -- Pre-trained on legal corpora (EU legislation, US court opinions, contracts), strong baseline for legal NER tasks

Training Data

| Source | Records | Description |
|---|---|---|
| US Code (all titles) | ~60K sections | Full federal statutory text with structure labels |
| Florida Statutes | ~12K sections | F.S. chapters relevant to litigation (chs. 768, 95, 92, 48, etc.) |
| Mississippi Code | ~8K sections | MS Code sections for active jurisdictions |
| State Legislature Archives | ~200K sections | Historical versions for amendment tracking |

Entity Labels

  • SECTION_NUM -- Statute section identifier (e.g., 768.72)
  • SUBSECTION -- Subsection reference (e.g., (1)(a))
  • DEFINITION -- Defined term and its definition text
  • OPERATIVE -- The core operative language ("shall", "must", "is liable")
  • PENALTY -- Penalty or damages provision
  • EXCEPTION -- Exception or exclusion clause
  • EFFECTIVE_DATE -- Effective date of provision
  • XREF -- Cross-reference to another statute
  • ELEMENT -- Required element for a cause of action
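
A token-classification model usually predicts one tag per token, and those tags are then grouped into labeled spans. The card does not say which tagging scheme is used, so the sketch below assumes a BIO scheme built from the entity labels above. `build_label_map` and `decode_spans` are hypothetical helper names for illustration, not part of this model's API.

```python
# Entity labels as listed in this card; the BIO expansion is an assumption.
ENTITY_LABELS = [
    "SECTION_NUM", "SUBSECTION", "DEFINITION", "OPERATIVE", "PENALTY",
    "EXCEPTION", "EFFECTIVE_DATE", "XREF", "ELEMENT",
]


def build_label_map(entity_labels):
    """Return (label2id, id2label) for an assumed BIO tagging scheme."""
    tags = ["O"]
    for name in entity_labels:
        tags.append(f"B-{name}")
        tags.append(f"I-{name}")
    label2id = {tag: i for i, tag in enumerate(tags)}
    id2label = {i: tag for tag, i in label2id.items()}
    return label2id, id2label


def decode_spans(tokens, tags):
    """Group per-token BIO tags into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span,
            # closes the span; the stray token itself is dropped.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans
```

Under this assumption the nine entity labels expand to 19 tags (one `O` plus a `B-`/`I-` pair per label), which would be the size of the classification head.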

Benchmark Criteria (90%+ Target)

| Metric | Target | Description |
|---|---|---|
| Entity F1 | >= 92% | Macro F1 across all entity types |
| ELEMENT Recall | >= 95% | Must capture all cause-of-action elements |
| PENALTY Precision | >= 93% | Penalty provisions must be correctly identified |
| XREF Accuracy | >= 90% | Cross-references must resolve correctly |
| Latency | < 500 ms | Per-section parse time |
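
As an illustration of how the first three targets could be checked, the sketch below computes macro F1, ELEMENT recall, and PENALTY precision from per-label counts of true positives, false positives, and false negatives. The function names and the `counts` structure are assumptions made for this example; the card does not describe the actual evaluation harness. XREF accuracy and latency are not covered here.

```python
# Targets taken from the benchmark criteria in this card.
TARGETS = {
    "entity_f1": 0.92,
    "element_recall": 0.95,
    "penalty_precision": 0.93,
}


def f1_score(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision and recall are both zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts):
    """Unweighted mean of per-label F1. counts maps label -> (tp, fp, fn)."""
    scores = [f1_score(*c) for c in counts.values()]
    return sum(scores) / len(scores)


def check_targets(counts):
    """Return {metric: (value, passed)} for the three count-based targets."""
    tp, fp, fn = counts["ELEMENT"]
    element_recall = tp / (tp + fn) if (tp + fn) else 0.0
    tp, fp, fn = counts["PENALTY"]
    penalty_precision = tp / (tp + fp) if (tp + fp) else 0.0
    values = {
        "entity_f1": macro_f1(counts),
        "element_recall": element_recall,
        "penalty_precision": penalty_precision,
    }
    return {name: (value, value >= TARGETS[name]) for name, value in values.items()}
```

Macro averaging weights every entity type equally, so a rare label such as EFFECTIVE_DATE affects the Entity F1 target as much as a frequent one.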

GLACIER Pipeline Integration

STAGE 2 (Research) --> statute-parser extracts elements from relevant statutes
STAGE 3 (WDC #1)  --> parsed elements fed to element-mapper for theory validation
STAGE 4 (Draft)    --> parsed penalty provisions inform damages calculator
STAGE 5 (WDC #2)  --> verify all statute references in draft are current
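
The hand-offs above can be pictured as a simple routing step over the parser's labeled spans. This is only a sketch: the `(label, text)` span format, the `route_entities` function, and the `statute-reference-check` target name are hypothetical, and the actual GLACIER interfaces are not described in this card.

```python
from collections import defaultdict

# Hypothetical routing table inferred from the stage descriptions:
# elements go to the element mapper, penalties to the damages calculator,
# and cross-references to a reference check (name invented for this sketch).
ROUTES = {
    "ELEMENT": "legal-element-mapper",
    "PENALTY": "legal-damages-calculator",
    "XREF": "statute-reference-check",
}


def route_entities(entities):
    """Group (label, text) spans by downstream consumer.

    Labels without an entry in ROUTES are ignored.
    """
    routed = defaultdict(list)
    for label, text in entities:
        target = ROUTES.get(label)
        if target is not None:
            routed[target].append(text)
    return dict(routed)
```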

Training Configuration

  • Epochs: 10
  • Learning rate: 3e-5 with cosine schedule
  • Batch size: 16
  • Max sequence length: 512
  • Hardware: AWS SageMaker ml.g5.2xlarge
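
For reference, a cosine schedule decays the learning rate from its peak toward a floor along half a cosine wave. The sketch below uses the 3e-5 peak from the list above and assumes no warmup and a floor of zero, neither of which the card specifies. `cosine_lr` and `total_steps` are illustrative names.

```python
import math

BASE_LR = 3e-5  # peak learning rate from the training configuration


def cosine_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps (no warmup assumed)."""
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With these assumptions the rate starts at 3e-5, passes through 1.5e-5 at the midpoint of training, and reaches zero at the final step.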

Limitations

  • Focused on FL, MS, and federal statutes; other state statutes may have lower accuracy
  • Archaic statutory language (pre-1970) may parse with lower confidence
  • Does not perform constitutional analysis or preemption checks
  • Assumes standard US statutory formatting conventions

Version History

| Version | Date | Notes |
|---|---|---|
| v0.1 | 2026-04-10 | Initial model card, repo created |