# Symio-ai/legal-statute-parser

## Model Description
Legal Statute Parser performs structured extraction from raw statute text, identifying and labeling section numbers, subsections, definitions, operative clauses, penalty provisions, exceptions, effective dates, amendment history, and cross-references.
Designed to power the GLACIER pipeline's statute verification and element-mapping stages.
## Intended Use
- Primary: Parse statute text into structured components for element mapping
- Secondary: Identify penalty ranges, damage caps, and procedural requirements from statute text
- Integration: Feeds into the `legal-element-mapper` and `legal-damages-calculator` models
## Task Type

`token-classification` -- named entity recognition (NER) for statutory components
## Base Model

`nlpaueb/legal-bert-base-uncased` -- pre-trained on legal corpora (EU legislation, US court opinions, contracts), a strong baseline for legal NER tasks
## Training Data
| Source | Records | Description |
|---|---|---|
| US Code (all titles) | ~60K sections | Full federal statutory text with structure labels |
| Florida Statutes | ~12K sections | F.S. chapters relevant to litigation (chs. 768, 95, 92, 48, etc.) |
| Mississippi Code | ~8K sections | MS Code sections for active jurisdictions |
| State Legislature Archives | ~200K sections | Historical versions for amendment tracking |
## Entity Labels

- `SECTION_NUM` -- Statute section identifier (e.g., 768.72)
- `SUBSECTION` -- Subsection reference (e.g., (1)(a))
- `DEFINITION` -- Defined term and its definition text
- `OPERATIVE` -- The core operative language ("shall", "must", "is liable")
- `PENALTY` -- Penalty or damages provision
- `EXCEPTION` -- Exception or exclusion clause
- `EFFECTIVE_DATE` -- Effective date of the provision
- `XREF` -- Cross-reference to another statute
- `ELEMENT` -- Required element for a cause of action
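A token-classification model typically emits BIO tags over these labels, which downstream stages then merge into entity spans. The sketch below shows one way that grouping could work; the `group_entities` helper and the sample sentence are illustrative assumptions, not part of the released model's API.

```python
# Sketch: merging BIO-tagged token predictions into entity spans.
# The tag names mirror the labels above; this helper is illustrative.

def group_entities(tokens, tags):
    """Merge (token, BIO-tag) pairs into (label, text) entity spans."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(tuple(current))
            current = [tag[2:], token]
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1] += " " + token
        else:  # "O" tag or a stray I- tag ends the current span
            if current:
                entities.append(tuple(current))
            current = None
    if current:
        entities.append(tuple(current))
    return entities

tokens = ["Section", "768.72", "limits", "punitive", "damages"]
tags = ["O", "B-SECTION_NUM", "O", "B-PENALTY", "I-PENALTY"]
print(group_entities(tokens, tags))
# → [('SECTION_NUM', '768.72'), ('PENALTY', 'punitive damages')]
```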
## Benchmark Criteria (90%+ Target)
| Metric | Target | Description |
|---|---|---|
| Entity F1 | >= 92% | Macro F1 across all entity types |
| ELEMENT Recall | >= 95% | Must capture all cause-of-action elements |
| PENALTY Precision | >= 93% | Penalty provisions must be correctly identified |
| XREF Accuracy | >= 90% | Cross-references must resolve correctly |
| Latency | < 500ms | Per-section parse time |
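The headline metric above is macro F1, i.e., the unweighted mean of per-label F1 scores. A minimal sketch of that computation from per-label true-positive/false-positive/false-negative counts (the counts shown are placeholder values, not evaluation results):

```python
# Sketch: macro F1 from per-label (tp, fp, fn) counts.
# Counts below are illustrative placeholders, not real results.

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall for one label."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(counts):
    """counts: {label: (tp, fp, fn)} -> unweighted mean of per-label F1."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores)

counts = {
    "SECTION_NUM": (98, 1, 1),
    "PENALTY": (90, 5, 6),
    "XREF": (88, 7, 9),
}
print(round(macro_f1(counts), 3))
```

Macro (rather than micro) averaging keeps rare labels such as `EFFECTIVE_DATE` from being drowned out by frequent ones like `XREF`.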
## GLACIER Pipeline Integration
```
STAGE 2 (Research) --> statute-parser extracts elements from relevant statutes
STAGE 3 (WDC #1)   --> parsed elements fed to element-mapper for theory validation
STAGE 4 (Draft)    --> parsed penalty provisions inform damages calculator
STAGE 5 (WDC #2)   --> verify all statute references in draft are current
```
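The Stage 5 check amounts to resolving each extracted `XREF` against an index of currently effective sections. A minimal sketch, where `current_sections` stands in for a real statute index:

```python
# Sketch of the Stage 5 currency check: flag cross-references that do
# not resolve to a currently effective section. `current_sections` is
# an illustrative stand-in for a real statute index.

current_sections = {"768.72", "768.73", "95.11"}

def stale_xrefs(xrefs):
    """Return the extracted cross-references missing from the index."""
    return [x for x in xrefs if x not in current_sections]

print(stale_xrefs(["768.72", "768.737", "95.11"]))
# → ['768.737']
```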
## Training Configuration
- Epochs: 10
- Learning rate: 3e-5 with cosine schedule
- Batch size: 16
- Max sequence length: 512
- Hardware: AWS SageMaker ml.g5.2xlarge
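The hyperparameters above, collected into a single config dict. The key names follow a typical Hugging Face `TrainingArguments` setup, which is an assumption about the actual training script:

```python
# The training hyperparameters listed above, as a config dict.
# Key names assume a typical Hugging Face TrainingArguments setup;
# this is a sketch, not the actual training script.
training_config = {
    "num_train_epochs": 10,
    "learning_rate": 3e-5,
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": 16,
    "max_seq_length": 512,
}
print(training_config)
```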
## Limitations
- Focused on FL, MS, and federal statutes; other state statutes may have lower accuracy
- Archaic statutory language (pre-1970) may parse with lower confidence
- Does not perform constitutional analysis or preemption checks
- Assumes standard US statutory formatting conventions
## Version History
| Version | Date | Notes |
|---|---|---|
| v0.1 | 2026-04-10 | Initial model card, repo created |