Symio-ai/legal-pii-redactor

Model Description

Legal PII Redactor identifies and redacts personally identifiable information (PII) in legal documents while preserving legally necessary information. It distinguishes between PII that must be redacted (SSNs, financial account numbers, names of minors) and PII that is legally required in filings (party names, addresses for service).

The model implements court-mandated redaction requirements (FRCP 5.2, FL Rule 2.425) while keeping filings valid.

Intended Use

  • Primary: Redact PII from legal documents before filing per court rules
  • Secondary: Prepare public versions of sealed or confidential documents
  • Integration: Post-processing step in GLACIER Stage 6 before filing

Task Type

token-classification -- Named entity recognition for PII categories with context-aware redaction decisions
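As a rough illustration of the token-classification step, the sketch below merges per-token labels into character spans. It assumes a BIO tagging scheme (`B-SSN`, `I-SSN`, `O`) and tokens that carry character offsets; the card does not state the label scheme, and `bio_to_spans` is a hypothetical helper name.

```python
def bio_to_spans(tokens, labels):
    """Merge BIO token labels into (category, start, end) character spans.

    tokens: list of (text, start, end) tuples with character offsets.
    labels: list of strings such as "B-SSN", "I-SSN" or "O" (assumed scheme).
    """
    spans = []
    current = None  # [category, start, end] of the entity being built
    for (_text, start, end), label in zip(tokens, labels):
        category = label[2:] if label[:2] in ("B-", "I-") else None
        continues = (
            label.startswith("I-")
            and current is not None
            and current[0] == category
        )
        if continues:
            current[2] = end  # extend the open entity
            continue
        if current is not None:
            spans.append(tuple(current))  # close the open entity
            current = None
        if category is not None:
            current = [category, start, end]  # open a new entity
    if current is not None:
        spans.append(tuple(current))
    return spans
```

An `I-` label with no matching open entity is treated as the start of a new one, which is a common lenient choice rather than something the card specifies.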

Base Model

microsoft/deberta-v3-base -- Efficient inference for high-throughput document processing

Training Data

Source | Records | Description
--- | --- | ---
Redacted Court Filings | ~200K | Filings with clerk-applied redactions (before/after pairs)
PII-Annotated Legal Docs | ~100K | Expert-annotated documents with PII labels
Court Redaction Orders | ~20K | Judicial orders specifying what to redact
FRCP 5.2 / Rule 2.425 Case Law | ~5K opinions | Rulings on redaction requirements
Synthetic PII Documents | ~500K | Generated documents with known PII for training

PII Categories and Redaction Rules

  • SSN -- Social Security Number --> ALWAYS redact (show last 4 only)
  • TAX_ID -- Taxpayer ID --> ALWAYS redact
  • FINANCIAL_ACCOUNT -- Bank/credit account numbers --> ALWAYS redact (last 4 only)
  • MINOR_NAME -- Name of minor child --> ALWAYS redact (use initials)
  • DOB_MINOR -- Date of birth of minor --> ALWAYS redact
  • HOME_ADDRESS -- Home address --> Redact unless needed for service
  • PHONE -- Phone number --> Redact unless in business context
  • EMAIL -- Email address --> Preserve if needed for certificate of service
  • MEDICAL -- Medical information --> Redact in public filings
  • PARTY_NAME -- Named party --> PRESERVE (required in caption)
  • ATTORNEY_INFO -- Attorney contact --> PRESERVE (required in filing)
  • CASE_NUMBER -- Case number --> PRESERVE
  • COURT_INFO -- Court identification --> PRESERVE

Redaction Format

Original: "John Smith, SSN 123-45-6789, residing at 123 Main St"
Redacted: "John Smith, SSN XXX-XX-6789, residing at [ADDRESS REDACTED]"
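The output format in this example can be reproduced with two small helpers. This is only a sketch of the formatting step: the regular expression stands in for the model's SSN detection, the address offsets are located by string search purely for the demo, and `mask_ssn` and `redact_span` are hypothetical names.

```python
import re

# Illustrative pattern for dashed SSNs; the model itself detects PII with a
# token classifier, not with this regex.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text):
    """Replace all but the last four digits of each dashed SSN."""
    return SSN_RE.sub(r"XXX-XX-\1", text)


def redact_span(text, start, end, label):
    """Replace text[start:end] with a bracketed placeholder."""
    return text[:start] + f"[{label} REDACTED]" + text[end:]


original = "John Smith, SSN 123-45-6789, residing at 123 Main St"
masked = mask_ssn(original)
start = masked.index("123 Main St")
redacted = redact_span(masked, start, start + len("123 Main St"), "ADDRESS")
print(redacted)
# John Smith, SSN XXX-XX-6789, residing at [ADDRESS REDACTED]
```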

Benchmark Criteria (90%+ Target)

Metric | Target | Description
--- | --- | ---
PII Detection Recall | >= 98% | Must catch nearly all PII
SSN/Financial Recall | 100% | Zero tolerance for missed financial PII
Minor Name Recall | 100% | Zero tolerance for exposed minor information
False Redaction Rate | <= 2% | Must not redact legally required information
Court Rule Compliance | >= 95% | Redaction matches applicable court rule
Throughput | >= 50 pages/sec | Fast enough for bulk document processing
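One way the recall and false-redaction targets could be checked is sketched below. The card does not define how the metrics are computed, so the definitions here are assumptions: spans are `(start, end, category)` tuples compared by exact match, and the false redaction rate is taken as the share of must-preserve spans that were redacted. All function names are hypothetical.

```python
def recall(gold_spans, predicted_spans):
    """Fraction of gold PII spans that were detected (exact span match)."""
    gold = set(gold_spans)
    if not gold:
        return 1.0  # nothing to find, so nothing was missed
    return len(gold & set(predicted_spans)) / len(gold)


def false_redaction_rate(preserve_spans, redacted_spans):
    """Fraction of must-preserve spans that were redacted anyway."""
    preserve = set(preserve_spans)
    if not preserve:
        return 0.0
    return len(preserve & set(redacted_spans)) / len(preserve)


def meets_targets(pii_recall, false_rate):
    """Check the overall recall (>= 98%) and false redaction (<= 2%) targets."""
    return pii_recall >= 0.98 and false_rate <= 0.02
```

The per-category 100% targets for SSN/financial and minor-name recall could be checked by filtering both span sets to those categories before calling `recall`.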

GLACIER Pipeline Integration

STAGE 6 (Final Draft) --> pii-redactor processes document before filing
  Input: final document text
  Output: redacted version + redaction log
  Redaction log shows: what was redacted, which rule required it, original value (encrypted)
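A redaction log entry carrying the three pieces of information named above might look like the following. This is a sketch under assumptions: the field names, `RedactionLogEntry` and `make_entry` are hypothetical, and encryption is delegated to a caller-supplied `encrypt` function because the card does not say which scheme is used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RedactionLogEntry:
    category: str               # what was redacted, e.g. "SSN"
    rule: str                   # which rule required it, e.g. "FRCP 5.2"
    start: int                  # character offsets in the input document
    end: int
    replacement: str            # text written into the redacted version
    original_encrypted: bytes   # original value, stored only in encrypted form


def make_entry(text: str, start: int, end: int, category: str, rule: str,
               replacement: str,
               encrypt: Callable[[bytes], bytes]) -> RedactionLogEntry:
    """Build a log entry; the plaintext original never enters the entry."""
    original = text[start:end].encode("utf-8")
    return RedactionLogEntry(
        category=category,
        rule=rule,
        start=start,
        end=end,
        replacement=replacement,
        original_encrypted=encrypt(original),
    )
```

Passing the cipher in as a function keeps key management outside the logging code.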

Court Rule Mapping:

  • Federal: FRCP 5.2 (SSN, TIN, birth dates, financial account numbers, minor names)
  • Florida: Rule 2.425 (broader than federal -- includes home addresses)
  • Mississippi: MRCP (follows federal standards)
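The mapping above could be encoded as a per-jurisdiction set of categories that must be redacted. The sketch below uses the category labels from this card (DOB_MINOR is its only birth-date category); the dictionary layout and `categories_for` are hypothetical, and the sets reflect only what the list states, not a full reading of each rule.

```python
# Categories covered by the federal rule, using this card's labels.
FEDERAL_CATEGORIES = frozenset(
    {"SSN", "TAX_ID", "DOB_MINOR", "FINANCIAL_ACCOUNT", "MINOR_NAME"}
)

# Jurisdiction -> (rule citation, categories that must be redacted).
COURT_RULES = {
    "federal": ("FRCP 5.2", FEDERAL_CATEGORIES),
    # Florida is described as broader than federal: adds home addresses.
    "florida": ("Rule 2.425", FEDERAL_CATEGORIES | {"HOME_ADDRESS"}),
    # Mississippi is described as following federal standards.
    "mississippi": ("MRCP", FEDERAL_CATEGORIES),
}


def categories_for(jurisdiction):
    """Return (rule citation, mandatory-redaction categories)."""
    try:
        return COURT_RULES[jurisdiction.lower()]
    except KeyError:
        raise ValueError(f"no rule mapping for {jurisdiction!r}") from None
```

For example, `categories_for("Florida")` returns the `Rule 2.425` citation together with a set that includes `HOME_ADDRESS`, while the federal set does not.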

Training Configuration

  • Epochs: 10
  • Learning rate: 3e-5
  • Batch size: 32
  • Max sequence length: 512
  • Hardware: AWS SageMaker ml.g5.2xlarge
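For reference, the same settings can be written as plain Python dictionaries. The key names follow the Hugging Face `TrainingArguments` and tokenizer call conventions, on the assumption (not stated in the card) that training used that stack; `truncation` is likewise an assumed setting.

```python
# Hyperparameters transcribed from the list above.
TRAINING_CONFIG = {
    "num_train_epochs": 10,
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,
}

# Tokenizer settings; truncation=True is an assumption, not from the card.
TOKENIZER_CONFIG = {
    "max_length": 512,
    "truncation": True,
}

BASE_MODEL = "microsoft/deberta-v3-base"
INSTANCE_TYPE = "ml.g5.2xlarge"  # AWS SageMaker instance type
```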

Limitations

  • Context-dependent redaction decisions (e.g., when is an address needed for service?) require case-specific context
  • Handwritten or poorly OCR'd documents may have lower PII detection rates
  • Novel PII types (cryptocurrency addresses, biometric data) are less represented
  • Does not handle image-based PII redaction (photos, scanned signatures)
  • Sealed document handling requires additional judicial order analysis

Version History

Version | Date | Notes
--- | --- | ---
v0.1 | 2026-04-10 | Initial model card, repo created