Italian NER for Browser-Only PII Anonymization (BERT uncased, Quantized ONNX)

A lightweight Italian Named Entity Recognition model optimized for browser-only inference, based on:

osiria/bert-italian-uncased-ner
License: Apache-2.0
Original authors: Osiria

This repository provides a quantized ONNX version (~105 MB) suitable for running entirely in the browser via Transformers.js.


What this model is for

This model is designed as a privacy-friendly pre-filter layer to detect and help anonymize:

  • Person names (PER)
  • Organizations (ORG)
  • Locations (LOC)
  • Miscellaneous named entities (MISC)

Typical use case:

Run NER locally in the user's browser before sending text to an LLM, masking personal identifiers first.

All inference can run client-side.


Private Evaluation Protocol (Different Label Space)

The model predicts standard NER labels (PER/LOC/ORG/MISC/O), while the evaluation dataset is PII-oriented and uses a different schema.

To evaluate fairly, we use a private protocol (ner_compatible) that compares only compatible labels:

  • Gold labels are mapped to PER, LOC, or O.
  • Non-comparable PII classes are excluded from this score.
  • Predictions are projected to the token level and evaluated with both span-level BIO seqeval metrics and token-level type metrics.
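The mapping step above can be sketched as a small lookup. The gold label names used here (PERSON, ADDRESS, EMAIL) are illustrative assumptions, since the private dataset's actual PII schema is not listed in this card:

```javascript
// Illustrative sketch of the ner_compatible label mapping.
// The gold PII label names below are assumptions, not the real schema.
const GOLD_TO_NER = {
  PERSON: "PER",  // person names are comparable to PER
  ADDRESS: "LOC", // street/city mentions are comparable to LOC
  O: "O",         // non-entity tokens
  // e.g. EMAIL or PHONE have no NER counterpart and are excluded
};

// Returns the mapped label, or null if the class is not comparable
// and must be excluded from the score.
function mapGoldLabel(label) {
  return Object.prototype.hasOwnProperty.call(GOLD_TO_NER, label)
    ? GOLD_TO_NER[label]
    : null;
}
```

Tokens whose gold label maps to null are simply dropped from the comparison rather than counted as errors.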

Reproducible script with English comments:

  • evaluation/private_eval_procedure.py

Example:

python3 evaluation/private_eval_procedure.py \
  --validation-script ./validate.py \
  --mapping-mode ner_compatible \
  --max-examples 10197 \
  --debug 10 \
  --json-out ./evaluation/private_eval_latest.json

Why this version

Compared to the original PyTorch model:

  • Converted to ONNX
  • Dynamically quantized
  • Packaged for Transformers.js compatibility
  • Optimized for ONNX Runtime Web
  • Suitable for fully local browser execution

This repository is quantized-only:

  • includes onnx/model_quantized.onnx
  • does not include non-quantized ONNX weights

Use in the Browser (Transformers.js)

import { pipeline, env } from "@huggingface/transformers";

// Allow downloading the model files from the Hugging Face Hub.
env.allowRemoteModels = true;
// Enable WASM SIMD and multi-threading for faster CPU inference.
env.backends.onnx.wasm.simd = true;
env.backends.onnx.wasm.numThreads = 2;

const ner = await pipeline(
  "token-classification",
  "laibniz/italian-ner-pii-browser-uncased",
  {
    // Transformers.js v3 selects quantized weights via `dtype`;
    // "q8" loads onnx/model_quantized.onnx.
    dtype: "q8",
    aggregation_strategy: "simple"
  }
);

const text = "Il paziente mario rossi vive a Milano.";
const entities = await ner(text);
console.log(entities);

All inference runs locally in the browser.
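As a sketch of the pre-filter step described earlier, the detected entities can be used to mask spans before the text leaves the browser. The entity objects below (with `entity_group` and character `start`/`end` offsets) are a hypothetical example of the pipeline's output shape, which can vary by Transformers.js version:

```javascript
// Replace each detected entity span with a typed placeholder, e.g. [PER].
// Splicing in reverse start order keeps earlier offsets valid.
function maskEntities(text, entities) {
  const sorted = [...entities].sort((a, b) => b.start - a.start);
  let masked = text;
  for (const e of sorted) {
    const tag = e.entity_group ?? e.entity; // field name varies by version
    masked = masked.slice(0, e.start) + `[${tag}]` + masked.slice(e.end);
  }
  return masked;
}

// Hypothetical entities for the example sentence above:
const masked = maskEntities("Il paziente mario rossi vive a Milano.", [
  { entity_group: "PER", start: 12, end: 23 },
  { entity_group: "LOC", start: 31, end: 37 },
]);
// masked === "Il paziente [PER] vive a [LOC]."
```

The masked string is what you would forward to the LLM; the original text and the entity offsets never leave the client.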


Attribution

This work builds upon:

osiria/bert-italian-uncased-ner
Apache-2.0 License
https://huggingface.co/osiria/bert-italian-uncased-ner

All credit for model training and dataset preparation belongs to the original authors.

This repository provides ONNX export and quantized packaging for browser use.
