---
license: apache-2.0
language:
- en
pipeline_tag: token-classification
library_name: transformers
tags:
- pii
- privacy
- token-classification
- bioes
- moe
- haremb
base_model:
- OpenMed/privacy-filter-nemotron
- openai/privacy-filter
datasets:
- nvidia/Nemotron-PII
---

# HarEmb · OpenMed-Nemotron PII

> A **single-layer** HarEmb model on the [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron) lineage. It has 287M total parameters and predicts the full **221-class BIOES** Nemotron-PII label space.

**Model**: [`fblgit/haremb-privacy-filter-opennemo`](https://huggingface.co/fblgit/haremb-privacy-filter-opennemo)

![HarEmb architecture](haremb.png)

## Lineage

This model is the third leg of a three-step lineage:

1. **[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)** — OpenAI's open release of the underlying 1.4B-parameter MoE backbone (8 transformer layers, ~50M active params/token, BIOES token-classifier head).
2. **[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)** — OpenMed's full fine-tune of that backbone on `nvidia/Nemotron-PII`, expanding the head to 221 BIOES classes (55 fine-grained PII categories).
3. **`haremb-privacy-filter-opennemo`** *(this model)* — a one-layer surgical slice of the OpenMed teacher.

## What this model does

Token-level PII classification over **55 Nemotron-PII categories**. Every token receives either `O` or a `{B, I, E, S}`-prefixed category label, covering identity, contact, address, date/time, government ID, financial, healthcare, enterprise ID, vehicle, and digital identifier categories.

In `eval()` mode the model runs constrained-BIOES Viterbi decoding internally, so `outputs.logits.argmax(-1)` is span-coherent by default. See [Output semantics](#output-semantics) for the exact fields and opt-out flags.

## Evaluation

Evaluated on a 1% slice of `nvidia/Nemotron-PII:test` (1,000 documents, ctx 1024, seed 42), Viterbi-decoded. The benchmark and app both use the convention **A = `OpenMed/privacy-filter-nemotron` (teacher / baseline)**, **B = this checkpoint** (`haremb`); ratios are reported as **B ÷ A**.

### Quality (viterbi stream)

| metric | **A: OpenMed teacher** | **B: haremb** (this) | B − A |
|---|---:|---:|---:|
| span F1 | 0.9434 | **0.9288** | −0.0146 |
| span precision | 0.9531 | **0.9396** | −0.0135 |
| span recall | 0.9338 | **0.9182** | −0.0156 |
| token accuracy | 0.9900 | **0.9885** | −0.0015 |
| non-O recall | 0.9703 | **0.9637** | −0.0066 |

### Performance (same eval set, ctx 1024, bf16, single GPU)

| metric | **A: OpenMed teacher** | **B: haremb** | B vs A |
|---|---:|---:|---:|
| total params | 1,400M | **287M** | **4.87× smaller** |
| dense params | 139M | 130M | 1.07× smaller |
| MoE expert params | 1,260M | 158M | **7.97× smaller** |
| **active params / token** (memory) | 178.7M | **134.5M** | 1.33× smaller |
| **compute params / token** (FLOPs) | 50.7M | **6.5M** | **7.85× cheaper** |
| GFLOP / token (forward) | 0.101 | **0.013** | **7.85× cheaper** |
| weights on disk | (HF repo) | **548 MiB** | — |
| weights in RAM | 2,669 MiB | 548 MiB | **4.87× smaller** |
| peak GPU memory (eval) | 3.30 GiB | **1.22 GiB** | **2.70× less** |
| throughput | 3,275 tok/s | **6,361 tok/s** | **1.94× faster** |

`active params / token` estimates memory-bandwidth pressure, while `compute params / token` estimates matmul FLOPs and excludes the embedding-table row-gather. GFLOP/token is `2 × compute_params_per_token`.
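As a quick sanity check, the GFLOP/token row follows directly from the compute-params row. A minimal sketch; the two parameter counts below are simply the rounded table values above:

```python
# Forward GFLOP/token = 2 × compute params/token (one multiply + one add per weight used).
teacher_params = 50.7e6   # A: OpenMed teacher, compute params / token
student_params = 6.5e6    # B: haremb, compute params / token

print(2 * teacher_params / 1e9)         # ≈ 0.101 GFLOP/token
print(2 * student_params / 1e9)         # ≈ 0.013 GFLOP/token
print(teacher_params / student_params)  # ≈ 7.8× with these rounded values (table: 7.85× from unrounded counts)
```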
`infer.log` and `compare.log` contain the full breakdown, including peak GPU memory from `torch.cuda.max_memory_allocated`.

![Performance profile — absolute footprint and B/A ratio, A teacher vs B candidate](eval_performance.png)

### Quality breakdown

![Eval summary — headline metrics, raw-vs-viterbi span F1, and selected per-category deltas](eval_summary.png)

### Per-category highlights (viterbi span F1)

**At or near 1.000 (B)** — `biometric_identifier`, `blood_type`, `coordinate`, `health_plan_beneficiary_number`, `ipv4`, `ipv6`, `license_plate`, `mac_address`, `national_id`, `postcode` (≥ 0.99 with ≥ 100 gold spans).

**Categories where B beats A** (B vs A) — `gender` (0.987 vs 0.841), `political_view` (0.872 vs 0.839), `religious_belief` (0.935 vs 0.926), `state` (0.908 vs 0.829), `language` (0.897 vs 0.804), `race_ethnicity` (0.864 vs 0.861), `country` (0.952 vs 0.936). Several "fuzzy" world-knowledge categories where the 1-layer student carries the right inductive bias.

**Categories where A leads** (A vs B) — `occupation` (0.727 vs 0.605), `company_name` (0.929 vs 0.776), `last_name` (0.976 vs 0.931), `first_name` (0.970 vs 0.930), `user_name` (0.961 vs 0.942). Identity-noun categories where the teacher's deeper-layer mixing helps.

### Token-outcome breakdown — A: OpenMed teacher vs B: haremb (viterbi)

![Pairwise token outcome and net category wins on gold non-O tokens](eval_confusion.png)

## Quick start

### Recommended — via OpenMed

The OpenMed wrapper is the same UX the teacher card recommends and works on this checkpoint as a drop-in:

```bash
pip install -U "openmed[hf]"
```

```python
from openmed import extract_pii, deidentify

text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

result = extract_pii(text, model_name="fblgit/haremb-privacy-filter-opennemo")
for ent in result.entities:
    print(f"{ent.label:30s} {ent.text!r} conf={ent.confidence:.2f}")

masked = deidentify(text, method="mask", model_name="fblgit/haremb-privacy-filter-opennemo")
fake = deidentify(
    text,
    method="replace",
    model_name="fblgit/haremb-privacy-filter-opennemo",
    consistent=True,
    seed=42,
)
```

### HuggingFace `transformers` pipeline

```python
from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="fblgit/haremb-privacy-filter-opennemo",
    tokenizer="fblgit/haremb-privacy-filter-opennemo",
    trust_remote_code=True,
    aggregation_strategy="simple",
)

pipe("Send the invoice to billing@acmecorp.io, account 1234-5678.")
# → [{'entity_group': 'email', 'word': 'billing@acmecorp.io', ...},
#    {'entity_group': 'account_number', 'word': '1234-5678', ...}]
```

### Raw `transformers` API

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

repo = "fblgit/haremb-privacy-filter-opennemo"

model = AutoModelForTokenClassification.from_pretrained(
    repo,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to("cuda").eval()
tok = AutoTokenizer.from_pretrained(repo)

enc = tok("My email is foo@bar.com.", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**enc)

# By default, `outputs.logits.argmax(-1)` follows the Viterbi-decoded path.
labels = out.logits.argmax(-1)[0]
```
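On the raw API you still have to fold token-level BIOES tags back into character spans yourself (the `openmed` wrapper and the HF pipeline already do this aggregation). A minimal sketch, reusing `tok` and `model` from the block above, assuming a fast tokenizer that supports `return_offsets_mapping` and that `model.config.id2label` follows the `O` / `B-` / `I-` / `E-` / `S-` convention described earlier:

```python
# Re-tokenize with character offsets so each predicted label can be mapped back to text.
text = "My email is foo@bar.com."
enc = tok(text, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0].tolist()

with torch.no_grad():
    labels = model(**{k: v.to("cuda") for k, v in enc.items()}).logits.argmax(-1)[0]

spans, start, cat = [], None, None
for (lo, hi), lab_id in zip(offsets, labels.tolist()):
    tag = model.config.id2label[lab_id]
    if lo == hi:                                      # special tokens have empty offsets
        continue
    if tag.startswith("S-"):                          # single-token entity
        spans.append((tag[2:], text[lo:hi]))
    elif tag.startswith("B-"):                        # entity start
        start, cat = lo, tag[2:]
    elif tag.startswith("E-") and start is not None:  # entity end
        spans.append((cat, text[start:hi]))
        start = None

print(spans)   # e.g. [('email', 'foo@bar.com')]
```

Because the logits are already span-coherent in `eval()` mode, the loop does not need to repair dangling `B-`/`I-` runs; with raw (non-Viterbi) logits you would want a more defensive aggregator.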
## Output semantics

The forward pass — in `eval()` mode — runs constrained-BIOES Viterbi over the per-token logits and attaches three things to the output:

- `outputs.logits` — a tensor whose `argmax(-1)` equals the Viterbi prediction (so HF `pipeline()` and naive `argmax` consumers get span-coherent predictions automatically).
- `outputs.predicted_labels` — a `[B, T]` LongTensor of Viterbi-decoded label ids (`-1` at padded positions).
- `outputs.raw_logits` — the original per-token logits, preserved for callers that want raw confidences.

To opt out:

```python
model.config.viterbi_replace_logits = False  # raw logits in outputs.logits
model.config.use_viterbi_decode = False      # also skip Viterbi entirely
```

The model supports the upstream context length (max position embeddings 131,072 tokens). Practical batch sizes depend on hardware; bf16 + batch 1 + full-length is comfortable on 24 GB.
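For callers that want both span-coherent labels and a confidence score, one option is to read `predicted_labels` together with a softmax over `raw_logits`. A minimal sketch, reusing `model` and `tok` from the quick start; scoring the Viterbi-chosen label with the raw softmax is a modeling choice, not something the checkpoint prescribes:

```python
import torch
import torch.nn.functional as F

enc = tok("Call me at 415-555-0123.", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**enc)

pred = out.predicted_labels[0]                        # Viterbi label ids, -1 at padding
probs = F.softmax(out.raw_logits.float(), dim=-1)[0]  # per-token class probabilities

for pos, lab_id in enumerate(pred.tolist()):
    if lab_id < 0:        # padded position
        continue
    tag = model.config.id2label[lab_id]
    if tag != "O":
        # confidence of the Viterbi-chosen label under the raw (pre-Viterbi) distribution
        print(pos, tag, round(probs[pos, lab_id].item(), 3))
```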
## Limitations & intended use

- **English-only training data.** Nemotron-PII is predominantly English. Performance on non-English text is not guaranteed.
- **Synthetic training data.** Real clinical notes, legal documents, and live web text may show different surface forms. For high-stakes deployments, collect a domain-specific eval set and re-calibrate.
- **Fuzzier categories** — `occupation`, `company_name`, and identity nouns (`first_name`, `last_name`, `user_name`) carry more uncertainty than formatted identifiers; downstream pipelines that only need strict PII can ignore low-confidence predictions on these.
- **Not a substitute for legal compliance review.** Use alongside a governance layer (human review, deterministic regex pre-filters, etc.).

## Reproducibility

Every metric, log, and plot in this card is regenerated by the single-file [`benchmark.py`](benchmark.py) shipped alongside the weights:

```bash
python benchmark.py                  # full benchmark vs OpenMed teacher
python benchmark.py --no-base        # skip teacher download (logs only)
python benchmark.py --no-plots       # skip matplotlib (logs + JSON only)
python benchmark.py --eval-pct 0.1   # smaller slice for a quick check
```

Outputs are written into the model folder:

- `infer.log`
- `compare.log`
- `eval_summary.png`
- `eval_confusion.png`
- `eval_performance.png`

Raw per-doc eval data is held in memory only. Pass `--out` to write artifacts somewhere else.

The Gradio demo in [`app.py`](app.py) supports **side-by-side A-vs-B comparison** between any two token-classification checkpoints with the same label space. Defaults match the report convention: **A = OpenMed/privacy-filter-nemotron** (teacher / baseline), **B = this checkpoint**. Disable either model to run single-model inference; both expose a runtime "active experts per token" slider so you can sweep MoE routing density. From inside the model folder:

```bash
python app.py                                   # A=OpenMed teacher, B=. (this)
python app.py --model-a /path/to/another/repo   # swap baseline A
python app.py --model-b /path/to/another/repo   # swap candidate B
python app.py --port 7860 --share               # public share link
```

## License

Apache-2.0, same as the lineage. Subject to the license terms of [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter) and the dataset terms of [`nvidia/Nemotron-PII`](https://huggingface.co/datasets/nvidia/Nemotron-PII).

## Citation

```bibtex
@misc{haremb-privacy-filter-opennemo,
  title        = {HarEmb · OpenMed-Nemotron PII: a single-layer privacy-filter slice with span-coherent inference},
  author       = {fblgit},
  year         = {2026},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/fblgit/haremb-privacy-filter-opennemo},
  howpublished = {\url{https://huggingface.co/fblgit/haremb-privacy-filter-opennemo}},
  note         = {Single-transformer-layer model on the openai/privacy-filter → OpenMed/privacy-filter-nemotron lineage; 287M total params, 221 BIOES classes (55 fine-grained PII categories), with inlined constrained-BIOES Viterbi decoding so outputs.logits.argmax(-1) is span-coherent.}
}

@misc{openmed-privacy-filter-nemotron,
  title     = {OpenMed/privacy-filter-nemotron: fine-grained PII extraction with 55 categories},
  author    = {OpenMed},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/OpenMed/privacy-filter-nemotron}
}

@misc{openai-privacy-filter,
  title     = {Privacy Filter},
  author    = {OpenAI},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/openai/privacy-filter}
}

@misc{nvidia-nemotron-pii,
  title     = {Nemotron-PII},
  author    = {NVIDIA},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-PII}
}
```