Update README.md
README.md CHANGED

@@ -98,6 +98,27 @@ benign identifier-like text can be over-redacted. Precision-sensitive users
 should add deterministic filters, tune thresholds where applicable, or
 finetune on representative local negatives.
 
+## Supported Label Spans
+
+The adapter emits these span labels. `O` is the background token label and is
+not returned as a detected span.
+
+| Label | Detects | Example |
+| --- | --- | --- |
+| `account_number` | Nigerian bank account/NUBAN-style account numbers when context indicates an account | `6318826391` |
+| `private_address` | Street, city, state, or postal address spans tied to a person or record | `42 Unity Road, Ikeja, Lagos 100271` |
+| `private_bvn` | Nigerian Bank Verification Number references and values | `22334455667` |
+| `private_date` | Dates tied to a person, record, document, or event in a private workflow | `12 April 1988` |
+| `private_drivers_license_number` | Nigerian driver license identifiers | `K2BHY7F6FEA0` |
+| `private_email` | Email addresses | `amina.yusuf@example.ng` |
+| `private_nin` | Nigerian National Identification Number references and values | `12345678901` |
+| `private_passport_number` | Nigerian passport identifiers | `B05995318` |
+| `private_person` | Person names and name-like references | `Amina Yusuf` |
+| `private_phone` | Nigerian local and international phone-number formats | `+234 802 111 3344` |
+| `private_url` | URLs tied to private records, claims, documents, or workflows | `https://claims.example/record/1234` |
+| `private_voters_card_number` | Nigerian voter card identifiers | `ABCD 1234 5678 9012 345` |
+| `secret` | Known-format credentials, authorization codes, session tokens, and similar secrets | `S3cure!9037Ops` |
+
 ## How To Use
 
 > **Note on the classifier head.** This adapter ships a resized token-
@@ -130,6 +151,25 @@ uv run python main.py \
 "Amina Yusuf can be reached at +234 802 111 3344."
 ```
 
+### Example result
+
+For the adapter command above, the cleaned output should contain:
+
+| Field | Value |
+| --- | --- |
+| Status | `PII detected` |
+| Detected spans | `2` |
+| Mode | `cleaned` |
+| Adapter | `iamSamurai/privacy-filter-nigeria` |
+
+| Label | Text | Start | End |
+| --- | --- | ---: | ---: |
+| `private_person` | `Amina Yusuf` | 0 | 11 |
+| `private_phone` | `+234 802 111 3344` | 30 | 47 |
+
+Confidence scores are model outputs and are not privacy, security, or
+compliance guarantees.
+
 ### REST API
 
 ```bash
@@ -188,12 +228,7 @@ real-world domain samples. Direct identifiers and sensitive fields were
 annotated and redacted from model-use fields. Source materials and derived
 artifacts remain private and are not distributed.
 
-
-`private_bvn`, `account_number`, `private_passport_number`,
-`private_voters_card_number`, and `private_drivers_license_number`,
-alongside generic PII labels: `private_person`, `private_email`,
-`private_phone`, `private_address`, `private_date`, `private_url`, and
-`secret`.
+Supported span labels are listed in [Supported Label Spans](#supported-label-spans).
 
 The committed public examples are **synthetic**. The private v5 mix is broader
 and includes reviewed non-synthetic source material after direct identifiers
@@ -330,4 +365,4 @@ artifact commit in experiment reports so results are reproducible.
 
 This adapter is released under the Apache License, Version 2.0. The base
 model `openai/privacy-filter` is governed by its own license; consult the
-upstream model card for terms.
+upstream model card for terms.
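The new "Example result" table in this change reports `Start`/`End` offsets for each detected span. For the example sentence, those values are consistent with 0-based, end-exclusive character indices (`text[0:11]` is `Amina Yusuf`, `text[30:47]` is `+234 802 111 3344`). The sketch below assumes that convention and shows how a caller might check spans against the "Supported Label Spans" list and substitute placeholders. The span dictionaries and the `check_spans`/`redact` helpers are hypothetical illustrations, not the adapter's documented output schema or API, and the sketch assumes spans never overlap.

```python
# Minimal sketch, assuming spans are 0-based, end-exclusive character offsets
# and never overlap. The span dict layout and both helpers are hypothetical;
# they are not the adapter's documented output format or API.

# Labels from the "Supported Label Spans" table. "O" is deliberately absent
# because it is the background label and is not returned as a span.
SUPPORTED_LABELS = {
    "account_number", "private_address", "private_bvn", "private_date",
    "private_drivers_license_number", "private_email", "private_nin",
    "private_passport_number", "private_person", "private_phone",
    "private_url", "private_voters_card_number", "secret",
}

TEXT = "Amina Yusuf can be reached at +234 802 111 3344."

# Spans copied from the "Example result" table.
SPANS = [
    {"label": "private_person", "text": "Amina Yusuf", "start": 0, "end": 11},
    {"label": "private_phone", "text": "+234 802 111 3344", "start": 30, "end": 47},
]


def check_spans(text, spans):
    """Raise ValueError if a span has an unknown label or mismatched offsets."""
    for span in spans:
        if span["label"] not in SUPPORTED_LABELS:
            raise ValueError(f"unsupported label: {span['label']}")
        if text[span["start"]:span["end"]] != span["text"]:
            raise ValueError(f"offsets do not match span text: {span}")


def redact(text, spans):
    """Replace each span with [label], processing spans right to left so the
    offsets of spans earlier in the string stay valid after each edit."""
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[: span["start"]] + f"[{span['label']}]" + text[span["end"]:]
    return text


check_spans(TEXT, SPANS)
print(redact(TEXT, SPANS))
```

Running it prints `[private_person] can be reached at [private_phone].` Substituting from the rightmost span first avoids recomputing offsets after each replacement changes the string length.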