iamSamurai committed on
Commit ac73356 · verified · 1 Parent(s): ec8f14c

Update README.md

Files changed (1)
  1. README.md +42 -7
README.md CHANGED
@@ -98,6 +98,27 @@ benign identifier-like text can be over-redacted. Precision-sensitive users
98
  should add deterministic filters, tune thresholds where applicable, or
99
  finetune on representative local negatives.
100
 
101
  ## How To Use
102
 
103
  > **Note on the classifier head.** This adapter ships a resized token-
@@ -130,6 +151,25 @@ uv run python main.py \
130
  "Amina Yusuf can be reached at +234 802 111 3344."
131
  ```
132
 
133
  ### REST API
134
 
135
  ```bash
@@ -188,12 +228,7 @@ real-world domain samples. Direct identifiers and sensitive fields were
188
  annotated and redacted from model-use fields. Source materials and derived
189
  artifacts remain private and are not distributed.
190
 
191
- Label space includes Nigerian-domain PII types such as `private_nin`,
192
- `private_bvn`, `account_number`, `private_passport_number`,
193
- `private_voters_card_number`, and `private_drivers_license_number`,
194
- alongside generic PII labels: `private_person`, `private_email`,
195
- `private_phone`, `private_address`, `private_date`, `private_url`, and
196
- `secret`.
197
 
198
  The committed public examples are **synthetic**. The private v5 mix is broader
199
  and includes reviewed non-synthetic source material after direct identifiers
@@ -330,4 +365,4 @@ artifact commit in experiment reports so results are reproducible.
330
 
331
  This adapter is released under the Apache License, Version 2.0. The base
332
  model `openai/privacy-filter` is governed by its own license; consult the
333
- upstream model card for terms.
 
98
  should add deterministic filters, tune thresholds where applicable, or
99
  finetune on representative local negatives.
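
One way to read "deterministic filters" is a post-processing step that drops spans whose text cannot be the identifier the label claims. The sketch below is a minimal illustration, not part of the adapter: the span dictionaries with `label` and `text` keys, and the helper names `keep_span` and `filter_spans`, are hypothetical. The digit counts rely on NIN and BVN values being 11 digits and NUBAN account numbers being 10 digits, which matches the examples in the label table.

```python
import re

# Assumed digit counts per label: NIN and BVN are 11 digits,
# NUBAN-style account numbers are 10 digits.
EXPECTED_DIGITS = {
    "private_nin": 11,
    "private_bvn": 11,
    "account_number": 10,
}


def keep_span(span):
    """Return True if the span passes the deterministic digit-count check."""
    expected = EXPECTED_DIGITS.get(span["label"])
    if expected is None:
        # No rule for this label, so the model's decision stands.
        return True
    digits = re.sub(r"\D", "", span["text"])  # strip spaces, dashes, letters
    return len(digits) == expected


def filter_spans(spans):
    """Drop spans that fail the check; keep everything else in order."""
    return [s for s in spans if keep_span(s)]


# Hypothetical model output: the second span is an invoice number
# that was over-redacted as a NIN.
spans = [
    {"label": "private_nin", "text": "12345678901"},
    {"label": "private_nin", "text": "INV-2024-0042"},
    {"label": "account_number", "text": "6318826391"},
    {"label": "private_person", "text": "Amina Yusuf"},
]
kept = filter_spans(spans)
```

A filter like this trades recall for precision: a genuine identifier written in an unusual format would also be dropped, so it suits precision-sensitive pipelines only.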
100
 
101
+ ## Supported Label Spans
102
+
103
+ The adapter emits these span labels. `O` is the background token label and is
104
+ not returned as a detected span.
105
+
106
+ | Label | Detects | Example |
107
+ | --- | --- | --- |
108
+ | `account_number` | Nigerian bank account/NUBAN-style account numbers when context indicates an account | `6318826391` |
109
+ | `private_address` | Street, city, state, or postal address spans tied to a person or record | `42 Unity Road, Ikeja, Lagos 100271` |
110
+ | `private_bvn` | Nigerian Bank Verification Number references and values | `22334455667` |
111
+ | `private_date` | Dates tied to a person, record, document, or event in a private workflow | `12 April 1988` |
112
+ | `private_drivers_license_number` | Nigerian driver license identifiers | `K2BHY7F6FEA0` |
113
+ | `private_email` | Email addresses | `amina.yusuf@example.ng` |
114
+ | `private_nin` | Nigerian National Identification Number references and values | `12345678901` |
115
+ | `private_passport_number` | Nigerian passport identifiers | `B05995318` |
116
+ | `private_person` | Person names and name-like references | `Amina Yusuf` |
117
+ | `private_phone` | Nigerian local and international phone-number formats | `+234 802 111 3344` |
118
+ | `private_url` | URLs tied to private records, claims, documents, or workflows | `https://claims.example/record/1234` |
119
+ | `private_voters_card_number` | Nigerian voter card identifiers | `ABCD 1234 5678 9012 345` |
120
+ | `secret` | Known-format credentials, authorization codes, session tokens, and similar secrets | `S3cure!9037Ops` |
121
+
122
  ## How To Use
123
 
124
  > **Note on the classifier head.** This adapter ships a resized token-
 
151
  "Amina Yusuf can be reached at +234 802 111 3344."
152
  ```
153
 
154
+ ### Example result
155
+
156
+ For the adapter command above, the cleaned output should contain:
157
+
158
+ | Field | Value |
159
+ | --- | --- |
160
+ | Status | `PII detected` |
161
+ | Detected spans | `2` |
162
+ | Mode | `cleaned` |
163
+ | Adapter | `iamSamurai/privacy-filter-nigeria` |
164
+
165
+ | Label | Text | Start | End |
166
+ | --- | --- | ---: | ---: |
167
+ | `private_person` | `Amina Yusuf` | 0 | 11 |
168
+ | `private_phone` | `+234 802 111 3344` | 30 | 47 |
169
+
170
+ Confidence scores are model outputs and are not privacy, security, or
171
+ compliance guarantees.
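
The offsets in the span table can be checked against the input sentence with plain string slicing. The sketch below assumes the `Start`/`End` columns are zero-based, end-exclusive character offsets, which is consistent with `Amina Yusuf` spanning 0 to 11. The `redact` helper and the `[label]` placeholder format are hypothetical; the adapter's actual cleaned output format is not shown here.

```python
text = "Amina Yusuf can be reached at +234 802 111 3344."

# (label, start, end) tuples copied from the span table above.
spans = [
    ("private_person", 0, 11),
    ("private_phone", 30, 47),
]


def redact(text, spans):
    """Replace each span with a [label] placeholder (hypothetical format)."""
    out = text
    # Work from the rightmost span so earlier offsets stay valid.
    for label, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


# Slicing with the reported offsets recovers the detected text.
extracted = [text[start:end] for _, start, end in spans]
redacted = redact(text, spans)

print(extracted)
print(redacted)
```

Replacing from right to left avoids recomputing offsets after each substitution, since placeholders usually differ in length from the text they replace.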
172
+
173
  ### REST API
174
 
175
  ```bash
 
228
  annotated and redacted from model-use fields. Source materials and derived
229
  artifacts remain private and are not distributed.
230
 
231
+ Supported span labels are listed in [Supported Label Spans](#supported-label-spans).
232
 
233
  The committed public examples are **synthetic**. The private v5 mix is broader
234
  and includes reviewed non-synthetic source material after direct identifiers
 
365
 
366
  This adapter is released under the Apache License, Version 2.0. The base
367
  model `openai/privacy-filter` is governed by its own license; consult the
368
+ upstream model card for terms.