townboy/kpfbert-kdpii
Korean PII token-classification model fine-tuned from KPF/KPF-bert-ner on a KDPII-style dialogue dataset.
Dataset
- Source file: ์ฐ๋1_PII_dataset_V3.json
- Documents: 4981
- Sentences: 53778
- Positive PII sentences: 19037
- Label count: 33
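To put the counts above in proportion, the short sketch below derives two ratios from them. It assumes that "Positive PII sentences" means sentences containing at least one PII annotation, which the card does not state explicitly.

```python
# Counts copied from the Dataset list above.
documents = 4981
sentences = 53778
positive_sentences = 19037

# Share of sentences assumed to contain at least one PII span.
positive_ratio = positive_sentences / sentences

# Average number of sentences per document.
sentences_per_document = sentences / documents

print(f"PII-positive share: {positive_ratio:.1%}")
print(f"Sentences per document: {sentences_per_document:.2f}")
```

Roughly a third of the sentences are positive under that assumption, so the remaining negatives are worth keeping in mind when reading accuracy-style metrics.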
Training Setup
- Max length: 128
- Epochs: 4.0
- Learning rate: 2e-05
- Train batch size: 8
- Eval batch size: 8
- Device: cuda
- GPU: NVIDIA GeForce RTX 4060 Ti
- Mixed precision: auto
- Gradient checkpointing: True
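For readers who want to reproduce a similar run, the values above could be mapped onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the original training script: the `output_dir` name is a placeholder, and reading "Mixed precision: auto" as "use fp16 when CUDA is available" is an assumption.

```python
# Values copied from the Training Setup list; keys are TrainingArguments field names.
HYPERPARAMS = {
    "num_train_epochs": 4.0,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_checkpointing": True,
}

# Max length is applied when tokenizing, not through TrainingArguments.
MAX_LENGTH = 128


def build_training_args(output_dir="kpfbert-kdpii-out"):
    """Build TrainingArguments from the values above (output_dir is a placeholder)."""
    # Imported lazily so the constants can be inspected without torch installed.
    import torch
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        # Assumption: "auto" mixed precision means fp16 whenever a CUDA device exists.
        fp16=torch.cuda.is_available(),
        **HYPERPARAMS,
    )
```

`MAX_LENGTH` would be passed to the tokenizer, for example as `max_length=MAX_LENGTH` together with `truncation=True`.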
Intended Use
This model is intended for detecting personally identifiable information (PII) in Korean dialogue-like text. Typical labels include names, nicknames, account numbers, mobile numbers, emails, addresses, IDs, and related sensitive entities.
Quick Inference
from transformers import pipeline
pipe = pipeline(
    "token-classification",
    model="townboy/kpfbert-kdpii",
    aggregation_strategy="simple",
)
print(pipe("Phone 010-8661-5573, ID wanderingrabbit1"))
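With `aggregation_strategy="simple"`, the pipeline returns a list of dicts carrying `entity_group`, `score`, `word`, `start`, and `end`. One way to use those character offsets for masking is sketched below. The `redact` helper, the `min_score` threshold, and the `PHONE` and `ID` label names are illustrative assumptions; the entity list is hand-written rather than real model output, so check the checkpoint's actual labels in its config.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder."""
    # Drop low-confidence predictions (threshold is an arbitrary example value).
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from right to left so earlier character offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written stand-in for pipeline output; labels and scores are made up.
text = "Phone 010-8661-5573, ID wanderingrabbit1"
fake_entities = [
    {"entity_group": "PHONE", "score": 0.99, "word": "010-8661-5573", "start": 6, "end": 19},
    {"entity_group": "ID", "score": 0.97, "word": "wanderingrabbit1", "start": 24, "end": 40},
]

print(redact(text, fake_entities))  # Phone [PHONE], ID [ID]
```

In real use, `fake_entities` would be replaced by `pipe(text)`.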
Notes
- The classification head is reinitialized for the KDPII label space.
- This checkpoint should be validated on your target product traffic before production use.
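As a starting point for the validation mentioned above, the sketch below computes exact-match span precision, recall, and F1 on a labeled sample. The `span_prf` helper and the example spans are hypothetical. Exact matching is only one possible criterion, and partial-overlap scoring may suit some products better.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) tuples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations: the second prediction misses the last character of the span.
gold = [(6, 19, "PHONE"), (24, 40, "ID")]
pred = [(6, 19, "PHONE"), (24, 39, "ID")]

precision, recall, f1 = span_prf(gold, pred)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

When scoring many sentences at once, adding a sentence index to each tuple keeps spans from different sentences from colliding.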