townboy/kpfbert-kdpii

Korean PII token-classification model fine-tuned from KPF/KPF-bert-ner on a KDPII-style dialogue dataset.

Dataset

  • Source file: 연대1_PII_dataset_V3.json
  • Documents: 4981
  • Sentences: 53778
  • Positive PII sentences: 19037
  • Label count: 33
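
The counts above imply a couple of derived ratios. The following is a minimal sketch of that arithmetic using only the numbers listed in this card; the variable names are illustrative.

```python
# Counts copied from the Dataset section of this card.
documents = 4981
sentences = 53778
positive_sentences = 19037

# Share of sentences that the dataset marks as positive for PII.
positive_ratio = positive_sentences / sentences
# Average number of sentences per document.
sentences_per_document = sentences / documents

print(f"positive ratio: {positive_ratio:.1%}")
print(f"sentences per document: {sentences_per_document:.1f}")
```

Roughly a third of the sentences are positive, so most sentences carry no PII label.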

Training Setup

  • Max length: 128
  • Epochs: 4.0
  • Learning rate: 2e-05
  • Train batch size: 8
  • Eval batch size: 8
  • Device: cuda
  • GPU: NVIDIA GeForce RTX 4060 Ti
  • Mixed precision: auto
  • Gradient checkpointing: True
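
The original training script is not part of this card. As a rough sketch, the settings above might map onto the Hugging Face `Trainer` API as shown below. The output directory, the function names, and the reading of "Mixed precision: auto" as "fp16 when CUDA is available" are assumptions, and options the card does not list (such as gradient accumulation or warmup) are left at library defaults.

```python
# Hyperparameters copied from the Training Setup list in this card.
TRAINING_KWARGS = {
    "num_train_epochs": 4.0,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_checkpointing": True,
}
MAX_LENGTH = 128  # tokenizer truncation length used for training
NUM_LABELS = 33   # label count from the Dataset section


def build_training_args(output_dir="kpfbert-kdpii-out"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily so the settings above can be inspected without
    # torch or transformers installed.
    import torch
    from transformers import TrainingArguments

    # Assumption: "auto" mixed precision means fp16 on a CUDA device.
    use_fp16 = torch.cuda.is_available()
    return TrainingArguments(output_dir=output_dir, fp16=use_fp16, **TRAINING_KWARGS)


def load_base_model():
    """Load the base checkpoint with a fresh head sized for NUM_LABELS."""
    from transformers import AutoModelForTokenClassification

    # ignore_mismatched_sizes lets the classifier layer be reinitialized
    # when its shape differs from the base checkpoint's head.
    return AutoModelForTokenClassification.from_pretrained(
        "KPF/KPF-bert-ner",
        num_labels=NUM_LABELS,
        ignore_mismatched_sizes=True,
    )
```

Gradient checkpointing trades extra compute for lower activation memory, which helps when training on a single consumer GPU.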

Intended Use

This model is intended for Korean personally identifiable information detection in dialogue-like text. Typical labels include names, nicknames, account numbers, mobile numbers, emails, addresses, IDs, and related sensitive entities.
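
One common use of PII detection is masking the detected spans before text is stored or shared. The sketch below assumes entity dictionaries in the shape that the `transformers` token-classification pipeline returns under an aggregation strategy (`entity_group`, `start`, `end`). The `redact` helper and the label names `PHONE` and `ID` are hypothetical; the model's actual label names may differ.

```python
def redact(text, entities, mask="[{label}]"):
    """Replace each detected span with a placeholder built from its label."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = mask.format(label=ent["entity_group"])
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


# Hand-written stand-in for pipeline output; label names are hypothetical.
text = "Phone 010-8661-5573, ID wanderingrabbit1"
entities = [
    {"entity_group": "PHONE", "start": 6, "end": 19},
    {"entity_group": "ID", "start": 24, "end": 40},
]
print(redact(text, entities))  # Phone [PHONE], ID [ID]
```

The helper assumes the spans do not overlap.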

Quick Inference

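from transformers import pipeline

# "simple" aggregation merges word pieces into whole entity spans.
pipe = pipeline(
    "token-classification",
    model="townboy/kpfbert-kdpii",
    aggregation_strategy="simple",
)

# Each result is a dict with entity_group, score, word, start, and end.
print(pipe("Phone 010-8661-5573, ID wanderingrabbit1"))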
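
Each aggregated prediction carries a confidence score, so a caller may want to discard low-confidence spans before acting on them. This is a minimal sketch; the `filter_by_score` helper, the 0.8 threshold, and the sample predictions (including their label names and scores) are illustrative assumptions, not values from this model.

```python
def filter_by_score(entities, min_score=0.8):
    """Keep only predictions whose confidence is at least min_score."""
    # float() also accepts the NumPy scalars that the pipeline returns.
    return [ent for ent in entities if float(ent["score"]) >= min_score]


# Hand-written stand-in for pipeline output; labels and scores are made up.
predictions = [
    {"entity_group": "PHONE", "score": 0.97, "start": 6, "end": 19},
    {"entity_group": "ID", "score": 0.42, "start": 24, "end": 40},
]
kept = filter_by_score(predictions)
print([ent["entity_group"] for ent in kept])  # ['PHONE']
```

A suitable threshold depends on whether missed PII or false alarms are more costly in the target application.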

Notes

  • The classification head is reinitialized for the KDPII label space.
  • This checkpoint should be validated on your target product traffic before production use.
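
One way to run such a validation is to score predictions against a hand-labeled sample of your own traffic. The sketch below computes exact-match span precision, recall, and F1. The `span_prf` helper, the `(start, end, label)` tuple format, and the sample spans are assumptions chosen for illustration.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    # F1 is zero whenever there are no true positives.
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and model predictions for one sentence.
gold = [(6, 19, "PHONE"), (24, 40, "ID")]
predicted = [(6, 19, "PHONE"), (21, 23, "ID")]
precision, recall, f1 = span_prf(gold, predicted)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

Exact matching is strict: a span that is off by one character counts as both a false positive and a false negative.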

Checkpoint

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32