How to use with the Transformers library

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="kwoncho/ko-sroberta-korean-time-expression-classifier")
```

```python
# Load the model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("kwoncho/ko-sroberta-korean-time-expression-classifier")
model = AutoModelForTokenClassification.from_pretrained("kwoncho/ko-sroberta-korean-time-expression-classifier")
```
Korean Time Expression Classifier

This model detects Korean TIMEX3 time expressions using BIO token-classification labels.

The backbone is jhgan/ko-sroberta-multitask, fine-tuned on 158.시간 표현 탐지 데이터 (a Korean time-expression detection corpus) for four TIMEX3 entity types:

  • DATE
  • TIME
  • DURATION
  • SET

Intended Use

Use this model to identify Korean time expressions in sentences or utterances. It predicts token-level BIO labels and can be used through the Hugging Face token-classification pipeline.

This is an experimental model trained for TIMEX3 span detection. It does not extract EVENT or TLINK annotations.
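The four entity types with BIO tagging imply a 9-way label set. A minimal sketch of that tag set (the authoritative mapping is the model's id2label in its config.json, which may order labels differently):

```python
# Illustrative BIO tag set for the four TIMEX3 entity types; the
# actual mapping lives in the model's config.json.
ENTITY_TYPES = ["DATE", "TIME", "DURATION", "SET"]
labels = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES
                  for prefix in ("B", "I")]
print(labels)
# ['O', 'B-DATE', 'I-DATE', 'B-TIME', 'I-TIME',
#  'B-DURATION', 'I-DURATION', 'B-SET', 'I-SET']
```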

Training Data

The model was trained on the official Training split and evaluated on the official Validation split of 158.시간 표현 탐지 데이터.

Training/evaluation preprocessing:

  • Unsupported, empty, malformed, or unalignable TIMEX3 spans are excluded.
  • Records whose TIMEX3 span would be truncated by max_length=256 are excluded.
  • TIMEX-free records are retained as negative examples.
  • JSON text fields are used as the source text.
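The filters above can be sketched as a predicate over character offsets (an illustration with invented names; the repository's actual preprocessing code may differ — in practice the offsets would come from the tokenizer's return_offsets_mapping):

```python
MAX_LENGTH = 256  # matches the training configuration below

def keep_record(spans, token_offsets):
    """Hypothetical record filter.

    spans: character-level (start, end) TIMEX3 annotations.
    token_offsets: (start, end) per kept token after tokenization,
    already truncated to MAX_LENGTH tokens.
    """
    covered_end = token_offsets[-1][1] if token_offsets else 0
    for start, end in spans:
        if start >= end:                 # empty or malformed span
            return False
        if end > covered_end:            # span truncated by max_length
            return False
        if not any(s < end and start < e for s, e in token_offsets):
            return False                 # span aligns with no token
    return True

# A record with no TIMEX3 spans is kept as a negative example.
print(keep_record([], [(0, 2), (2, 5)]))        # True
print(keep_record([(0, 2)], [(0, 2), (2, 5)]))  # True
print(keep_record([(4, 9)], [(0, 2), (2, 5)]))  # False (truncated)
```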

Training Configuration

```shell
python -m time_expression_classifier.train_token_classifier \
  --data-root "158.시간 표현 탐지 데이터" \
  --model-name jhgan/ko-sroberta-multitask \
  --output-dir outputs/official_epoch2 \
  --split-mode official \
  --epochs 2 \
  --learning-rate 3e-5 \
  --batch-size 16 \
  --max-length 256
```

Key settings:

| setting | value |
| --- | --- |
| backbone | jhgan/ko-sroberta-multitask |
| epochs | 2 |
| learning rate | 3e-5 |
| batch size | 16 |
| max length | 256 |
| weight decay | 0.01 |
| warmup ratio | 0.06 |
| seed | 42 |
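The key settings map onto Hugging Face TrainingArguments roughly as follows (a hypothetical sketch; the repository's train_token_classifier script may configure training differently):

```python
from transformers import TrainingArguments

# Sketch of the key settings table; max_length=256 is applied at
# tokenization time, not here.
args = TrainingArguments(
    output_dir="outputs/official_epoch2",
    num_train_epochs=2,
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    weight_decay=0.01,
    warmup_ratio=0.06,
    seed=42,
)
```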

Evaluation

Metrics are entity-level exact match on the official Validation split.

| metric | value |
| --- | --- |
| entity precision | 0.8265 |
| entity recall | 0.8268 |
| entity F1 | 0.8266 |
| token accuracy | 0.9899 |
| eval loss | 0.0350 |
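Entity-level exact match means a prediction counts as correct only when both the label and the exact span boundaries agree with the gold annotation. A minimal scorer illustrating the idea (not the repository's evaluation code):

```python
def entity_prf(gold, pred):
    """gold/pred: sets of (label, start, end) entity spans.
    Exact match: label and both boundaries must agree."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("DATE", 0, 2), ("TIME", 5, 9)}
pred = {("DATE", 0, 2), ("TIME", 5, 8)}  # boundary off by one -> counted as an error
print(entity_prf(gold, pred))            # (0.5, 0.5, 0.5)
```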

Per-label entity-level results:

| label | precision | recall | F1 | support |
| --- | --- | --- | --- | --- |
| DATE | 0.8495 | 0.8367 | 0.8430 | 23422 |
| TIME | 0.7933 | 0.8033 | 0.7983 | 3665 |
| DURATION | 0.7848 | 0.8247 | 0.8042 | 6810 |
| SET | 0.7107 | 0.6910 | 0.7007 | 974 |

Usage

```python
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="kwoncho/ko-sroberta-korean-time-expression-classifier",
    aggregation_strategy="simple",
)

text = "매주 토요일 저녁에 회의를 합니다."  # "We hold a meeting every Saturday evening."
print(tagger(text))
```
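With aggregation_strategy="simple", the pipeline returns grouped entities with character offsets into the input. A sketch of converting that output into (label, start, end) spans; the sample output below is illustrative, not an actual model prediction:

```python
# Illustrative pipeline output shape (not an actual model prediction);
# each dict carries an entity_group, a score, and character offsets.
text = "매주 토요일 저녁에 회의를 합니다."  # "We hold a meeting every Saturday evening."
sample = [
    {"entity_group": "SET", "word": "매주 토요일 저녁에",
     "score": 0.99, "start": 0, "end": 10},
]

spans = [(ent["entity_group"], ent["start"], ent["end"]) for ent in sample]
for label, start, end in spans:
    print(label, text[start:end])   # SET 매주 토요일 저녁에
```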

Limitations

  • The model is sensitive to ambiguous time expressions such as 하루 (a day), 시간 (an hour), 한달 (a month), 일주일 (a week), and 매일 (every day).
  • SET is the lowest-performing label due to smaller support and ambiguity between repeated events and duration expressions.
  • The model predicts TIMEX3 spans only. Normalization to calendar values is not included.
  • Evaluation uses exact span match, so partial boundary differences count as errors.

Reproducibility

Repository: git@github.com:hyun2019/ko-sroberta-korean-time-expression-classifier.git

The local release artifact is tracked as models/official_epoch2 via DVC.

Model size: 0.1B parameters (F32, safetensors)
