# Korean Time Expression Classifier

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("kwoncho/ko-sroberta-korean-time-expression-classifier")
model = AutoModelForTokenClassification.from_pretrained("kwoncho/ko-sroberta-korean-time-expression-classifier")
```
This model detects Korean TIMEX3 time expressions with BIO token-classification labels.

The backbone is jhgan/ko-sroberta-multitask, fine-tuned on the 158.시간 표현 탐지 데이터 ("Time Expression Detection Data") dataset for four TIMEX3 entity types: `DATE`, `TIME`, `DURATION`, and `SET`.
## Intended Use

Use this model to identify Korean time expressions in sentences or utterances. It predicts token-level BIO labels and can be used through the Hugging Face `token-classification` pipeline.

This is an experimental model trained for TIMEX3 span detection. It does not extract EVENT or TLINK annotations.
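To illustrate what token-level BIO labels look like once decoded, here is a minimal sketch of grouping BIO tags into entity spans. The specific tag names (`B-SET`, `I-SET`, ...) follow the standard BIO scheme over the four TIMEX3 types; check `model.config.id2label` for the model's actual label inventory.

```python
# Sketch: group per-token BIO tags into (entity_type, surface_text) spans.
# Tag names below are the conventional BIO scheme, not read from the model.

def bio_to_spans(tokens, tags):
    """Collect consecutive B-/I- tagged tokens into entity spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [token])  # start a new entity
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)      # continue the open entity
        else:
            if current:
                spans.append(current)
            current = None                # O tag or broken I- tag
    if current:
        spans.append(current)
    return [(etype, " ".join(toks)) for etype, toks in spans]

tokens = ["매주", "토요일", "저녁", "에", "회의", "를", "합니다"]
tags = ["B-SET", "I-SET", "B-TIME", "O", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # [('SET', '매주 토요일'), ('TIME', '저녁')]
```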
## Training Data

The model was trained on the official Training split and evaluated on the official Validation split of 158.시간 표현 탐지 데이터.

Training/evaluation preprocessing:
- Unsupported, empty, malformed, or unalignable TIMEX3 spans are excluded.
- Records whose TIMEX3 span would be truncated by `max_length=256` are excluded.
- TIMEX-free records are retained as negative examples.
- JSON `text` fields are used as the source text.
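The truncation rule above can be sketched as follows: a record is kept only if every TIMEX3 character span ends within the text that survives tokenizer truncation. This is an illustrative reconstruction, not the project's actual preprocessing code; the `offset_mapping` argument stands in for the output of a Hugging Face fast tokenizer called with `truncation=True, max_length=256, return_offsets_mapping=True`.

```python
# Sketch of the span-truncation filter (assumed logic, not the real script).
# offset_mapping: per-token (char_start, char_end) pairs, (0, 0) for
# special tokens; char_spans: TIMEX3 (start, end) character offsets.

def span_survives_truncation(char_spans, offset_mapping):
    ends = [end for _, end in offset_mapping if end > 0]
    if not ends:
        return not char_spans
    kept_up_to = max(ends)  # last character covered after truncation
    return all(end <= kept_up_to for _, end in char_spans)

# A span ending at char 40 survives when tokens cover up to char 50;
# a span ending at char 60 does not, so that record would be excluded.
print(span_survives_truncation([(30, 40)], [(0, 0), (0, 10), (10, 50), (0, 0)]))  # True
print(span_survives_truncation([(30, 60)], [(0, 0), (0, 10), (10, 50), (0, 0)]))  # False
```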
## Training Configuration

```shell
python -m time_expression_classifier.train_token_classifier \
  --data-root "158.시간 표현 탐지 데이터" \
  --model-name jhgan/ko-sroberta-multitask \
  --output-dir outputs/official_epoch2 \
  --split-mode official \
  --epochs 2 \
  --learning-rate 3e-5 \
  --batch-size 16 \
  --max-length 256
```
Key settings:
| setting | value |
|---|---|
| backbone | jhgan/ko-sroberta-multitask |
| epochs | 2 |
| learning rate | 3e-5 |
| batch size | 16 |
| max length | 256 |
| weight decay | 0.01 |
| warmup ratio | 0.06 |
| seed | 42 |
## Evaluation
Metrics are entity-level exact match on the official Validation split.
| metric | value |
|---|---|
| entity precision | 0.8265 |
| entity recall | 0.8268 |
| entity F1 | 0.8266 |
| token accuracy | 0.9899 |
| eval loss | 0.0350 |
Per-label entity-level results:
| label | precision | recall | F1 | support |
|---|---|---|---|---|
| DATE | 0.8495 | 0.8367 | 0.8430 | 23422 |
| TIME | 0.7933 | 0.8033 | 0.7983 | 3665 |
| DURATION | 0.7848 | 0.8247 | 0.8042 | 6810 |
| SET | 0.7107 | 0.6910 | 0.7007 | 974 |
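The entity-level scores above use exact span match, so a prediction counts as correct only when its label and both boundaries match a gold entity. A minimal sketch of that scoring (illustrative, not the project's evaluation script):

```python
# Sketch: entity-level exact-match precision/recall/F1 over
# (label, start, end) tuples. A boundary off by one character is a miss.

def exact_match_prf(gold, pred):
    """gold/pred: sets of (label, start, end) entity tuples."""
    tp = len(gold & pred)  # exact matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("DATE", 0, 2), ("TIME", 5, 7), ("SET", 10, 12)}
pred = {("DATE", 0, 2), ("TIME", 5, 8)}  # TIME boundary is off by one
print(exact_match_prf(gold, pred))  # (0.5, 0.333..., 0.4)
```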
## Usage

```python
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="kwoncho/ko-sroberta-korean-time-expression-classifier",
    aggregation_strategy="simple",
)

text = "매주 토요일 저녁에 회의를 합니다."
print(tagger(text))
```
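With `aggregation_strategy="simple"`, the pipeline returns one dict per merged entity with keys such as `entity_group`, `score`, `word`, `start`, and `end`. A small post-processing sketch that keeps confident spans and slices the original text by character offsets (the sample output and the 0.5 threshold are illustrative):

```python
# Sketch: filter pipeline output by score and recover surface text from
# character offsets. Scores below are made up for illustration.

def extract_spans(text, pipeline_output, min_score=0.5):
    return [(ent["entity_group"], text[ent["start"]:ent["end"]])
            for ent in pipeline_output if ent["score"] >= min_score]

text = "매주 토요일 저녁에 회의를 합니다."
sample_output = [
    {"entity_group": "SET", "score": 0.98, "word": "매주 토요일", "start": 0, "end": 6},
    {"entity_group": "TIME", "score": 0.95, "word": "저녁", "start": 7, "end": 9},
]
print(extract_spans(text, sample_output))  # [('SET', '매주 토요일'), ('TIME', '저녁')]
```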
## Limitations

- The model is sensitive to ambiguous time expressions such as `주` (week), `하루` (a day), `시간` (hour/time), `한달` (a month), `일주일` (a week), and `매일` (daily).
- `SET` is the lowest-performing label due to smaller support and ambiguity between repeated events and duration expressions.
- The model predicts TIMEX3 spans only. Normalization to calendar values is not included.
- Evaluation uses exact span match, so partial boundary differences count as errors.
## Reproducibility

Repository: git@github.com:hyun2019/ko-sroberta-korean-time-expression-classifier.git

The local release artifact is tracked as `models/official_epoch2` via DVC.