# ModernBERT-large Disfluency – Exp E (Real + RV Oversample ×2)
Fine-tuned from answerdotai/ModernBERT-large. The training set is real data only, with RV-containing segments oversampled 2× (a minimal sketch of the duplication step follows the stats below). No synthetic data is used. Validation and test splits are identical across all experiments.
## Oversample stats

- Train original : 2737 segments
- RV segments    : 348 (each duplicated once)
- Train final    : 3085 segments
- RV tokens after oversampling: 1858 (4.21%)
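The oversampling step amounts to appending one extra copy of every training segment that contains at least one RV token. A minimal sketch, assuming segments are lists of `(token, label)` pairs; the function name and data layout are illustrative, not taken from the training code:

```python
from typing import List, Tuple

Segment = List[Tuple[str, str]]  # (token, label) pairs

def oversample_rv(segments: List[Segment], factor: int = 2) -> List[Segment]:
    """Append (factor - 1) extra copies of every segment containing an RV token."""
    out: List[Segment] = []
    for seg in segments:
        out.append(seg)
        if any(label == "RV" for _, label in seg):
            out.extend([seg] * (factor - 1))
    return out

# With factor=2: 2737 original segments, 348 of them containing RV,
# gives 2737 + 348 = 3085 final segments, matching the stats above.
```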
## Comparison Table

| Label     | Paper | Exp A (base) | Exp C (large) | Exp D (large+mix) | Exp E (large+oversample) |
|-----------|-------|--------------|---------------|-------------------|--------------------------|
| FP        | 1.000 | 0.9944       | 0.9944        | 0.9944            | 0.9915                   |
| RP        | 0.690 | 0.8022       | 0.8964        | 0.7253            | 0.8802                   |
| RV        | 0.400 | 0.3145       | 0.4974        | 0.3410            | 0.4884                   |
| PW        | 0.830 | 0.8879       | 0.9451        | 0.9348            | 0.9409                   |
| Macro     | 0.730 | 0.7497       | 0.8333        | 0.7489            | 0.8253                   |
| Binary F1 | –     | 0.8902       | 0.9250        | 0.8459            | 0.9344                   |
## Per-class Detail

| Label | Precision | Recall | F1     | Support |
|-------|-----------|--------|--------|---------|
| O     | 0.9895    | 0.9881 | 0.9888 | 3704    |
| FP    | 0.9832    | 1.0000 | 0.9915 | 176     |
| RP    | 0.8541    | 0.9080 | 0.8802 | 174     |
| RV    | 0.4884    | 0.4884 | 0.4884 | 86      |
| PW    | 0.9598    | 0.9227 | 0.9409 | 233     |
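The Macro score in the comparison table is the mean of the four disfluency-class F1s, with O excluded, as this quick check against the Exp E column confirms. (Binary F1 presumably scores disfluent-vs-fluent after collapsing FP/RP/RV/PW into a single positive class; that reading is an assumption.)

```python
f1 = {"FP": 0.9915, "RP": 0.8802, "RV": 0.4884, "PW": 0.9409}
macro = sum(f1.values()) / len(f1)  # 0.82525 -> 0.8253 as reported
```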
## Hyperparameters

- learning_rate: 5e-05
- batch_size effective: 32 (8 × 4 grad_accum)
- epochs: 15
- warmup_steps: 97
- weight_decay: 0.1
- classifier_dropout: 0.3
- focal_loss_gamma: 3.0 (adaptive)
- class_weights: O=1.0, FP=3.0, RP=6.0, RV=12.0, PW=5.0 (see the loss sketch after this list)
- oversample_factor: 2× RV segments
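For reference, a minimal PyTorch sketch of the loss these hyperparameters describe: class-weighted focal loss with γ = 3.0. The label order (O, FP, RP, RV, PW), the function name, and the `ignore_index` handling are assumptions, and the "adaptive" gamma scheduling noted above is not reproduced here:

```python
import torch
import torch.nn.functional as F

# Assumed label order: O, FP, RP, RV, PW (weights from the list above)
CLASS_WEIGHTS = torch.tensor([1.0, 3.0, 6.0, 12.0, 5.0])

def weighted_focal_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        gamma: float = 3.0,
                        ignore_index: int = -100) -> torch.Tensor:
    """Class-weighted focal loss over token logits.

    logits: (num_tokens, num_classes), targets: (num_tokens,).
    Scales token-level cross-entropy by (1 - p_t)^gamma so easy, confidently
    classified tokens (mostly O) contribute less than hard ones (e.g. RV).
    """
    ce = F.cross_entropy(logits, targets, ignore_index=ignore_index,
                         reduction="none")
    pt = torch.exp(-ce)  # model probability assigned to the true class
    # clamp(min=0) keeps indexing valid for ignored (-100) positions,
    # which are masked out below anyway
    w = CLASS_WEIGHTS.to(logits.device)[targets.clamp(min=0)]
    loss = w * (1.0 - pt) ** gamma * ce
    mask = targets != ignore_index
    return loss[mask].mean()
```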