# ModernBERT-large Disfluency Detection — Exp D (Mixed 80/20)

Fine-tuned from answerdotai/ModernBERT-large on mixed data (80% synthetic / 20% real). The setup is otherwise identical to Exp C (large, real-only), isolating the effect of synthetic data augmentation on a large model.
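For a quick qualitative check, the checkpoint can be loaded as a token-classification pipeline. A minimal sketch, assuming a hypothetical hub id (substitute the actual fine-tuned checkpoint) and the O/FP/RP/RV/PW tagset used throughout this card:

```python
from transformers import pipeline

# Hypothetical repo id: replace with the actual fine-tuned checkpoint or local path.
MODEL_ID = "your-username/modernbert-large-disfluency-expD"

# Token-level tagger; labels follow this card's tagset:
# O (fluent) plus the disfluency classes FP, RP, RV, PW.
tagger = pipeline("token-classification", model=MODEL_ID,
                  aggregation_strategy="simple")

print(tagger("I uh I want to to go home"))
```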
## Dataset

- Config: `mixed_8020` of `arielcerdap/disfluency-fluencybank` (loading sketch below)
- Train: 13,713 segments (80% synthetic / 20% real)
- Val/Test: identical to all previous experiments
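The config name above can be passed straight to `datasets`. A minimal loading sketch; the split names and per-example schema are assumptions:

```python
from datasets import load_dataset

# "mixed_8020" is the config name listed above; split names are assumed.
ds = load_dataset("arielcerdap/disfluency-fluencybank", "mixed_8020")

print(ds)              # expected: DatasetDict with train/validation/test splits
print(ds["train"][0])  # one segment, assumed to carry tokens and per-token tags
```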
## Comparison Table

All values are F1 scores; the Binary F1 computation is sketched below the table.

| Label | Paper (BERT) | Exp A (base) | Exp C (large) | Exp D (large+mix) |
|---|---|---|---|---|
| FP | 1.000 | 0.9944 | 0.9944 | 0.9944 |
| RP | 0.690 | 0.8022 | 0.8964 | 0.7253 |
| RV | 0.400 | 0.3145 | 0.4974 | 0.3410 |
| PW | 0.830 | 0.8879 | 0.9451 | 0.9348 |
| Macro | 0.730 | 0.7497 | 0.8333 | 0.7489 |
| Binary F1 | — | 0.8902 | 0.9250 | 0.8459 |
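The Binary F1 row presumably collapses the four disfluency labels into a single positive class scored against O. A minimal sketch of that collapse with scikit-learn; the flat, aligned tag-list input format is an assumption:

```python
from sklearn.metrics import f1_score

DISFLUENT = {"FP", "RP", "RV", "PW"}  # the four disfluency classes in this card

def binary_f1(true_tags, pred_tags):
    """Token-level F1 after mapping every disfluency label to one positive
    class. Expects flat, aligned lists of per-token tag strings."""
    y_true = [tag in DISFLUENT for tag in true_tags]
    y_pred = [tag in DISFLUENT for tag in pred_tags]
    return f1_score(y_true, y_pred)

# Toy example: one aligned reference/prediction pair.
print(binary_f1(["O", "FP", "RP", "O"], ["O", "FP", "RP", "RV"]))  # 0.8
```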
## Per-class Detail

| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| O | 0.9795 | 0.9654 | 0.9724 | 3704 |
| FP | 0.9888 | 1.0000 | 0.9944 | 176 |
| RP | 0.6766 | 0.7816 | 0.7253 | 174 |
| RV | 0.2824 | 0.4302 | 0.3410 | 86 |
| PW | 0.9811 | 0.8927 | 0.9348 | 233 |
## Hyperparameters

- learning_rate: 5e-05
- effective batch_size: 32 (8 per device × 4 gradient accumulation steps)
- epochs: 15
- warmup_steps: 963
- weight_decay: 0.1
- classifier_dropout: 0.3
- focal_loss_gamma: 3.0 (adaptive; see the loss sketch below)
- class_weights: O=1.0, FP=3.0, RP=6.0, RV=20.0, PW=5.0
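A minimal sketch of how these settings could map onto a `transformers` Trainer run. The class-weighted focal loss follows the gamma and weights listed above (the "adaptive" gamma scheduling is not reproduced); the label order [O, FP, RP, RV, PW] and the Trainer subclass are assumptions:

```python
import torch
import torch.nn.functional as F
from transformers import Trainer, TrainingArguments

# Weights from this card, assuming label ids follow [O, FP, RP, RV, PW].
CLASS_WEIGHTS = torch.tensor([1.0, 3.0, 6.0, 20.0, 5.0])
GAMMA = 3.0  # focal_loss_gamma (fixed here; the card calls it "adaptive")

class FocalLossTrainer(Trainer):
    """Trainer variant applying class-weighted focal loss to token logits."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits                         # (batch, seq, num_labels)
        flat_logits = logits.view(-1, logits.size(-1))
        flat_labels = labels.view(-1)
        # Unweighted per-token cross-entropy; padding positions (-100) get 0.
        ce = F.cross_entropy(flat_logits, flat_labels,
                             ignore_index=-100, reduction="none")
        pt = torch.exp(-ce)                             # prob. of the true class
        weights = CLASS_WEIGHTS.to(flat_logits.device)
        w = weights[flat_labels.clamp(min=0)]           # clamp keeps -100 indexable
        mask = flat_labels != -100
        loss = (w * (1 - pt) ** GAMMA * ce)[mask].mean()
        return (loss, outputs) if return_outputs else loss

# Optimizer/schedule settings transcribed from the list above.
args = TrainingArguments(
    output_dir="modernbert-large-disfluency-expD",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,                  # effective batch size 32
    num_train_epochs=15,
    warmup_steps=963,
    weight_decay=0.1,
)
# classifier_dropout=0.3 would be set on the model config when loading the
# base checkpoint (answerdotai/ModernBERT-large).
```

The heavy RV weight (20.0) presumably compensates for that class being both the rarest (support 86) and the hardest (F1 0.3410) in the per-class table.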