ModernBERT-large Disfluency Detection — Exp D (Mixed 80/20)

Fine-tuned from answerdotai/ModernBERT-large on mixed data (80% synthetic / 20% real). The training setup is identical to Exp C (large, real-only) in order to isolate the effect of synthetic-data augmentation on a large model.

Dataset

  • Config: mixed_8020 from arielcerdap/disfluency-fluencybank
  • Train: 13,713 segments (80% synthetic / 20% real)
  • Val/Test: identical to all previous experiments

Comparison Table

All values are F1; Macro averages the four disfluency classes (O excluded).

| Label     | Paper (BERT) | Exp A (base) | Exp C (large) | Exp D (large+mix) |
|-----------|--------------|--------------|---------------|-------------------|
| FP        | 1.000        | 0.9944       | 0.9944        | 0.9944            |
| RP        | 0.690        | 0.8022       | 0.8964        | 0.7253            |
| RV        | 0.400        | 0.3145       | 0.4974        | 0.3410            |
| PW        | 0.830        | 0.8879       | 0.9451        | 0.9348            |
| Macro     | 0.730        | 0.7497       | 0.8333        | 0.7489            |
| Binary F1 |              | 0.8902       | 0.9250        | 0.8459            |

Per-class Detail

| Label | P      | R      | F1     | Support |
|-------|--------|--------|--------|---------|
| O     | 0.9795 | 0.9654 | 0.9724 | 3704    |
| FP    | 0.9888 | 1.0000 | 0.9944 | 176     |
| RP    | 0.6766 | 0.7816 | 0.7253 | 174     |
| RV    | 0.2824 | 0.4302 | 0.3410 | 86      |
| PW    | 0.9811 | 0.8927 | 0.9348 | 233     |
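As a sanity check, the macro F1 in the comparison table is recoverable from the per-class detail (averaging the four disfluency classes, O excluded), and each F1 follows from its P/R pair:

```python
# Macro F1 over the four disfluency classes reproduces the 0.7489 reported above.
per_class_f1 = {"FP": 0.9944, "RP": 0.7253, "RV": 0.3410, "PW": 0.9348}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 4))  # 0.7489

# F1 = 2PR / (P + R), e.g. for RP:
p, r = 0.6766, 0.7816
rp_f1 = 2 * p * r / (p + r)
print(round(rp_f1, 4))  # 0.7253
```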

Hyperparameters

  • learning_rate: 5e-05
  • effective batch_size: 32 (8 per device × 4 grad_accum)
  • epochs: 15
  • warmup_steps: 963
  • weight_decay: 0.1
  • classifier_dropout: 0.3
  • focal_loss_gamma: 3.0 (adaptive)
  • class_weights: O=1.0, FP=3.0, RP=6.0, RV=20.0, PW=5.0
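The class-weighted focal loss implied by the last two lines can be sketched as follows. This is a minimal illustration using the gamma and weights listed above, not the actual training code (which additionally uses an adaptive gamma schedule):

```python
import math

# Weights from the class_weights line, in label order O, FP, RP, RV, PW.
CLASS_WEIGHTS = [1.0, 3.0, 6.0, 20.0, 5.0]

def focal_loss(probs, target, gamma=3.0):
    """Weighted focal loss for one token: -w_c * (1 - p_c)^gamma * log(p_c)."""
    p = probs[target]
    return -CLASS_WEIGHTS[target] * (1.0 - p) ** gamma * math.log(p)

# A confident, correct "O" prediction contributes almost nothing,
# while an uncertain rare-class token ("RV") is heavily up-weighted.
easy = focal_loss([0.95, 0.02, 0.01, 0.01, 0.01], target=0)
hard = focal_loss([0.55, 0.05, 0.05, 0.30, 0.05], target=3)
```

The `(1 - p)^gamma` factor suppresses the loss from easy majority-class tokens, and the per-class weights push the model toward the rare RV/RP labels.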