ModernBERT-large Disfluency — Exp E (Real + RV Oversample ×2)

Fine-tuned from answerdotai/ModernBERT-large. The train set is real data only (no synthetic data), with RV-containing segments oversampled 2×. Val/Test splits are identical across all experiments.

Oversample stats

  • Train original: 2737 segments
  • RV segments: 348 (each duplicated 1 extra time)
  • Train final: 3085 segments (2737 + 348)
  • RV tokens after oversampling: 1858 (4.21%)
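The oversampling step above can be sketched as follows. The segment format and the helper name are illustrative, not taken from the actual training code; it simply duplicates any segment containing at least one RV label.

```python
def oversample_rv(segments, factor=2):
    """Duplicate segments containing at least one RV label (factor - 1) extra times.

    `segments` is a list of (tokens, labels) pairs -- an assumed format,
    shown only to illustrate the 2x RV oversampling described above.
    """
    out = []
    for tokens, labels in segments:
        copies = factor if "RV" in labels else 1
        out.extend([(tokens, labels)] * copies)
    return out

# Toy example: 3 segments, one of which contains an RV token.
segs = [
    (["uh", "yes"], ["FP", "O"]),
    (["go", "left", "right"], ["O", "RV", "O"]),
    (["okay"], ["O"]),
]
print(len(oversample_rv(segs, factor=2)))  # 4: the RV segment appears twice
```

With 348 RV segments out of 2737, a factor of 2 yields the 3085 final segments reported above.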

Comparison Table

| Label | Paper | Exp A (base) | Exp C (large) | Exp D (large+mix) | Exp E (large+oversample) |
|-----------|-------|--------------|---------------|-------------------|--------------------------|
| FP | 1.000 | 0.9944 | 0.9944 | 0.9944 | 0.9915 |
| RP | 0.690 | 0.8022 | 0.8964 | 0.7253 | 0.8802 |
| RV | 0.400 | 0.3145 | 0.4974 | 0.3410 | 0.4884 |
| PW | 0.830 | 0.8879 | 0.9451 | 0.9348 | 0.9409 |
| Macro | 0.730 | 0.7497 | 0.8333 | 0.7489 | 0.8253 |
| Binary F1 | — | 0.8902 | 0.9250 | 0.8459 | 0.9344 |

Per-class Detail

| Label | P | R | F1 | Support |
|-------|--------|--------|--------|---------|
| O | 0.9895 | 0.9881 | 0.9888 | 3704 |
| FP | 0.9832 | 1.0000 | 0.9915 | 176 |
| RP | 0.8541 | 0.9080 | 0.8802 | 174 |
| RV | 0.4884 | 0.4884 | 0.4884 | 86 |
| PW | 0.9598 | 0.9227 | 0.9409 | 233 |
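As a sanity check, the Exp E macro score in the comparison table is the unweighted mean of the four per-class disfluency F1 scores (O excluded):

```python
# Per-class F1 values copied from the per-class detail table above.
f1 = {"FP": 0.9915, "RP": 0.8802, "RV": 0.4884, "PW": 0.9409}

macro = sum(f1.values()) / len(f1)
print(round(macro, 5))  # ~0.82525, rounding to the reported 0.8253
```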

Hyperparameters

  • learning_rate: 5e-05
  • batch_size effective: 32 (8 × 4 grad_accum)
  • epochs: 15
  • warmup_steps: 97
  • weight_decay: 0.1
  • classifier_dropout: 0.3
  • focal_loss_gamma: 3.0 (adaptive)
  • class_weights: O=1.0, FP=3.0, RP=6.0, RV=12.0, PW=5.0
  • oversample_factor: 2x RV segments
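The class-weighted focal loss implied by the hyperparameters above can be sketched per token as below. This is a minimal illustrative sketch, not the training code: the label-id order, the softmax helper, and the per-token interface are all assumptions; only gamma = 3.0 and the class weights come from the list above.

```python
import math

# Class weights as listed in the hyperparameters above.
CLASS_WEIGHTS = {"O": 1.0, "FP": 3.0, "RP": 6.0, "RV": 12.0, "PW": 5.0}
# Assumed label order for the logit vector (not specified in this card).
LABELS = ["O", "FP", "RP", "RV", "PW"]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def focal_loss(logits, gold, gamma=3.0):
    """Per-token class-weighted focal loss: -w_c * (1 - p_t)**gamma * log(p_t).

    Confident correct predictions are damped by (1 - p_t)**gamma, so training
    focuses on hard tokens; rare classes like RV also get a larger weight w_c.
    """
    probs = softmax(logits)
    p_t = probs[LABELS.index(gold)]
    w = CLASS_WEIGHTS[gold]
    return -w * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct RV prediction contributes far less loss than an
# uncertain one, while the weight of 12.0 keeps RV errors expensive.
print(focal_loss([0.1, 0.0, 0.2, 4.0, 0.3], "RV"))  # small
print(focal_loss([0.0, 0.0, 0.0, 0.0, 0.0], "RV"))  # much larger
```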