nllb-sango-finetuned-600m-3ep

This model is a fine-tuned version of facebook/nllb-200-distilled-600M. The training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 1.3012

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 3
  • mixed_precision_training: Native AMP
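As a rough illustration of how the values above relate, the sketch below recomputes the total train batch size from the per-device batch size and the gradient accumulation steps, and evaluates a linear warmup-then-decay learning-rate schedule at a few steps. It assumes a single training device (the card does not state the device count) and takes the total step count of 26187 from the last row of the training results. The function names are hypothetical helpers, not part of any library.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 500
TOTAL_STEPS = 26187  # final step reported in the training results
NUM_DEVICES = 1      # assumption: the card does not state the device count


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print("total_train_batch_size:", effective_batch_size(
    TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 250, 500, 13000, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {linear_lr(step):.2e}")
```

Under the single-device assumption, 16 × 8 reproduces the reported total_train_batch_size of 128. The learning rate peaks at 0.0003 at step 500 and reaches zero at the final step.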

Training results

Training Loss   Epoch    Step    Validation Loss
15.0623         0.1146   1000    1.7556
14.4898         0.2291   2000    1.6745
14.0211         0.3437   3000    1.6137
13.6606         0.4583   4000    1.5683
13.4614         0.5728   5000    1.5280
13.1058         0.6874   6000    1.4951
13.0098         0.8019   7000    1.4680
12.8789         0.9165   8000    1.4467
12.4444         1.0310   9000    1.4260
12.6331         1.1456   10000   1.4081
12.3781         1.2602   11000   1.3935
12.1731         1.3747   12000   1.3782
12.2067         1.4893   13000   1.3670
12.0454         1.6039   14000   1.3578
11.9698         1.7184   15000   1.3479
11.9595         1.8330   16000   1.3398
11.8094         1.9476   17000   1.3327
11.6344         2.0621   18000   1.3272
11.7943         2.1767   19000   1.3200
11.7281         2.2912   20000   1.3155
11.8180         2.4058   21000   1.3110
11.5384         2.5203   22000   1.3079
11.7371         2.6349   23000   1.3051
11.6320         2.7495   24000   1.3031
11.7702         2.8640   25000   1.3018
11.6175         2.9786   26000   1.3012
11.6356         3.0      26187   1.3012
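To put the validation losses in more familiar terms, the sketch below converts them to perplexity and derives the number of optimizer steps per epoch from the final row. It assumes the reported validation loss is a mean per-token cross-entropy in nats, which the card does not state explicitly. The example count is only an approximation, since the last batch of an epoch may be partial.

```python
import math

# Values copied from the training results and hyperparameters in this card.
FIRST_VAL_LOSS = 1.7556   # validation loss at step 1000
FINAL_VAL_LOSS = 1.3012   # validation loss at step 26187
TOTAL_STEPS = 26187
NUM_EPOCHS = 3
TOTAL_TRAIN_BATCH_SIZE = 128


def perplexity(mean_nll):
    """Perplexity for a mean negative log-likelihood given in nats (assumption)."""
    return math.exp(mean_nll)


steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
approx_examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"perplexity at step 1000:  {perplexity(FIRST_VAL_LOSS):.2f}")
print(f"perplexity at final step: {perplexity(FINAL_VAL_LOSS):.2f}")
print("optimizer steps per epoch:", steps_per_epoch)
print("approx. training examples per epoch:", approx_examples_per_epoch)
```

The step count divides evenly into 8729 steps per epoch, which agrees with the Epoch column (for example, 1000 / 8729 ≈ 0.1146).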

Framework versions

  • PEFT 0.19.1
  • Transformers 5.8.1
  • Pytorch 2.12.0+cu126
  • Datasets 4.8.5
  • Tokenizers 0.22.2
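
The PEFT entry above suggests the checkpoint is an adapter applied on top of facebook/nllb-200-distilled-600M rather than a full set of weights. The following is a minimal loading sketch under that assumption, not a confirmed usage recipe from the authors. The source language (fra_Latn) and the translation direction are guesses, because the card does not document them, and sag_Latn is assumed to be the NLLB language code for Sango. The helper names load_model_and_tokenizer and translate are hypothetical.

```python
BASE_MODEL_ID = "facebook/nllb-200-distilled-600M"
ADAPTER_ID = "MEYNG/nllb-sango-finetuned-600m-3ep"
SRC_LANG = "fra_Latn"  # assumption: source language is not stated in the card
TGT_LANG = "sag_Latn"  # assumption: NLLB code for Sango, used as the target


def load_model_and_tokenizer():
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    # Imported lazily so the constants above can be inspected without
    # having peft/transformers installed.
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, src_lang=SRC_LANG)
    base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return model, tokenizer


def translate(text, model, tokenizer, tgt_lang=TGT_LANG, max_new_tokens=128):
    """Translate one sentence, forcing the target language token first."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Example usage (downloads the base model and the adapter):
# model, tokenizer = load_model_and_tokenizer()
# print(translate("Bonjour, comment allez-vous ?", model, tokenizer))
```

If the model was trained in the opposite direction, swapping SRC_LANG and TGT_LANG should be the only change needed.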

Model tree for MEYNG/nllb-sango-finetuned-600m-3ep: this model is an adapter of facebook/nllb-200-distilled-600M.