
gpt2moe_hom2_100mb

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0388
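
For reference, a minimal loading sketch, not from the card: "gpt2moe_hom2_100mb" is only the card title, so the owner's namespace must be prepended to form the real hub id, and a custom GPT-2 MoE architecture may additionally require trust_remote_code=True.

```python
# Hedged sketch: loading this checkpoint with transformers.
# The repo id below is a placeholder taken from the card title; the full
# "owner/gpt2moe_hom2_100mb" hub id is not given on the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gpt2moe_hom2_100mb"  # placeholder; prepend the owner's namespace
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```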

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 7506
  • training_steps: 75067
  • mixed_precision_training: Native AMP
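
A minimal sketch of how these values map onto transformers.TrainingArguments, assuming a standard Trainer setup; the output_dir and the fp16 flag are assumptions, not stated on the card.

```python
# Hedged sketch, not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2moe_hom2_100mb",   # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,     # 8 x 4 = 32 total train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=7506,
    max_steps=75067,
    fp16=True,                         # "Native AMP" mixed precision, assumed fp16
)
```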

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0      | 0     | 11.0818         |
| 6.854         | 0.2664 | 2000  | 6.5520          |
| 6.0315        | 0.5329 | 4000  | 5.7265          |
| 5.5806        | 0.7993 | 6000  | 5.2642          |
| 5.2507        | 1.0657 | 8000  | 4.9658          |
| 5.0517        | 1.3321 | 10000 | 4.7656          |
| 4.924         | 1.5985 | 12000 | 4.6236          |
| 4.8276        | 1.8650 | 14000 | 4.5289          |
| 4.6892        | 2.1313 | 16000 | 4.4570          |
| 4.6484        | 2.3978 | 18000 | 4.3995          |
| 4.6087        | 2.6642 | 20000 | 4.3493          |
| 4.5688        | 2.9306 | 22000 | 4.3070          |
| 4.469         | 3.1970 | 24000 | 4.2805          |
| 4.4547        | 3.4634 | 26000 | 4.2517          |
| 4.4454        | 3.7299 | 28000 | 4.2252          |
| 4.4201        | 3.9963 | 30000 | 4.1997          |
| 4.334         | 4.2627 | 32000 | 4.1863          |
| 4.3298        | 4.5291 | 34000 | 4.1689          |
| 4.3267        | 4.7956 | 36000 | 4.1511          |
| 4.22          | 5.0619 | 38000 | 4.1414          |
| 4.2395        | 5.3284 | 40000 | 4.1308          |
| 4.2375        | 5.5948 | 42000 | 4.1178          |
| 4.239         | 5.8612 | 44000 | 4.1024          |
| 4.1425        | 6.1276 | 46000 | 4.1030          |
| 4.1632        | 6.3940 | 48000 | 4.0945          |
| 4.1578        | 6.6605 | 50000 | 4.0836          |
| 4.1554        | 6.9269 | 52000 | 4.0719          |
| 4.0797        | 7.1933 | 54000 | 4.0745          |
| 4.0885        | 7.4597 | 56000 | 4.0674          |
| 4.091         | 7.7261 | 58000 | 4.0608          |
| 4.0897        | 7.9926 | 60000 | 4.0517          |
| 4.0219        | 8.2590 | 62000 | 4.0572          |
| 4.042         | 8.5254 | 64000 | 4.0528          |
| 4.0368        | 8.7918 | 66000 | 4.0463          |
| 3.9788        | 9.0582 | 68000 | 4.0463          |
| 3.9801        | 9.3246 | 70000 | 4.0450          |
| 3.9866        | 9.5911 | 72000 | 4.0415          |
| 3.9807        | 9.8575 | 74000 | 4.0394          |
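
A quick hedged conversion of the final validation loss to perplexity, assuming the loss is the mean per-token cross-entropy in nats (the transformers Trainer default for causal language modeling):

```python
# Hedged sketch, not from the card: perplexity = exp(cross-entropy loss).
import math

final_val_loss = 4.0388
print(f"perplexity ≈ {math.exp(final_val_loss):.1f}")  # ≈ 56.8
```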

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
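
A small sketch for checking a local environment against the versions listed above, using only the standard library:

```python
# Hedged sketch: compare installed package versions to the card's list.
from importlib.metadata import PackageNotFoundError, version

expected = {
    "transformers": "4.57.1",
    "torch": "2.9.1+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}
for pkg, want in expected.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "not installed"
    print(f"{pkg}: expected {want}, found {have}")
```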

Safetensors

  • Model size: 0.2B params
  • Tensor type: F32