Plainly Optimized Network

Dataset: SUPERGLUE

Trainer Hyperparameters:

  • lr = 5e-05
  • per_device_batch_size = 8
  • gradient_accumulation_steps = 2
  • weight_decay = 1e-09
  • seed = 42
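
The batch-size settings above combine into an effective batch size per optimizer step. The sketch below works through that arithmetic; the helper `effective_batch_size` and its `num_devices` parameter are hypothetical names used for illustration, and the single-device default is an assumption, since the card does not state how many devices were used.

```python
# Hyperparameters copied from the list above.
hyperparameters = {
    "lr": 5e-05,
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "weight_decay": 1e-09,
    "seed": 42,
}


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to one optimizer step.

    num_devices is an assumption: the card does not say how many devices were used.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


single_device = effective_batch_size(
    hyperparameters["per_device_batch_size"],
    hyperparameters["gradient_accumulation_steps"],
)
print(f"Effective batch size on one device: {single_device}")
```

With more than one device the result scales linearly, so the true value depends on hardware details the card does not give.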

Evaluation results:

  eval_loss  eval_accuracy  epoch
  19.092     0.667          1.0
  18.211     0.667          2.0
  17.359     0.739          3.0
  17.168     0.732          4.0
  18.647     0.681          5.0
  18.081     0.681          6.0
  18.325     0.688          7.0
  18.660     0.688          8.0
  18.464     0.688          9.0
  18.622     0.696          10.0
  17.838     0.710          11.0
  17.792     0.703          12.0
  18.009     0.696          13.0
  19.033     0.674          14.0
  17.430     0.717          15.0
  18.218     0.696          16.0
  17.915     0.710          17.0
  17.956     0.717          18.0
  18.078     0.725          19.0
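
To read the per-epoch results programmatically, a minimal sketch follows. It parses the rows above as whitespace-separated text and picks out the epochs with the highest accuracy and the lowest loss. The names `RESULTS` and `parse_results` are hypothetical and not part of the model repository; the numbers are copied from the table as-is.

```python
# Evaluation rows copied verbatim from the model card.
RESULTS = """\
eval_loss eval_accuracy epoch
19.092 0.667 1.0
18.211 0.667 2.0
17.359 0.739 3.0
17.168 0.732 4.0
18.647 0.681 5.0
18.081 0.681 6.0
18.325 0.688 7.0
18.660 0.688 8.0
18.464 0.688 9.0
18.622 0.696 10.0
17.838 0.710 11.0
17.792 0.703 12.0
18.009 0.696 13.0
19.033 0.674 14.0
17.430 0.717 15.0
18.218 0.696 16.0
17.915 0.710 17.0
17.956 0.717 18.0
18.078 0.725 19.0
"""


def parse_results(text):
    """Turn the whitespace-separated table into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, map(float, line.split()))) for line in lines[1:]]


rows = parse_results(RESULTS)

# Epoch with the highest evaluation accuracy and epoch with the lowest evaluation loss.
best_accuracy = max(rows, key=lambda row: row["eval_accuracy"])
lowest_loss = min(rows, key=lambda row: row["eval_loss"])

print("Best accuracy:", best_accuracy)
print("Lowest loss:", lowest_loss)
```

The two criteria select different epochs here, so which checkpoint counts as "best" depends on the metric chosen.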

Model size: 0.1B params (Safetensors)
Tensor type: F32