# secret-model-stage-1-0.6B-512
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9233
- Centroid Acc: 0.8679
- Centroid Macro F1: 0.8730
- kNN Acc: 0.8491
- kNN Macro F1: 0.8563
- Alignment: 0.6904
- Uniformity: -3.0335
- Combined Score: 0.8674
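The card does not document how the Alignment and Uniformity metrics are computed. A minimal sketch, assuming the common definitions from Wang & Isola (2020) over L2-normalized embeddings (lower alignment and more negative uniformity are better, matching the signs reported above):

```python
import numpy as np

def alignment(x, y):
    """Mean squared distance between L2-normalized positive-pair embeddings.
    x and y are (n, d) arrays where row i of each is a positive pair."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return np.mean(np.sum((x - y) ** 2, axis=1))

def uniformity(x, t=2.0):
    """Log of the mean Gaussian-kernel similarity over all distinct pairs.
    More negative values mean embeddings spread more evenly on the sphere."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sq = np.sum((x[:, None] - x[None, :]) ** 2, axis=-1)
    iu = np.triu_indices(x.shape[0], k=1)  # distinct pairs only
    return np.log(np.mean(np.exp(-t * sq[iu])))
```

This is an illustrative reimplementation of the standard metrics, not the training repository's own code.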
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 100
### Training results
| Training Loss | Epoch | Step | Validation Loss | Centroid Acc | Centroid Macro F1 | kNN Acc | kNN Macro F1 | Alignment | Uniformity | Combined Score |
|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 2.3522 | 0.6604 | 0.6607 | 0.8679 | 0.8627 | 0.3334 | -0.7995 | 0.7280 |
| 1.5883 | 3.125 | 100 | 1.3699 | 0.7736 | 0.7708 | 0.8679 | 0.8755 | 0.6157 | -2.0089 | 0.8057 |
| 1.3499 | 6.25 | 200 | 1.0940 | 0.9245 | 0.9226 | 0.8679 | 0.8721 | 0.5183 | -2.0487 | 0.9058 |
| 1.0313 | 9.375 | 300 | 1.0252 | 0.8679 | 0.8690 | 0.8679 | 0.8663 | 0.4744 | -1.9845 | 0.8681 |
| 0.527 | 12.5 | 400 | 0.8431 | 0.7925 | 0.7763 | 0.8302 | 0.8012 | 0.5916 | -2.6024 | 0.7846 |
| 0.566 | 15.625 | 500 | 0.8905 | 0.8491 | 0.8572 | 0.8491 | 0.8440 | 0.5574 | -2.4475 | 0.8528 |
| 0.5135 | 18.75 | 600 | 0.8551 | 0.8302 | 0.8318 | 0.8679 | 0.8672 | 0.5690 | -2.5328 | 0.8436 |
| 0.3032 | 21.875 | 700 | 0.9638 | 0.8491 | 0.8516 | 0.8302 | 0.8352 | 0.6250 | -2.6975 | 0.8461 |
| 0.2311 | 25.0 | 800 | 0.9164 | 0.8302 | 0.8323 | 0.8113 | 0.8084 | 0.6480 | -2.8651 | 0.8243 |
| 0.226 | 28.125 | 900 | 1.0117 | 0.9057 | 0.9097 | 0.8491 | 0.8534 | 0.6128 | -2.7859 | 0.8909 |
| 0.1224 | 31.25 | 1000 | 0.8148 | 0.8679 | 0.8682 | 0.8679 | 0.8655 | 0.6621 | -2.9576 | 0.8673 |
| 0.0862 | 34.375 | 1100 | 1.0351 | 0.8679 | 0.8720 | 0.8491 | 0.8403 | 0.6816 | -2.9616 | 0.8614 |
| 0.0973 | 37.5 | 1200 | 1.1075 | 0.8113 | 0.8060 | 0.8302 | 0.8223 | 0.6882 | -2.9441 | 0.8115 |
| 0.0396 | 40.625 | 1300 | 0.8058 | 0.8491 | 0.8527 | 0.8679 | 0.8690 | 0.6195 | -2.8024 | 0.8582 |
| 0.0139 | 43.75 | 1400 | 0.8962 | 0.8679 | 0.8690 | 0.8679 | 0.8726 | 0.6721 | -2.9618 | 0.8702 |
| 0.0278 | 46.875 | 1500 | 0.9030 | 0.8679 | 0.8690 | 0.8679 | 0.8646 | 0.6892 | -3.0348 | 0.8675 |
| 0.0158 | 50.0 | 1600 | 0.8332 | 0.8302 | 0.8371 | 0.8679 | 0.8726 | 0.6733 | -2.9483 | 0.8489 |
| 0.0787 | 53.125 | 1700 | 0.8769 | 0.8491 | 0.8527 | 0.8679 | 0.8690 | 0.6660 | -2.9786 | 0.8582 |
| 0.0061 | 56.25 | 1800 | 0.9462 | 0.8491 | 0.8527 | 0.8491 | 0.8563 | 0.6764 | -2.9874 | 0.8539 |
| 0.0203 | 59.375 | 1900 | 0.9591 | 0.8113 | 0.8060 | 0.8491 | 0.8527 | 0.6813 | -2.9815 | 0.8216 |
| 0.0286 | 62.5 | 2000 | 0.8517 | 0.8491 | 0.8527 | 0.8679 | 0.8755 | 0.6959 | -3.0418 | 0.8603 |
| 0.005 | 65.625 | 2100 | 0.8745 | 0.8679 | 0.8730 | 0.8679 | 0.8755 | 0.6853 | -3.0349 | 0.8738 |
| 0.0035 | 68.75 | 2200 | 0.8911 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6857 | -3.0217 | 0.8674 |
| 0.0038 | 71.875 | 2300 | 0.9111 | 0.8679 | 0.8730 | 0.8679 | 0.8755 | 0.6848 | -3.0124 | 0.8738 |
| 0.0041 | 75.0 | 2400 | 0.8897 | 0.8491 | 0.8510 | 0.8679 | 0.8755 | 0.6847 | -3.0196 | 0.8592 |
| 0.0051 | 78.125 | 2500 | 0.9212 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6903 | -3.0364 | 0.8674 |
| 0.0037 | 81.25 | 2600 | 0.9200 | 0.8679 | 0.8730 | 0.8679 | 0.8755 | 0.6847 | -3.0164 | 0.8738 |
| 0.0228 | 84.375 | 2700 | 0.9245 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6801 | -3.0048 | 0.8674 |
| 0.0027 | 87.5 | 2800 | 0.9212 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6883 | -3.0251 | 0.8674 |
| 0.004 | 90.625 | 2900 | 0.9288 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6930 | -3.0377 | 0.8674 |
| 0.0032 | 93.75 | 3000 | 0.9245 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6914 | -3.0362 | 0.8674 |
| 0.0322 | 96.875 | 3100 | 0.9249 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6907 | -3.0344 | 0.8674 |
| 0.002 | 100.0 | 3200 | 0.9233 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6904 | -3.0335 | 0.8674 |
### Framework versions
- Transformers 4.56.0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.0