# secret-model-stage-1-0.6B-512
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9233
- Centroid Acc: 0.8679
- Centroid Macro F1: 0.8730
- kNN Acc: 0.8491
- kNN Macro F1: 0.8563
- Alignment: 0.6904
- Uniformity: -3.0335
- Combined Score: 0.8674
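The card does not document how the Alignment and Uniformity metrics are computed. A minimal sketch, assuming the common definitions from Wang & Isola (2020) over L2-normalized embeddings (lower alignment and more negative uniformity are better, matching the signs reported above):

```python
import numpy as np

def alignment(x, y):
    """Mean squared distance between L2-normalized positive-pair embeddings.
    x and y are (n, d) arrays where row i of each is a positive pair."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return np.mean(np.sum((x - y) ** 2, axis=1))

def uniformity(x, t=2.0):
    """Log of the mean Gaussian-kernel similarity over all distinct pairs.
    More negative values mean embeddings spread more evenly on the sphere."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sq = np.sum((x[:, None] - x[None, :]) ** 2, axis=-1)
    iu = np.triu_indices(x.shape[0], k=1)  # distinct pairs only
    return np.log(np.mean(np.exp(-t * sq[iu])))
```

This is an illustrative reimplementation of the standard metrics, not the training repository's own code.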
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 100
### Training results
| Training Loss | Epoch | Step | Validation Loss | Centroid Acc | Centroid Macro F1 | kNN Acc | kNN Macro F1 | Alignment | Uniformity | Combined Score |
|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 2.3522 | 0.6604 | 0.6607 | 0.8679 | 0.8627 | 0.3334 | -0.7995 | 0.7280 |
| 1.5883 | 3.125 | 100 | 1.3699 | 0.7736 | 0.7708 | 0.8679 | 0.8755 | 0.6157 | -2.0089 | 0.8057 |
| 1.3499 | 6.25 | 200 | 1.0940 | 0.9245 | 0.9226 | 0.8679 | 0.8721 | 0.5183 | -2.0487 | 0.9058 |
| 1.0313 | 9.375 | 300 | 1.0252 | 0.8679 | 0.8690 | 0.8679 | 0.8663 | 0.4744 | -1.9845 | 0.8681 |
| 0.527 | 12.5 | 400 | 0.8431 | 0.7925 | 0.7763 | 0.8302 | 0.8012 | 0.5916 | -2.6024 | 0.7846 |
| 0.566 | 15.625 | 500 | 0.8905 | 0.8491 | 0.8572 | 0.8491 | 0.8440 | 0.5574 | -2.4475 | 0.8528 |
| 0.5135 | 18.75 | 600 | 0.8551 | 0.8302 | 0.8318 | 0.8679 | 0.8672 | 0.5690 | -2.5328 | 0.8436 |
| 0.3032 | 21.875 | 700 | 0.9638 | 0.8491 | 0.8516 | 0.8302 | 0.8352 | 0.6250 | -2.6975 | 0.8461 |
| 0.2311 | 25.0 | 800 | 0.9164 | 0.8302 | 0.8323 | 0.8113 | 0.8084 | 0.6480 | -2.8651 | 0.8243 |
| 0.226 | 28.125 | 900 | 1.0117 | 0.9057 | 0.9097 | 0.8491 | 0.8534 | 0.6128 | -2.7859 | 0.8909 |
| 0.1224 | 31.25 | 1000 | 0.8148 | 0.8679 | 0.8682 | 0.8679 | 0.8655 | 0.6621 | -2.9576 | 0.8673 |
| 0.0862 | 34.375 | 1100 | 1.0351 | 0.8679 | 0.8720 | 0.8491 | 0.8403 | 0.6816 | -2.9616 | 0.8614 |
| 0.0973 | 37.5 | 1200 | 1.1075 | 0.8113 | 0.8060 | 0.8302 | 0.8223 | 0.6882 | -2.9441 | 0.8115 |
| 0.0396 | 40.625 | 1300 | 0.8058 | 0.8491 | 0.8527 | 0.8679 | 0.8690 | 0.6195 | -2.8024 | 0.8582 |
| 0.0139 | 43.75 | 1400 | 0.8962 | 0.8679 | 0.8690 | 0.8679 | 0.8726 | 0.6721 | -2.9618 | 0.8702 |
| 0.0278 | 46.875 | 1500 | 0.9030 | 0.8679 | 0.8690 | 0.8679 | 0.8646 | 0.6892 | -3.0348 | 0.8675 |
| 0.0158 | 50.0 | 1600 | 0.8332 | 0.8302 | 0.8371 | 0.8679 | 0.8726 | 0.6733 | -2.9483 | 0.8489 |
| 0.0787 | 53.125 | 1700 | 0.8769 | 0.8491 | 0.8527 | 0.8679 | 0.8690 | 0.6660 | -2.9786 | 0.8582 |
| 0.0061 | 56.25 | 1800 | 0.9462 | 0.8491 | 0.8527 | 0.8491 | 0.8563 | 0.6764 | -2.9874 | 0.8539 |
| 0.0203 | 59.375 | 1900 | 0.9591 | 0.8113 | 0.8060 | 0.8491 | 0.8527 | 0.6813 | -2.9815 | 0.8216 |
| 0.0286 | 62.5 | 2000 | 0.8517 | 0.8491 | 0.8527 | 0.8679 | 0.8755 | 0.6959 | -3.0418 | 0.8603 |
| 0.005 | 65.625 | 2100 | 0.8745 | 0.8679 | 0.8730 | 0.8679 | 0.8755 | 0.6853 | -3.0349 | 0.8738 |
| 0.0035 | 68.75 | 2200 | 0.8911 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6857 | -3.0217 | 0.8674 |
| 0.0038 | 71.875 | 2300 | 0.9111 | 0.8679 | 0.8730 | 0.8679 | 0.8755 | 0.6848 | -3.0124 | 0.8738 |
| 0.0041 | 75.0 | 2400 | 0.8897 | 0.8491 | 0.8510 | 0.8679 | 0.8755 | 0.6847 | -3.0196 | 0.8592 |
| 0.0051 | 78.125 | 2500 | 0.9212 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6903 | -3.0364 | 0.8674 |
| 0.0037 | 81.25 | 2600 | 0.9200 | 0.8679 | 0.8730 | 0.8679 | 0.8755 | 0.6847 | -3.0164 | 0.8738 |
| 0.0228 | 84.375 | 2700 | 0.9245 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6801 | -3.0048 | 0.8674 |
| 0.0027 | 87.5 | 2800 | 0.9212 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6883 | -3.0251 | 0.8674 |
| 0.004 | 90.625 | 2900 | 0.9288 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6930 | -3.0377 | 0.8674 |
| 0.0032 | 93.75 | 3000 | 0.9245 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6914 | -3.0362 | 0.8674 |
| 0.0322 | 96.875 | 3100 | 0.9249 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6907 | -3.0344 | 0.8674 |
| 0.002 | 100.0 | 3200 | 0.9233 | 0.8679 | 0.8730 | 0.8491 | 0.8563 | 0.6904 | -3.0335 | 0.8674 |
### Framework versions
- Transformers 4.56.0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.0